WO2022016350A1 - Light field image processing method, encoder, decoder, and storage medium - Google Patents

Light field image processing method, encoder, decoder, and storage medium

Info

Publication number
WO2022016350A1
WO2022016350A1 (PCT/CN2020/103177)
Authority
WO
WIPO (PCT)
Prior art keywords
image
sub
light field
aperture
resolution
Prior art date
Application number
PCT/CN2020/103177
Other languages
English (en)
French (fr)
Inventor
元辉
付丛睿
李明
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to EP20946182.1A priority Critical patent/EP4156685A4/en
Priority to CN202080104551.6A priority patent/CN116210219A/zh
Priority to PCT/CN2020/103177 priority patent/WO2022016350A1/zh
Publication of WO2022016350A1 publication Critical patent/WO2022016350A1/zh
Priority to US18/079,174 priority patent/US20230106939A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/21 Indexing scheme for image data processing or generation, in general involving computational photography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10052 Images from lightfield camera

Definitions

  • the embodiments of the present application relate to a light field image encoding and decoding technology, and in particular, to a light field image processing method, an encoder, a decoder, and a storage medium.
  • a light field image collected by a common camera array or a light field camera has a large size, and therefore, it is often necessary to compress the light field image to save storage space.
  • the compression schemes for light field images mainly include image-based direct compression methods and pseudo-video sequence-based indirect compression methods.
  • the current codec standards are mainly designed for ordinary images, so directly applying them to light field images gives unsatisfactory results; at the same time, the indirect compression method that converts light field images into sub-aperture images before processing greatly increases computational complexity, and its processing accuracy is not high.
  • Embodiments of the present application provide a light field image processing method, an encoder, a decoder and a storage medium, which can reduce the transmission code stream and greatly improve the encoding and decoding efficiency.
  • an embodiment of the present application provides a method for processing a light field image, which is applied to a light field image decoder, and the method includes:
  • the initial sub-aperture image is input into a super-resolution reconstruction network, and a reconstructed sub-aperture image is output; wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image;
  • the reconstructed sub-aperture image is input into the quality enhancement network, and the target sub-aperture image is output.
  • an embodiment of the present application provides a method for processing a light field image, which is applied to a light field image encoder, and the method includes:
  • An encoding process is performed based on the image pseudo-sequence to generate a code stream.
  • an embodiment of the present application provides a light field image decoder, where the light field image decoder includes: a parsing part and a first acquiring part,
  • the parsing part is configured to parse the code stream to obtain an initial sub-aperture image
  • the first acquisition part is configured to input the initial sub-aperture image into the super-resolution reconstruction network, and output the reconstructed sub-aperture image; wherein, the spatial resolution and angular resolution of the reconstructed sub-aperture image are greater than the spatial resolution and angular resolution of the initial sub-aperture image; and inputting the reconstructed sub-aperture image into a quality enhancement network to output a target sub-aperture image.
  • an embodiment of the present application provides a light field image decoder.
  • the light field image decoder includes a first processor and a first memory storing executable instructions of the first processor. When the instructions are executed by the first processor, the above-described light field image processing method is implemented.
  • an embodiment of the present application provides a light field image encoder, the light field image encoder includes: a second acquiring part and a generating part,
  • the second acquisition part is configured to acquire a microlens image through acquisition by a light field camera;
  • the generating part is configured to generate a sub-aperture image according to the microlens image;
  • the second acquisition part is further configured to perform downsampling processing on the sub-aperture image to obtain an initial sub-aperture image;
  • the generating part is further configured to generate an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and perform encoding processing based on the image pseudo-sequence to generate a code stream.
  • an embodiment of the present application provides a light field image encoder, where the light field image encoder includes a second processor and a second memory storing executable instructions of the second processor. When the instructions are executed by the second processor, the above-described light field image processing method is implemented.
  • an embodiment of the present application provides a computer-readable storage medium on which a program is stored, which is applied to a light-field image decoder and a light-field image encoder.
  • when the program is executed by the first processor, the light field image processing method according to the first aspect is implemented; when the program is executed by the second processor, the light field image processing method according to the second aspect is implemented.
  • the embodiments of the present application provide a light field image processing method, an encoder, a decoder and a storage medium.
  • the light field image decoder parses a code stream to obtain an initial sub-aperture image; the initial sub-aperture image is input into a super-resolution reconstruction network, and a reconstructed sub-aperture image is output; the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; the reconstructed sub-aperture image is input into a quality enhancement network, and a target sub-aperture image is output.
  • the light field image encoder acquires a microlens image through a light field camera and generates a sub-aperture image according to the microlens image; performs downsampling processing on the sub-aperture image to obtain an initial sub-aperture image; generates an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and performs encoding processing based on the image pseudo-sequence to generate a code stream.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images;
  • accordingly, downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image, so that only part of the light field image needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the compression efficiency of the light field image. It can be seen that this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, so the image quality can be improved.
  • before compressing and encoding the light field image, the light field image encoder can perform spatial and angular downsampling processing on the light field image to obtain a low-resolution light field image, which reduces the amount of data that needs to be encoded.
  • the light field image decoder can use the super-resolution reconstruction network to perform spatial and angular upsampling on the low-resolution light field image and thereby reconstruct a high-resolution light field image, so that the transmitted code stream can be reduced and the encoding and decoding efficiency can be greatly improved.
  • FIG. 1 is a schematic diagram of the principle of light field imaging
  • FIG. 2 is a schematic diagram of the direct compression method
  • FIG. 3 is a schematic diagram of the indirect compression method
  • FIG. 4 is a schematic diagram 1 of the implementation flow of the light field image processing method
  • FIG. 5 is a schematic diagram 1 of a preset arrangement order
  • FIG. 6 is a schematic diagram 2 of a preset arrangement order
  • FIG. 7 is a schematic diagram 3 of a preset arrangement order
  • FIG. 8 is a schematic diagram 4 of a preset arrangement order
  • FIG. 9 is a schematic diagram of extracting the initial EPI set
  • FIG. 10 is a schematic diagram of the processing flow of the decoding end
  • FIG. 11 is a schematic structural diagram of a super-resolution reconstruction network
  • FIG. 12 is a schematic structural diagram of a branch module
  • FIG. 13 is a schematic structural diagram of the ResDB module
  • FIG. 15 is a schematic diagram 2 of the implementation flow of the light field image processing method
  • FIG. 16 is a partial enlarged view of a microlens image
  • FIG. 17 is a schematic diagram of preliminary extraction
  • FIG. 18 is a sub-aperture image array
  • FIG. 19 is a schematic diagram of a light field image encoder and a light field image decoder for implementing light field image processing
  • FIG. 20 is a schematic structural diagram of a light field image processing method
  • FIG. 21 is a schematic diagram 1 of the composition structure of a light field image decoder
  • FIG. 22 is a schematic diagram 2 of the composition structure of the light field image decoder
  • FIG. 23 is a schematic diagram 1 of the composition structure of the light field image encoder
  • FIG. 24 is a schematic diagram 2 of the composition structure of the light field image encoder.
  • the light field is the sum of light rays in any position and direction in space, and the light field image can be acquired in the following two ways: acquisition by a common camera array or acquisition by a light field camera.
  • a two-dimensional array with two-dimensional images as elements can be obtained through a common camera array, in which each image represents light information at a different viewing angle; a light field camera instead captures a single larger image organized in units of macro-pixel blocks, in which different pixels within the same macro-pixel block represent the light information of the same object point at different viewing angles.
  • the body of a light field camera is similar to that of a traditional digital camera, but the internal structure is very different.
  • traditional cameras capture light with the main lens and focus it onto the film or photoreceptor behind the lens; the sum of all the light forms the points of the photograph that display the image.
  • the light field camera is equipped with a microlens array containing 90,000 microlenses between the main lens and the photoreceptor. Each microlens receives the light from the main lens, separates out the focused light, transmits it to the photoreceptor, and converts the light data so that it can be recorded digitally.
  • the light field camera's built-in software operates on the "expanded light field" to track where each ray of light falls on the image at different distances; after digital refocusing, a sharp photo is obtained.
  • light field cameras are unconventional: rather than reducing the lens aperture size to gain depth of field, they use the microlens array to control the extra light and recover the depth information of each image, projecting tiny sub-images onto the photoreceptor. Blurred regions around the aperture are restored to sharpness, retaining the increased luminosity, reduced exposure time, and fine grain of a large aperture without sacrificing depth of field or image clarity.
  • Figure 1 is a schematic diagram of the principle of light field imaging.
  • the light field camera adds a microlens array in front of the ordinary main lens to capture the light in the scene, and the captured light field image is processed by a series of algorithms before being presented.
  • the output of an imaging sensor is called a lenslet image, which consists of many tiny images produced by a microlens.
  • the microlens image in the spatial domain (x, y) can be further transformed into multiple sub-aperture images (SAI) in the angular domain (s, t) after processing, corresponding to different perspective views for subsequent applications.
  • the compression schemes for light field images mainly include image-based direct compression methods and pseudo-video sequence-based indirect compression methods.
  • the image-based direct compression method is to directly use the existing image compression scheme for the microlens image output by the sensor, such as Joint Photographic Experts Group (JPEG), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), etc.
  • Figure 2 is a schematic diagram of the direct compression method.
  • the light field data is collected by a light field camera and output as a microlens image through the sensor; the microlens image is then directly compressed, as a normal natural image, with an existing image coding standard such as JPEG, HEVC, or VVC, and the bit stream is transmitted to the decoding end.
  • the corresponding decoding method is used for decoding, and the light field image, that is, the microlens image, is reconstructed.
  • due to the microlens array inside the light field camera, the light field microlens image exhibits circle-like artifacts and therefore has different characteristics from traditional natural images; existing image compression schemes are designed mainly for natural images and cannot achieve efficient compression of light field images.
  • the indirect compression method based on a pseudo video sequence processes the light field image captured by the light field camera into multiple sub-aperture images; these sub-aperture images can then be sorted in different ways, such as rotation order, raster scan order, zigzag order, and U-shaped scan order, arranged into a pseudo-sequence, and compressed and encoded with existing coding standards.
  • Figure 3 is a schematic diagram of the indirect compression method.
  • the output of the sensor can be converted into a sub-aperture image before compression.
  • the light field microlens image captured by the microlens-based light field camera can be processed into multiple sub-aperture images, and these sub-aperture images can be arranged into a pseudo-sequence according to different sorting methods, such as rotation order, raster scan order, zigzag order, and U-shaped scan order, compressed and encoded using existing coding standards, and the bit stream transmitted to the decoding end.
  • the decoding end performs corresponding decoding to obtain the reconstructed light field pseudo-sequence, and then rearranges it back to the light field sub-aperture image array according to the arrangement of the encoding end.
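The arrangement into a pseudo-sequence and the rearrangement back into a sub-aperture image array amount to a reshape between a 4D view array and a frame sequence. A minimal NumPy sketch (raster-scan order only; the (U, V, H, W) layout and function names are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def sai_array_to_pseudo_sequence(sai_array):
    """Flatten a (U, V, H, W) sub-aperture image array into a pseudo-sequence
    of U*V frames in raster-scan order (row by row, left to right)."""
    u, v, h, w = sai_array.shape
    return sai_array.reshape(u * v, h, w)

def pseudo_sequence_to_sai_array(seq, u, v):
    """Inverse rearrangement at the decoding end: restore the (U, V, H, W)
    sub-aperture image array from the decoded pseudo-sequence."""
    n, h, w = seq.shape
    assert n == u * v, "frame count must match the angular resolution"
    return seq.reshape(u, v, h, w)

# Round trip over a hypothetical 9x9 array of 32x32 views.
lf = np.random.rand(9, 9, 32, 32)
seq = sai_array_to_pseudo_sequence(lf)        # (81, 32, 32)
restored = pseudo_sequence_to_sai_array(seq, 9, 9)
assert np.array_equal(lf, restored)
```

The other scan orders (rotation, zigzag, U-shaped) differ only in the frame index permutation applied before encoding.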
  • since a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images, downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image, so that only part of the light field image needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the compression efficiency of the light field image.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved, and at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, so the image quality can be improved.
  • before compressing and encoding the light field image, the light field image encoder can perform spatial and angular downsampling processing on the light field image to obtain a low-resolution light field image, which reduces the amount of data that needs to be encoded.
  • the light field image decoder can use the super-resolution reconstruction network to perform spatial and angular upsampling on the low-resolution light field image and thereby reconstruct a high-resolution light field image, so that the transmitted code stream can be reduced and the encoding and decoding efficiency can be greatly improved.
  • the embodiments of this application propose an end-to-end low-bit-rate light field image compression scheme based on light field sub-aperture images, which may mainly include preprocessing of the light field image, encoding of the light field image data, bit stream transmission, decoding of the light field image data, and post-processing of the light field image.
  • it may mainly include construction of a pseudo-sequence of low-resolution sub-aperture images, encoding and decoding of a pseudo-sequence of low-resolution sub-aperture images, and super-resolution reconstruction of the decoded low-resolution sub-aperture image.
  • a network structure for spatial and angular super-resolution reconstruction of light field images is designed based on the idea of EPI.
  • the super-resolution reconstruction network adopts a branch-fusion structure as a whole, which can effectively perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images.
  • a quality enhancement network Enhance-net is also designed to enhance the image quality of the reconstructed sub-aperture images output by the super-resolution reconstruction network.
  • the experimental data shows that the light field image processing method proposed in this application can significantly improve the quality of light field images through the super-resolution reconstruction network at the decoding end, while the downsampling process at the encoding end reduces the amount of transmitted data.
  • an embodiment of the present application provides a method for processing a light field image, and the method for processing a light field image is applied to a light field image decoder.
  • FIG. 4 is a schematic diagram of an implementation flowchart of the method for processing a light field image.
  • a method for processing a light field image by a light field image decoder may include the following steps:
  • Step 101: Parse the code stream to obtain an initial sub-aperture image.
  • the light field image decoder may first parse the code stream, so as to obtain an initial sub-aperture image, where the initial sub-aperture image is a low-resolution light field sub-aperture image.
  • a frame of microlens image collected by the light field camera can be processed into multiple sub-aperture images, and after the multiple sub-aperture images are respectively downsampled, corresponding initial sub-aperture images are obtained; since the initial sub-aperture image is obtained after downsampling, it is a sub-aperture image with lower resolution.
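The spatial and angular downsampling can be illustrated as plain decimation over a 4D sub-aperture array; a real encoder would likely apply a low-pass filter first, so this is only a sketch with assumed step parameters:

```python
import numpy as np

def downsample_light_field(sai_array, angular_step=2, spatial_step=2):
    """Reduce angular resolution by keeping every angular_step-th view and
    spatial resolution by keeping every spatial_step-th pixel (plain
    decimation; an actual encoder might low-pass filter before decimating)."""
    return sai_array[::angular_step, ::angular_step,
                     ::spatial_step, ::spatial_step]

lf = np.random.rand(8, 8, 64, 64)   # (U, V, H, W) sub-aperture images
low = downsample_light_field(lf)    # initial sub-aperture images: (4, 4, 32, 32)
```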
  • the light field image decoder can obtain the pseudo image sequence and the preset arrangement order by parsing the code stream.
  • the initial sub-aperture images may be reordered according to a preset arrangement order, thereby obtaining a corresponding pseudo-sequence of images.
  • the light field image decoder may obtain the initial sub-aperture image by inverse transformation, using the image pseudo-sequence and the preset arrangement order determined by parsing the code stream. That is to say, in the embodiment of the present application, the light field image decoder can parse the code stream to obtain the image pseudo-sequence and the preset arrangement order; then, based on the preset arrangement order and the image pseudo-sequence, the initial sub-aperture image can finally be generated.
  • the preset arrangement order may include any one of various arrangement orders
  • FIG. 5 is a schematic diagram 1 of the preset arrangement order
  • FIG. 6 is a schematic diagram 2 of the preset arrangement order
  • Figure 7 is a schematic diagram three of the preset arrangement order
  • Figure 8 is a schematic diagram of the preset arrangement order four, as shown in Figures 5, 6, 7, and 8,
  • the arrangement order can be any one of rotation order, raster scan order, zigzag order, and U-shaped scan order.
  • a raster scan sequence can be selected to arrange a plurality of sub-aperture images, so as to obtain a corresponding pseudo-sequence of images.
  • Step 102: Input the initial sub-aperture image into the super-resolution reconstruction network, and output the reconstructed sub-aperture image; wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image.
  • the light field image decoder can input the initial sub-aperture image into the super-resolution reconstruction network, so as to output the reconstructed sub-aperture image.
  • the reconstructed sub-aperture image is a high-resolution sub-aperture image, that is, the spatial resolution and angular resolution of the reconstructed sub-aperture image are greater than those of the initial sub-aperture image.
  • the super-resolution reconstruction network may be a branch fusion super-resolution network (Branch Fusion Super Resolution Net, BFSRNet) model, in which the image resolution is mainly improved by upsampling processing, so that the resolution of the output image is higher than the resolution of the input image.
  • the core of the embodiments of the present application is to design a branch fusion neural network model (ie, a super-resolution reconstruction network) to improve the spatial resolution and angular resolution of an image, that is, spatial and angular super-resolution.
  • when the light field image decoder inputs the initial sub-aperture image into the super-resolution reconstruction network and outputs the reconstructed sub-aperture image, it may specifically perform extraction processing based on the initial sub-aperture image to obtain an initial epipolar plane image (EPI) set; it may then perform upsampling and feature extraction on the initial EPI set to obtain a target EPI set, wherein the resolution of the images in the target EPI set is greater than the resolution of the images in the initial EPI set; finally, the target EPI set can be fused to obtain the reconstructed sub-aperture image.
  • the super-resolution reconstruction network may be an EPI-based light field image super-resolution network structure, which can achieve a super-resolution effect for the entire light field image by performing super-resolution on EPI images of different dimensions.
  • the initial EPI set may be at least one EPI set corresponding to any direction.
  • when performing extraction processing based on the initial sub-aperture images to obtain the initial EPI set, the light field image decoder may first perform sorting processing on the initial sub-aperture images to obtain a stereoscopic image set, and then extract from the stereoscopic image set along at least one direction to obtain at least one initial EPI set, wherein one direction corresponds to one initial EPI set.
  • FIG. 9 is a schematic diagram of extracting the initial EPI set, as shown in FIG. 9
  • (a) is the schematic diagram after the initial sub-aperture images are arranged, that is, the low-resolution images obtained based on the preset arrangement order.
  • the light field image decoder can arrange and stack the initial sub-aperture images according to any scanning method, such as rotation order, raster scan order, zigzag order, or U-shaped scan order, to form a stereoscopic collection, that is, a stereoscopic image set is obtained, as shown in (b).
  • the slicing operation is performed on the stereoscopic image set, that is, the pixels at the same height of all images in the stereoscopic image set are extracted, yielding a series of images with linear characteristics, which are the EPI images, as shown in (d); each row corresponds to one EPI image, so the EPI images produced by the row slicing operation form an initial EPI set.
  • if the same slicing operation is also performed along the x-axis and the n-axis, the respective initial EPI sets on the corresponding axes can be obtained.
  • one axis is one direction
  • one direction (one axis) corresponds to one initial EPI set.
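The slicing described above can be sketched as follows for a 4D light field indexed as (U, V, H, W). The two extraction directions shown (one per angular axis) are illustrative and do not cover every axis the patent mentions:

```python
import numpy as np

def extract_epi_sets(sai_array):
    """Slice a (U, V, H, W) light field into two initial EPI sets:
    horizontal EPIs of shape (V, W), one per fixed (view row, pixel row),
    and vertical EPIs of shape (U, H), one per fixed (view column, pixel
    column). Each EPI exhibits the linear structure exploited for
    super-resolution."""
    u, v, h, w = sai_array.shape
    # Fix the view row and pixel row; pixels at the same height across views.
    horizontal = [sai_array[i, :, y, :] for i in range(u) for y in range(h)]
    # Fix the view column and pixel column; the analogous vertical slices.
    vertical = [sai_array[:, j, :, x] for j in range(v) for x in range(w)]
    return np.stack(horizontal), np.stack(vertical)

lf = np.random.rand(5, 5, 16, 16)
h_epis, v_epis = extract_epi_sets(lf)   # each set: (5 * 16, 5, 16)
```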
  • when the light field image decoder performs upsampling processing and feature extraction on the initial EPI set to obtain the target EPI set, it can first parse the code stream to obtain sampling parameters; the initial EPI set is up-sampled according to the sampling parameters to obtain a sampled EPI set; then one or more convolutional layers can be used to extract the features of the sampled EPI set to obtain a feature image corresponding to the initial EPI set; finally, the target EPI set is constructed from the sampled EPI set and the feature image.
  • the sampling parameters may include a sampling multiple corresponding to upsampling for spatial resolution and a sampling multiple corresponding to upsampling for angular resolution.
  • the resolution of each EPI image in the initial EPI set may first be improved by upsampling, and then shallow feature extraction and deep feature extraction may be performed on the sampled EPI set to obtain the feature image corresponding to the initial EPI set.
  • a super-resolved EPI image set can be obtained by skip-connecting the sampled EPI set obtained from the up-sampling process with the feature image; that is, a high-resolution target EPI set is constructed.
  • the resolution of the images in the target EPI set is greater than the resolution of the images in the initial EPI set.
  • the light field image decoder can sequentially complete the super-resolution of the initial EPI set corresponding to each direction according to the above method, and finally construct the target EPI set corresponding to each direction.
  • when the light field image decoder performs fusion processing on the target EPI sets to obtain the reconstructed sub-aperture image, it can perform weighted average fusion on the at least one target EPI set corresponding to the at least one initial EPI set, so as to obtain the reconstructed sub-aperture image.
  • after the light field image decoder completes the construction of all target EPI sets, it can perform fusion processing on the target EPI sets corresponding to each direction, and finally complete the reconstruction of the sub-aperture image.
  • a weighted average linear method can be used to fuse the target EPI sets corresponding to each direction, so as to obtain a reconstructed sub-aperture image.
  • the light field image decoder can use a simple weighted average method for fusion processing; after fusion, the final output of the super-resolution reconstruction network, that is, the reconstructed sub-aperture image, is obtained.
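A minimal sketch of the weighted-average fusion of the per-direction outputs follows. The weights and function name are illustrative; the patent only specifies a simple weighted average of equally sized outputs.

```python
# Illustrative weighted-average fusion of per-direction super-resolution
# outputs into one reconstructed image (pixel-wise linear combination).

def fuse_weighted_average(outputs, weights=None):
    """Fuse equally sized 2D outputs; defaults to a plain average."""
    if weights is None:
        weights = [1.0 / len(outputs)] * len(outputs)
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    h, w = len(outputs[0]), len(outputs[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for out, wt in zip(outputs, weights):
        for y in range(h):
            for x in range(w):
                fused[y][x] += wt * out[y][x]
    return fused

# Three branch outputs for a 1x2 image, fused with illustrative weights.
branch_outputs = [[[3.0, 6.0]], [[6.0, 6.0]], [[9.0, 6.0]]]
fused = fuse_weighted_average(branch_outputs, [0.25, 0.25, 0.5])
```

With no weights given, the function reduces to the "simple weighted average" (plain mean) mentioned above.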
  • Step 103: Input the reconstructed sub-aperture image into the quality enhancement network, and output the target sub-aperture image.
  • the light field image decoder may continue to input the reconstructed sub-aperture image into the quality enhancement network, so that the target sub-aperture image can be output.
  • the image quality of the target sub-aperture image is higher than that of the reconstructed sub-aperture image.
  • the quality of the image may be further enhanced by using the quality enhancement network.
  • the quality enhancement network can enhance the quality of every frame of the reconstructed sub-aperture images, or only of some frames; that is, the use of the quality enhancement network is not fixed.
  • the quality enhancement network (Quality Enhancement Net, QENet) is mainly used to enhance the quality of at least one frame of image in the reconstructed sub-aperture image.
  • This embodiment provides a method for processing a light field image.
  • the light field image decoder parses a code stream to obtain an initial sub-aperture image; inputs the initial sub-aperture image into a super-resolution reconstruction network, and outputs the reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are larger than those of the initial sub-aperture image; the reconstructed sub-aperture image is then input into the quality enhancement network, and the target sub-aperture image is output.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images
  • downsampling can be used at the encoding end
  • this processing reduces the spatial resolution and angular resolution of the sub-aperture image, so that only part of the light field image needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the compression efficiency of light field images.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved, and at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, so the image quality can be improved.
  • FIG. 10 is a schematic diagram of the processing flow at the decoding end.
  • the initial sub-aperture image obtained by parsing the code stream is input to the super-resolution reconstruction network, and the reconstructed sub-aperture image can be output; then, after being input to the quality enhancement network, the target sub-aperture image can be output.
  • the resolution of the reconstructed sub-aperture image is higher than that of the initial sub-aperture image, and the quality of the target sub-aperture image is higher than that of the reconstructed sub-aperture image.
  • the light field image decoder designs a low-resolution light field image super-resolution reconstruction network EPI-SREH-Net.
  • the designed network can achieve both spatial and angular super-resolution of light field images.
  • the specific process of the light field image decoder to achieve super-resolution is as follows:
  • the outputs of the three branches of SR-net are fused using a weighted average linear method.
  • the super-resolution reconstruction network (such as the branch fusion super-resolution network model) mainly realizes super-resolution of the initial sub-aperture images in different dimensions at the same time, and fuses the super-resolution results output by the branches by means of weighted averaging.
  • the quality enhancement network (such as the quality enhancement network model) mainly improves the image quality of the output result of the super-resolution reconstruction network.
  • the super-resolution reconstruction network, as the core for realizing the super-resolution function, may specifically include at least one branch module and one fusion module.
  • FIG. 11 is a schematic structural diagram of a super-resolution reconstruction network.
  • the overall framework of the super-resolution reconstruction network may include a first branch (represented by B1_SRNet) module, a second branch (denoted by B2_SRNet) module, a third branch (denoted by B3_SRNet) module, and a fusion (denoted by Fusion) module.
  • the three branch modules, namely the first branch module, the second branch module, and the third branch module, consider the three directions in the stereoscopic image set, and each branch module can be regarded as operating on the initial EPI set of a different direction in the stereoscopic image set.
  • the three branch modules adopt similar network structures, and only the parameters of the ConvTranspose3d layer in the three-dimensional convolution module are different; here, the ConvTranspose3d layer may be called a transposed 3D convolutional layer, a 3D deconvolution layer, etc.
  • the output results of the branch modules can then be input into the fusion module.
  • the outputs of the three branch modules can be weighted and averaged through the fusion module, and finally the reconstructed sub-aperture image can be obtained.
  • the embodiment of the present application may adopt a simple weighted average method for fusion, and after fusion, the final output result of the branch fusion super-resolution network is obtained.
  • FIG. 12 is a schematic structural diagram of the branch module.
  • the super-resolution network first improves the resolution of the initial EPI image set in the current dimension through the upsampling module (that is, using a simple upsampling operator), and then passes it through the convolution calculation module, which includes shallow feature extraction by two Conv2d layers and deep feature extraction by a series of ResDB modules (i.e., the ResDB 1 module, ..., ResDB d module, ..., ResDB D module); each ResDB module itself uses residual learning, the outputs of multiple ResDB modules are concatenated through the connection (Concat) layer, and then a 1×1 Conv2d layer is used to reduce the number of feature channels.
  • the residual reconstruction also uses a Conv2d layer, whose output is added via a skip connection to the image obtained by the upsampling module, so as to obtain a super-resolved EPI image set.
  • the ConvTranspose3d layer also includes a Leaky Rectified Linear Unit (Leaky ReLU) activation function.
  • FIG. 13 is a schematic structural diagram of the ResDB module. As shown in FIG. 13, it may be composed of three Conv2d layers with activation functions and one 1×1 Conv2d layer. Here, each ResDB module is densely connected: the outputs of the three Conv2d layers are spliced through Concat, and then the 1×1 Conv2d layer is used for dimensionality reduction.
  • a skip connection is used between consecutive ResDB modules; that is, the output of the previous block (the ResDB d-1 module) is superimposed on the output of the current block (the ResDB d module), and the sum is then used as the input of the next block (the ResDB d+1 module).
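The skip connections chaining the ResDB blocks can be sketched abstractly with a common residual formulation, where each stage's running value is added to the block's output before feeding the next block. This is a toy scalar model only; the real blocks are convolutional, and the stand-in block functions below are illustrative.

```python
# Toy model of residual skip connections chaining ResDB blocks: each block's
# output is added to the running value, and the sum feeds the next block.

def run_resdb_chain(x, blocks):
    """blocks: list of functions, each standing in for one ResDB module."""
    for block in blocks:
        x = x + block(x)  # residual learning: block output + block input
    return x

# Two toy "blocks" acting on a scalar feature.
blocks = [lambda v: 0.5 * v, lambda v: -0.25 * v]
y = run_resdb_chain(8.0, blocks)
```

Because each block only has to learn the residual on top of its input, gradients can flow through the identity path even when the block outputs are small.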
  • the activation function can be a Rectified Linear Unit (ReLU), also known as a rectified linear unit, which is a commonly used activation function in artificial neural networks, usually referring to the nonlinear functions given by the ramp function and its variants.
  • the Leaky ReLU function is a variant of the classic (and widely used) ReLU function. When the input of the ReLU function is negative, the output is always 0, and its first derivative is also always 0; in order to overcome this shortcoming of the ReLU function, a small leak is introduced in the negative half-axis of the ReLU function, which gives the Leaky ReLU function.
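The two activation functions can be written down directly. The leak slope of 0.01 below is a common default, not a value fixed by this document.

```python
# ReLU and Leaky ReLU as described above: ReLU zeroes negative inputs, while
# Leaky ReLU keeps a small slope on the negative half-axis so the gradient
# there is nonzero.

def relu(x):
    return x if x > 0.0 else 0.0

def leaky_relu(x, negative_slope=0.01):
    return x if x > 0.0 else negative_slope * x

vals = [-2.0, 0.0, 3.0]
relu_out = [relu(v) for v in vals]
leaky_out = [leaky_relu(v) for v in vals]
```

For positive inputs the two functions agree; they differ only in how the negative half-axis is handled.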
  • the sampling parameters obtained by parsing the code stream include the spatial resolution sampling multiple α and the angular resolution sampling multiple β.
  • the spatial resolution and angular resolution of a single EPI image in the initial EPI set are improved according to the sampling parameters; then it can go through the shallow feature extraction of two convolutional layers, the deep feature extraction of a series of densely connected ResDB convolutional blocks, and the residual map reconstruction of a 2D convolutional layer, whose result is then added to the upsampled image to obtain a resolution-enhanced EPI image; further, a 3D deconvolution can be used to improve the resolution of the three-dimensional features; finally, a 3D convolution is used to complete the reconstruction of the super-resolution light field image.
  • the specific data flow for improving the resolution through the super-resolution network is: first parse the code stream to obtain a low-resolution image pseudo-sequence X_{s′,t′}(x′, y′, n); slicing along the x-axis dimension yields the initial EPI set consisting of x′ EPI images E_i (i ≤ x′). Then, the SR-net network is used to perform super-resolution on each EPI image in the initial EPI set according to the sampling multiples α and β, finally giving the output EPI images E_i (i ≤ x′) of this part of the network.
  • the x′ outputs can be re-stacked to get T_{s′,t′}(x′, αy′, βn).
  • the quality enhancement network can enhance the image quality frame by frame, or enhance the image quality of only some frames; that is to say, the use of the quality enhancement network is not fixed.
  • the quality enhancement network is the QENet model.
  • the QENet model can use any existing image quality enhancement network model, for example:
  • SRCNN: Super-Resolution Convolutional Neural Network
  • ARCNN: Artifacts Reduction Convolutional Neural Network
  • VDSR: Very Deep convolutional networks for Super-Resolution
  • RBPN: Recurrent Back-Projection Network for Video Super-Resolution
  • EDVR: Video Restoration with Enhanced Deformable Convolutional Networks
  • FIG. 14 is a schematic structural diagram of the quality enhancement network.
  • the quality enhancement network can be composed of four convolutional layers; except for the last convolutional layer, which is used for reconstruction, the remaining three convolutional layers are each followed by a PReLU activation function.
  • the reconstructed sub-aperture images output by the super-resolution reconstruction network can be sent to the quality enhancement network one by one as input; after the convolutional layers and the 2D convolutional layer used for image reconstruction, each can finally be converted into a sub-aperture image with enhanced quality, that is, the target sub-aperture image.
  • the input of the quality enhancement network is each high-resolution reconstructed sub-aperture image
  • the output of the quality enhancement network is the quality-enhanced high-resolution target sub-aperture image.
  • This embodiment provides a method for processing a light field image.
  • the light field image decoder parses a code stream to obtain an initial sub-aperture image; inputs the initial sub-aperture image into a super-resolution reconstruction network, and outputs the reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are larger than those of the initial sub-aperture image; the reconstructed sub-aperture image is then input into the quality enhancement network, and the target sub-aperture image is output.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images
  • downsampling can be used at the encoding end
  • this processing reduces the spatial resolution and angular resolution of the sub-aperture image, so that only part of the light field image needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the compression efficiency of light field images.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved, and at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, so the image quality can be improved.
  • an embodiment of the present application provides a method for processing a light field image, and the method for processing a light field image is applied to a light field image encoder.
  • FIG. 15 is a second schematic flowchart of the implementation of the method for processing a light field image; as shown in FIG. 15, in an embodiment of the present application, the method for processing a light field image by a light field image encoder may include the following steps:
  • Step 201: Acquire a microlens image through acquisition by a light field camera, and generate a sub-aperture image according to the microlens image.
  • the light field image encoder may first acquire the microlens image through acquisition by the light field camera, and then may generate the sub-aperture image according to the microlens image.
  • the light field camera may perform acquisition through its configured microlens array, so as to obtain a light field image, that is, a microlens image.
  • after acquiring the microlens image through the light field camera, the light field image encoder does not directly compress the microlens image with an existing encoding and decoding standard technology, but can first convert the microlens image to obtain the corresponding sub-aperture image.
  • the brighter part in the central area of each microlens image can be extracted first, and then all the pixels at the same position are extracted and re-arranged, so that a sub-aperture image can be obtained.
  • Figure 16 is a partial enlarged view of the microlens image
  • Figure 17 is a schematic diagram of preliminary extraction.
  • the brightness of the microlens image gradually becomes darker from the center to the periphery, because the edge part of the microlens image receives less light due to the strong convergence of light; therefore, only the brighter part of the central area of the microlens image needs to be extracted.
  • the images of the extracted central area can be sorted according to the order of microlens arrangement, wherein, Ai is the central pixel of each image, Bi, Ci, Di are the pixels of three positions respectively, and i is 1, 2, ....
  • the sub-aperture image extraction process mainly includes extraction of the center points of the micro-lens image and sorting of the center points of the micro-lens image, thereby generating the sub-aperture image.
  • the sub-aperture image obtained by converting the microlens image acquired by the light field camera is an image formed by each angle of the light field, and may have angular resolution and spatial resolution.
  • the spatial resolution of the sub-aperture image can be the number of pixels of the sub-aperture image, that is, it can represent the number of microlenses in the microlens array of the light field camera; the angular resolution of the sub-aperture image can be the number of sub-aperture images.
  • the sub-aperture images generated based on the microlens image may form a series of 2D image arrays LF(x, y, s, t), in which there are a total of (s×t) sub-aperture images, that is, the angular resolution is (s×t); the size of each sub-aperture image is (x×y), that is, the spatial resolution is (x×y).
  • Figure 18 shows the sub-aperture image array. As shown in Figure 18, it should be noted that in the sub-aperture image array, the sub-aperture images at the four corners are darker than those in the middle part, and the distortion and blurring phenomena are more severe there; therefore, the sub-aperture images at these edge corners are usually deleted in the general compression process.
  • the microlens array has 300 rows and 400 columns, that is, there are 120,000 microlenses, and there are 30×30 pixels under each microlens, that is, each microlens image spans about 30 pixels; then 30×30 sub-aperture images can be extracted, each sub-aperture image being 300×400 pixels. For the central-view sub-aperture image, the pixel value at the center position of each microlens can be extracted in turn and reassembled in sequence into a new 300×400 image, which is the sub-aperture image.
  • the pixel value at the center of the microlens at the (0, 0) position is extracted as the pixel value of the sub-aperture image at the (0, 0) position, the pixel value at the center of the microlens at the (0, 1) position becomes the pixel value of the sub-aperture image at the (0, 1) position, and so on.
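The center-pixel extraction described above can be sketched as follows, using a toy raw image with one square pixel block per microlens. The sizes and function name are illustrative, not taken from the patent.

```python
# Illustrative extraction of one sub-aperture image from a microlens (raw)
# image: pick the pixel at offset (u, v) under every microlens and
# reassemble the picks into a rows x cols image.

def extract_subaperture(raw, rows, cols, p, u, v):
    """raw: 2D list of size (rows*p) x (cols*p); p: pixels per microlens side.

    (u, v) selects the same offset under every microlens; the center offset
    (p // 2, p // 2) yields the central-view sub-aperture image.
    """
    return [[raw[r * p + u][c * p + v] for c in range(cols)]
            for r in range(rows)]

# Toy raw image: 2x2 microlenses, 3x3 pixels each; value encodes position.
rows, cols, p = 2, 2, 3
raw = [[100 * r + c for c in range(cols * p)] for r in range(rows * p)]
center_view = extract_subaperture(raw, rows, cols, p, p // 2, p // 2)
```

Iterating (u, v) over all p×p offsets would produce the full p×p array of sub-aperture images.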
  • Step 202: Perform down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image.
  • the sub-aperture image may be down-sampled first, so as to obtain the sampled sub-aperture image.
  • the resolution of the sub-aperture image after sampling is smaller than the resolution of the sub-aperture image before down-sampling.
  • when the light field image encoder performs downsampling processing on the sub-aperture image, it can separately perform downsampling on the spatial resolution and the angular resolution of the sub-aperture image according to the sampling parameters, and finally the construction of the initial sub-aperture image can be completed.
  • the sampling parameters may include a sampling multiple corresponding to the downsampling of the spatial resolution and a sampling multiple corresponding to the downsampling of the angular resolution.
  • the light field image encoder can respectively perform angular resolution downsampling and spatial resolution downsampling on the sub-aperture image according to the corresponding sampling multiples, so as to obtain the initial sub-aperture image.
  • the initial sub-aperture image after down-sampling processing is a low-resolution sub-aperture image.
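Spatial and angular downsampling with integer multiples can be sketched by simple decimation. Pure decimation is just one possible downsampling operator; the patent does not mandate a specific one, and the names below are illustrative.

```python
# Illustrative downsampling of a sub-aperture image array lf[s][t][y][x]:
# angular decimation keeps every beta-th view, spatial decimation keeps
# every alpha-th pixel in each kept view.

def downsample_light_field(lf, alpha, beta):
    """alpha: spatial sampling multiple, beta: angular sampling multiple."""
    small = []
    for s in range(0, len(lf), beta):          # keep every beta-th view row
        row_out = []
        for t in range(0, len(lf[s]), beta):   # keep every beta-th view col
            view = lf[s][t]
            row_out.append([[view[y][x] for x in range(0, len(view[0]), alpha)]
                            for y in range(0, len(view), alpha)])
        small.append(row_out)
    return small

# Toy light field: 4x4 views, each 4x4 pixels.
lf = [[[[s + t + y + x for x in range(4)] for y in range(4)]
       for t in range(4)] for s in range(4)]
small = downsample_light_field(lf, alpha=2, beta=2)
```

Here both the angular resolution (4×4 → 2×2 views) and the spatial resolution (4×4 → 2×2 pixels per view) are halved.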
  • Step 203: Based on the preset arrangement order and the initial sub-aperture image, generate an image pseudo-sequence corresponding to the sub-aperture image.
  • after the light field image encoder performs downsampling processing on the sub-aperture image to obtain the initial sub-aperture image, it can further generate the image pseudo-sequence corresponding to the sub-aperture image based on the preset arrangement order and the initial sub-aperture image.
  • the light field image encoder may sort the initial sub-aperture images in a certain order; that is, it generates the corresponding image pseudo-sequence based on a preset arrangement order and the initial sub-aperture images.
  • the preset arrangement order may be any one of various arrangement orders; for example, it may be any of a rotation order, a raster scan order, a zigzag scan order, a U-shaped scan order, etc.
  • after reordering the initial sub-aperture images according to the preset arrangement order, the light field image encoder may determine the corresponding image pseudo-sequence V_L(x′, y′, n).
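Two of the scan orders mentioned above can be sketched for an s×t grid of views. This generates only the view indices of the pseudo-sequence; in the real pipeline each index would carry a sub-aperture image. Function names are illustrative.

```python
# Illustrative raster and U-shaped (serpentine) scan orders for arranging an
# s x t grid of sub-aperture views into a 1D pseudo-sequence.

def raster_order(s, t):
    """Row by row, always left to right."""
    return [(i, j) for i in range(s) for j in range(t)]

def u_shaped_order(s, t):
    """Left-to-right on even rows, right-to-left on odd rows."""
    order = []
    for i in range(s):
        cols = range(t) if i % 2 == 0 else range(t - 1, -1, -1)
        order.extend((i, j) for j in cols)
    return order

raster = raster_order(2, 3)
snake = u_shaped_order(2, 3)
```

The U-shaped (serpentine) order keeps consecutive views spatially adjacent, which favors inter-frame prediction when the pseudo-sequence is fed to a video codec.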
  • Step 204: Perform encoding processing based on the image pseudo-sequence to generate a code stream.
  • the light field image encoder can perform encoding processing based on the image pseudo-sequence to generate the corresponding binary code stream.
  • the light field image encoder can use an existing encoding standard to perform the encoding processing, so that the image pseudo-sequence corresponding to the low-resolution initial sub-aperture image can be written into the code stream; then the compression processing of the light field image can be completed.
  • for efficiency, a codec of an existing mainstream coding standard such as HEVC or VVC may be selected.
  • the light field image encoder may further write the sorting parameter corresponding to the preset arrangement order into the code stream, wherein the sorting parameter is used to indicate the preset arrangement order used to generate the image pseudo-sequence.
  • if the light field image encoder uses the rotation order to sort the initial sub-aperture images, the sorting parameter can be set to 0 and then written into the code stream;
  • if the raster scan order is used to sort the initial sub-aperture images, the sorting parameter can be set to 1 and then written into the code stream;
  • if the zigzag scan order is used to sort the initial sub-aperture images, the sorting parameter can be set to 2 and then written into the code stream;
  • if the U-shaped scan order is used to sort the initial sub-aperture images, the sorting parameter can be set to 4 and then written into the code stream.
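The sorting-parameter signaling can be modeled as a small lookup table. The numeric values follow the bullets above (with the zigzag value read as 2); the names and helper functions are illustrative, not part of any standard syntax.

```python
# Illustrative mapping between scan orders and the sorting parameter that the
# encoder writes into the code stream and the decoder reads back.

SORT_PARAM = {"rotation": 0, "raster": 1, "zigzag": 2, "u_shaped": 4}
PARAM_TO_ORDER = {v: k for k, v in SORT_PARAM.items()}

def write_sorting_parameter(order_name):
    """Encoder side: map the chosen scan order to its signaled value."""
    return SORT_PARAM[order_name]

def read_sorting_parameter(value):
    """Decoder side: recover the scan order from the parsed value."""
    return PARAM_TO_ORDER[value]

param = write_sorting_parameter("zigzag")
```

Both sides must share the same table so the decoder can re-stack the pseudo-sequence in the order the encoder used.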
  • the light field image encoder may also write the sampling parameters into the code stream and transmit them to the decoding side.
  • the angular resolution and spatial resolution of the sub-aperture image can be reduced by downsampling, and a low-resolution initial sub-aperture image can be obtained.
  • the light field image encoder can directly write the image pseudo-sequence corresponding to the initial sub-aperture image into the code stream and transmit it to the light field image decoder.
  • the light field image encoder acquires a microlens image through acquisition by a light field camera, and generates a sub-aperture image according to the microlens image; performs downsampling processing on the sub-aperture image to obtain an initial sub-aperture image; based on the preset arrangement order and the initial sub-aperture image, generate an image pseudo-sequence corresponding to the sub-aperture image; perform encoding processing based on the image pseudo-sequence to generate a code stream.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images
  • downsampling can be used at the encoding end
  • this processing reduces the spatial resolution and angular resolution of the sub-aperture image, so that only part of the light field image needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the compression efficiency of light field images.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved, and at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, so the image quality can be improved.
  • FIG. 19 is a schematic diagram of a light field image encoder and a light field image decoder for implementing light field image processing.
  • the light field image encoder may include a preprocessing module and an encoding module
  • the light field image decoder may include a decoding module and a post-processing module.
  • Figure 20 is a schematic structural diagram of the light field image processing method.
  • the preprocessing module can convert the microlens image into sub-aperture images, and then down-sample the sub-aperture images according to the sampling parameters to obtain low-resolution initial sub-aperture images; then, the initial sub-aperture images can be reordered according to a preset arrangement order to output an image pseudo-sequence; the image pseudo-sequence, the sampling parameters, and the preset arrangement order are all written into the code stream, wherein the code stream is a binary bit stream, such as 1010001011101010.
  • the decoding module can parse the code stream to obtain the image pseudo-sequence, the sampling parameters, and the preset arrangement order, and then use the image pseudo-sequence and the preset arrangement order to obtain the low-resolution initial sub-aperture image.
  • the post-processing module performs super-resolution reconstruction on the initial sub-aperture image through the super-resolution reconstruction network to obtain a high-resolution reconstructed sub-aperture image, and uses the quality enhancement network to further improve the image quality to obtain a high-quality target sub-aperture image.
  • the post-processing module may use the super-resolution network in the embodiment of the present application to realize super-resolution reconstruction; that is to say, the role of the super-resolution network is to reconstruct the decoded low-resolution sub-aperture image into a high-resolution sub-aperture image through super-resolution.
  • the light field camera differs from traditional imaging devices in that it can record not only the light intensity in a spatial scene but also its direction information. Due to the particularity of the microlens structure, the microlenses are arranged very closely, so there is only a very small horizontal or vertical parallax between two adjacent sub-aperture images, which gives them a strong correlation; at the same time, each sub-aperture image itself also has a certain degree of spatial redundancy. Considering the strong correlation between adjacent sub-aperture images and the spatial correlation of each sub-aperture image itself, the light field image processing method proposed in this application does not need to compress all light field images at the encoding end.
  • the correlation between sub-aperture images can be used to reconstruct the unencoded part, thereby saving the code stream and improving the encoding efficiency.
  • the preprocessing method used by the preprocessing module at the encoding end may specifically be downsampling to reduce the resolution of the light field image; the preprocessing method is not fixed and may be, for example, downsampling, color space conversion, etc.
  • the post-processing module at the decoding end correspondingly uses a super-resolution reconstruction network to restore the light field image. It can be seen that, when the down-sampling preprocessing method is used, the processing flow of the light field image mainly includes the construction of the low-resolution sub-aperture image pseudo-sequence, the encoding and decoding of the low-resolution sub-aperture image pseudo-sequence, and the super-resolution reconstruction of the decoded low-resolution sub-aperture images.
  • the light field sub-aperture images that need to be compressed are first downsampled by several multiples in space and angle according to the sampling parameters, and then converted into YUV420 format to construct a pseudo-sequence of low-resolution sub-aperture images. Then, the encoding and decoding of the pseudo-sequence are performed using existing encoding and decoding standards. Finally, at the decoding end, all sub-aperture images with the same resolution as the original sub-aperture images are reconstructed based on the designed super-resolution reconstruction network.
  • the light field image processing method proposed in this application can be applied to a low bit rate light field image compression scheme.
  • most current light field images are of high resolution; if the entire light field image is directly compressed, the coding efficiency will be low and the code stream will be large.
  • the light field image processing method proposed in this application is precisely such a low bit rate compression scheme, which can effectively solve the above problems.
  • the light field image decoder may first determine the first network parameters corresponding to the super-resolution reconstruction network, and then construct the super-resolution reconstruction network based on the first network parameters.
  • the light field image decoder may use various methods to determine the first network parameter corresponding to the super-resolution reconstruction network.
  • the light field image decoder may first obtain first training data, wherein the first training data includes low-resolution images and corresponding high-resolution images; then model training is performed by using the first training data, and finally the first network parameter can be determined.
  • the light field image decoder may parse the code stream to directly obtain the first network parameter. That is to say, at the encoding end, the light field image encoder can write the first network parameter into the code stream and transmit it to the decoding end.
  • the first training data may include multiple groups of images, each group of images is composed of a frame of low-resolution images and a corresponding frame of high-resolution images, and the first training data is used to train model parameters, to obtain the first network parameters of the super-resolution reconstruction network.
  • on the one hand, the model parameters can be trained by the light field image decoder according to the first training data; on the other hand, the model parameters can also be trained by the light field image encoder, which then writes the trained first network parameter into the code stream, and the light field image decoder directly obtains the first network parameter by parsing the code stream; the embodiment of the present application does not impose any limitation on this.
  • the light field image decoder may first determine the second network parameters corresponding to the quality enhancement network, and then construct the quality enhancement network based on the second network parameters.
  • the light field image decoder may use various manners to determine the second network parameter corresponding to the quality enhancement network.
  • the light field image decoder may first obtain second training data, wherein the second training data includes low-quality images and corresponding high-quality images; then perform model training using the second training data; and finally determine the second network parameter.
  • the light field image decoder may parse the code stream to directly obtain the second network parameter. That is to say, at the encoding end, the light field image encoder can write the second network parameter into the code stream and transmit it to the decoding end.
  • the second training data may include multiple groups of images, each group consisting of a low-quality frame and a corresponding high-quality frame; the second training data is used to train the model parameters so as to obtain the second network parameter of the quality enhancement network.
  • on the one hand, the model parameters can be trained according to the second training data; on the other hand, the model parameters can also be trained by the light field image encoder, with the trained second network parameter written into the code stream so that the light field image decoder obtains it directly by parsing the code stream; the embodiments of the present application do not impose any limitation on this.
  • the embodiments of the present application mainly solve the problem of low coding efficiency in the current light field image compression process; by downsampling during preprocessing and restoring the image through reconstruction during post-processing, the current problem of low encoding and decoding efficiency can be effectively solved.
  • the bicubic method is used here for low-resolution downsampling, but the method is not fixed; any method that achieves the downsampling effect may be used.
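As the bullets note, any method achieving the downsampling effect will do. The sketch below uses plain stride-2 selection as a stand-in for bicubic filtering, applied to a 4D light field L[t][s][y][x] so that both the angular view grid (s, t) and each view (y, x) shrink; the indexing convention and names are illustrative assumptions, not the patent's layout.

```python
# Sketch of joint angular + spatial downsampling of a 4D light field
# L[t][s][y][x]. Stride-2 selection stands in for the bicubic filter;
# the point is that BOTH the view grid and each view are reduced.

def downsample_light_field(lf):
    return [[[row[::2] for row in view[::2]]   # spatial: y, x
             for view in view_row[::2]]        # angular: s
            for view_row in lf[::2]]           # angular: t

# An 8x8 grid of 4x4 views becomes a 4x4 grid of 2x2 views.
lf = [[[[0 for _ in range(4)] for _ in range(4)]
       for _ in range(8)] for _ in range(8)]
small = downsample_light_field(lf)
print(len(small), len(small[0]), len(small[0][0]), len(small[0][0][0]))
```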
  • the super-resolution reconstruction of the decoded low-resolution sub-aperture image is not limited to the network structure designed in the embodiments of the present application; other network structures with the same function may be substituted, although reconstruction performance may differ.
  • the network structure of the super-resolution reconstruction network in the embodiment of the present application may be changed. Specifically, the three branches of the branch fusion super-resolution network model can be appropriately deleted to meet the needs of different computing capabilities in different scenarios.
  • in practical applications the network structure of the quality enhancement network is usually the ARCNN model, but it is not limited to this, as long as the image quality enhancement effect can be achieved; any such change may affect the final image quality.
  • the application of the super-resolution reconstruction network for light field images is not limited to post-processing of the decoded pseudo-sequence; it can also be applied to the inter-frame or intra-frame prediction part of the light field image encoder to improve prediction accuracy.
  • the light field image processing method proposed in the embodiments of the present application greatly improves the coding efficiency.
  • the spatial and angular resolution of the light field image is down-sampled before compression encoding, which greatly reduces the amount of image data that needs to be encoded; after decoding, a super-resolution reconstruction network is used to perform corresponding up-sampling to restore high-resolution light field images.
  • the code rate can be significantly reduced, the coding efficiency can be greatly improved, and the transmission code stream can be reduced.
  • the design of the quality enhancement network adopted in the embodiments of the present application greatly improves the image quality. Therefore, applying the proposed super-resolution reconstruction network and quality enhancement network to the light field image compression process can significantly improve both the quality of the compressed light field image and the super-resolution effect on light field images.
  • the light field image processing method proposed in the embodiments of the present application can utilize the super-resolution reconstruction network and the quality enhancement network to simultaneously achieve the two effects of super-resolution and quality improvement of the light field image.
  • since the decoding side can use the super-resolution reconstruction network to perform super-resolution reconstruction on the low-resolution sub-aperture image, on the encoding side the light field image encoder, after converting the microlens image into a sub-aperture image, obtains a low-resolution sub-aperture image through downsampling and can encode it using existing codec standard technology; this adapts well to the existing codec framework without the structure of the encoder or decoder needing to be modified.
  • This embodiment provides a method for processing a light field image.
  • the light field image decoder parses a code stream to obtain an initial sub-aperture image; inputs the initial sub-aperture image into a super-resolution reconstruction network and outputs a reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; and inputs the reconstructed sub-aperture image into the quality enhancement network and outputs the target sub-aperture image.
  • the light field image encoder acquires a microlens image through a light field camera and generates a sub-aperture image according to the microlens image; performs down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image; generates an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and performs encoding processing based on the image pseudo-sequence to generate a code stream.
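The pseudo-sequence construction mentioned above can be sketched as follows. The patent leaves the preset arrangement order configurable; raster scanning of the sub-aperture view grid is assumed here purely for illustration, and all names are hypothetical.

```python
# Sketch: flatten the grid of sub-aperture views into a pseudo-sequence
# of frames for encoding. views[t][s] holds the view at angular position
# (t, s); "raster" order is one possible preset arrangement order.

def to_pseudo_sequence(views, order="raster"):
    """Return the views as a frame list in the preset arrangement order."""
    if order == "raster":
        return [v for row in views for v in row]
    raise ValueError("unsupported arrangement order")

views = [[f"view({t},{s})" for s in range(4)] for t in range(4)]
seq = to_pseudo_sequence(views)
print(len(seq), seq[0], seq[-1])  # 16 view(0,0) view(3,3)
```

The decoder reverses this step: knowing the same arrangement order (signaled via the sorting parameter in the code stream), it restores each decoded frame to its (t, s) position in the view grid.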
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images, while downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image; in this way, only part of the light field image data needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the light field image compression efficiency.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved; at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, thereby further improving the image quality.
  • the super-resolution reconstruction network and the quality enhancement network can be implemented on the PyTorch platform using an Nvidia GTX 1080Ti GPU on a PC.
  • the training set and test set used in the experiment come from the real light field datasets EPFL and Lytro Illum and the synthetic light field dataset HCI.
  • the experiment achieves 2x super-resolution in both space and angle; that is, the sampling multiples of spatial resolution and angular resolution in the sampling parameters are both set to 2, so if the resolution of the input light field image is x × y × s × t, the resolution of the output light field image is 2x × 2y × 2s × 2t.
  • PSNR (Peak Signal-to-Noise Ratio)
  • SSIM (Structural SIMilarity)
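PSNR, used as an evaluation metric above, follows the standard definition 10·log10(peak²/MSE). A minimal pure-Python sketch for 8-bit images stored as nested lists (names illustrative, not from the patent):

```python
import math

# PSNR between two equally sized 8-bit images given as nested lists.

def psnr(a, b, peak=255.0):
    n = 0
    se = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            se += (x - y) ** 2  # accumulate squared error
            n += 1              # count pixels
    mse = se / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

ref = [[100, 100], [100, 100]]
deg = [[100, 101], [100, 100]]
print(round(psnr(ref, deg), 2))  # 54.15
```

SSIM additionally compares local luminance, contrast, and structure statistics and is substantially more involved, so it is not reproduced here.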
  • using the Matlab platform, the directly obtained light field sub-aperture image is used as the high-resolution image, and the low-resolution sub-aperture image obtained by downsampling its spatial and angular resolution by a factor of 2 is used as the input of the network, as follows:
  • EPFL dataset: the input light field image resolution is 217 × 312 × 4 × 4, and the output resolution is 434 × 624 × 8 × 8.
  • Lytro Illum dataset: the input resolution is 187 × 270 × 4 × 4, and the output resolution is 374 × 540 × 8 × 8.
  • HCI dataset: the input resolution is 256 × 256 × 4 × 4, and the output resolution is 512 × 512 × 8 × 8.
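All three datasets above follow the same rule: a sampling multiple of 2 in both the spatial (x, y) and angular (s, t) dimensions. A one-line helper makes the relation explicit:

```python
# Output shape of 2x spatial + 2x angular super-resolution on a light
# field of shape (x, y, s, t).

def super_resolved_shape(shape, spatial=2, angular=2):
    x, y, s, t = shape
    return (x * spatial, y * spatial, s * angular, t * angular)

for name, shape in [("EPFL", (217, 312, 4, 4)),
                    ("Lytro Illum", (187, 270, 4, 4)),
                    ("HCI", (256, 256, 4, 4))]:
    print(name, super_resolved_shape(shape))
```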
  • the light field image processing method proposed in this application is applied to a low bit rate light field compression scheme of sub-aperture images, which significantly improves the coding performance of light field images.
  • taking the encoding and transmission of a complete light field image as an example, the light field image encoder would originally need to encode a 64-frame 540P pseudo-sequence; after applying the low bit rate compression scheme, only a 16-frame 270P pseudo-sequence needs to be encoded, which greatly improves the encoding efficiency and reduces the transmitted code stream.
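A rough pixel-count comparison illustrates the reduction in the example above. The exact 540P and 270P frame dimensions (assumed here as 960x540 and 480x270) are not given in the text and are used only to show the order of magnitude:

```python
# Rough raw-pixel comparison: 64 frames of 540P versus 16 frames of 270P.
# Frame dimensions are assumptions for illustration only.

def total_pixels(frames, width, height):
    return frames * width * height

before = total_pixels(64, 960, 540)  # full-resolution pseudo-sequence
after = total_pixels(16, 480, 270)   # downsampled pseudo-sequence
print(before // after)  # 16x fewer raw pixels to encode
```

The actual bitrate saving depends on the codec, but the 16x reduction in raw samples is what makes the low bit rate operating point possible.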
  • the embodiments of this application propose an end-to-end low bit rate light field image compression scheme based on light field sub-aperture images, which may mainly include light field image preprocessing, light field image data encoding, bitstream transmission, light field image data decoding, and light field image post-processing.
  • it may mainly include construction of a pseudo-sequence of low-resolution sub-aperture images, encoding and decoding of a pseudo-sequence of low-resolution sub-aperture images, and super-resolution reconstruction of the decoded low-resolution sub-aperture image.
  • a network structure for spatial and angular super-resolution reconstruction of light field images is designed based on the idea of EPI.
  • the super-resolution reconstruction network adopts a branch-fusion structure as a whole, which can effectively perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images.
  • a quality enhancement network Enhance-net is also designed to enhance the image quality of the reconstructed sub-aperture images output by the super-resolution reconstruction network.
  • the experimental data shows that the light field image processing method proposed in this application can significantly improve the super-resolution of light field images through the super-resolution reconstruction network at the decoding end, while the downsampling process at the encoding end reduces the amount of transmitted data.
  • This embodiment provides a method for processing a light field image.
  • the light field image decoder parses a code stream to obtain an initial sub-aperture image; inputs the initial sub-aperture image into a super-resolution reconstruction network and outputs a reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; and inputs the reconstructed sub-aperture image into the quality enhancement network and outputs the target sub-aperture image.
  • the light field image encoder acquires a microlens image through a light field camera and generates a sub-aperture image according to the microlens image; performs down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image; generates an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and performs encoding processing based on the image pseudo-sequence to generate a code stream.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images, while downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image; in this way, only part of the light field image data needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the light field image compression efficiency.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved; at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, thereby further improving the image quality.
  • FIG. 21 is a schematic diagram of the composition and structure of a light field image decoder.
  • the light field image decoder 200 proposed in this embodiment of the present application may include a parsing part 201, a first acquisition part 202, a determination part 203, and a construction part 204.
  • the parsing part 201 is configured to parse the code stream to obtain an initial sub-aperture image
  • the first acquisition part 202 is configured to input the initial sub-aperture image into the super-resolution reconstruction network and output the reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; and to input the reconstructed sub-aperture image into a quality enhancement network to output a target sub-aperture image.
  • the parsing part 201 is specifically configured to parse the code stream to obtain an image pseudo-sequence and a preset arrangement order; and, based on the preset arrangement order and the image pseudo-sequence, generate the initial sub-aperture image.
  • the first acquisition part 202 is specifically configured to perform extraction processing based on the initial sub-aperture image to obtain an initial epipolar plane image (EPI) set; perform up-sampling processing and feature extraction on the initial EPI set to obtain a target EPI set, wherein the resolution of the images in the target EPI set is greater than that of the images in the initial EPI set; and perform fusion processing on the target EPI set to obtain the reconstructed sub-aperture image.
  • the first acquisition part 202 is further specifically configured to perform sorting processing on the initial sub-aperture images to obtain a stereoscopic image set; and perform extraction processing on the stereoscopic image set according to at least one direction to obtain at least one initial EPI set, wherein one direction corresponds to one initial EPI set.
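The EPI extraction described in the bullets above can be sketched as follows. The stacked light field is indexed as L[t][s][y][x]; a horizontal EPI fixes (t, y) and varies (s, x). Only the horizontal direction is shown, and the indexing convention is an illustrative assumption; a second (vertical) direction fixing (s, x) would yield its own EPI set.

```python
# Sketch: extract horizontal epipolar plane images (EPIs) from a stacked
# light field L[t][s][y][x]. Each EPI is an S x X slice obtained by
# fixing one view row t and one image row y.

def horizontal_epis(lf):
    T, S = len(lf), len(lf[0])
    Y = len(lf[0][0])
    epis = []
    for t in range(T):
        for y in range(Y):
            # one EPI: S rows (one per view column), each an image row
            epis.append([lf[t][s][y] for s in range(S)])
    return epis

lf = [[[[(t, s, y, x) for x in range(5)] for y in range(3)]
       for s in range(4)] for t in range(2)]
epis = horizontal_epis(lf)
print(len(epis), len(epis[0]), len(epis[0][0]))  # 6 4 5
```

Each direction's EPI set is then up-sampled and refined by its own branch before the fusion step.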
  • the first obtaining part 202 is further specifically configured to parse the code stream to obtain sampling parameters; perform up-sampling processing on the EPI set according to the sampling parameters to obtain a sampled EPI set; use one or more convolutional layers to perform feature extraction on the sampled EPI set to obtain a feature image corresponding to the initial EPI set; and construct the target EPI set based on the sampled EPI set and the feature image.
  • the first obtaining part 202 is further specifically configured to perform weighted average fusion on at least one target EPI set corresponding to at least one EPI set to obtain the reconstructed sub-aperture image.
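The weighted average fusion mentioned above can be sketched as follows; the specific weights are hypothetical, since the text only specifies a weighted average over the per-direction reconstructions.

```python
# Sketch: weighted-average fusion of per-branch reconstructions into one
# output image. Images are nested lists of floats; weights must sum to 1.

def fuse(branch_images, weights):
    assert abs(sum(weights) - 1.0) < 1e-9
    h, w = len(branch_images[0]), len(branch_images[0][0])
    out = [[0.0] * w for _ in range(h)]
    for img, wgt in zip(branch_images, weights):
        for y in range(h):
            for x in range(w):
                out[y][x] += wgt * img[y][x]
    return out

a = [[10.0, 20.0], [30.0, 40.0]]
b = [[20.0, 40.0], [60.0, 80.0]]
fused = fuse([a, b], [0.5, 0.5])
print(fused[0][0], fused[1][1])  # 15.0 60.0
```

Equal weights are used here for simplicity; in the branch-fusion network the weights could equally be learned or fixed per direction.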
  • the determining part 203 is configured to determine the first network parameter corresponding to the super-resolution reconstruction network
  • the construction part 204 is configured to construct the super-resolution reconstruction network based on the first network parameter.
  • the determining part 203 is specifically configured to acquire first training data, wherein the first training data includes low-resolution images and corresponding high-resolution images; perform model training using the first training data; and determine the first network parameter.
  • the determining part 203 is further specifically configured to parse the code stream to obtain the first network parameter.
  • the determining part 203 is further configured to determine a second network parameter corresponding to the quality enhancement network
  • the constructing part 204 is further configured to construct the quality enhancement network based on the second network parameter.
  • the determining part 203 is further specifically configured to acquire second training data, wherein the second training data includes low-quality images and corresponding high-quality images; perform model training using the second training data; and determine the second network parameter.
  • the determining part 203 is further specifically configured to parse the code stream to obtain the second network parameter.
  • FIG. 22 is a second schematic diagram of the composition and structure of the light field image decoder.
  • the light field image decoder 200 proposed in this embodiment of the present application may further include a first processor 205 and a first memory storing instructions executable by the first processor 205.
  • the above-mentioned first processor 205 is configured to parse the code stream to obtain an initial sub-aperture image; input the initial sub-aperture image into a super-resolution reconstruction network and output a reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; and input the reconstructed sub-aperture image into a quality enhancement network to output a target sub-aperture image.
  • each functional module in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
  • An embodiment of the present application provides a light field image decoder. The light field image decoder parses a code stream to obtain an initial sub-aperture image; inputs the initial sub-aperture image into a super-resolution reconstruction network and outputs a reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; and inputs the reconstructed sub-aperture image into the quality enhancement network to output the target sub-aperture image.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images, while downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image; in this way, only part of the light field image data needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the light field image compression efficiency.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved; at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, thereby further improving the image quality.
  • FIG. 23 is a schematic diagram of the composition and structure of a light field image encoder.
  • the light field image encoder 300 proposed by the embodiment of the present application includes: a second acquisition part 301, a generating part 302, and an encoding part 303.
  • the second acquisition part 301 is configured to acquire a microlens image through acquisition by a light field camera
  • the generating part 302 is configured to generate a sub-aperture image according to the microlens image
  • the second acquisition part 301 is further configured to perform down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image;
  • the generating section 302 is further configured to generate an image pseudo-sequence corresponding to the sub-aperture image based on the preset arrangement order and the initial sub-aperture image; and perform encoding processing based on the image pseudo-sequence to generate a code stream.
  • the encoding part 303 is configured to, after the image pseudo-sequence corresponding to the sub-aperture image is generated based on the preset arrangement order and the initial sub-aperture image, write the sorting parameter into the code stream; wherein the sorting parameter is used to indicate the preset arrangement order.
  • the second acquisition part 301 is specifically configured to perform down-sampling processing on the spatial resolution and angular resolution of the sub-aperture image according to sampling parameters, so as to complete the construction of the initial sub-aperture image.
  • the encoding part 303 is further configured to perform down-sampling processing on the sub-aperture image, and after obtaining the initial sub-aperture image, write the sampling parameter into the code stream.
  • FIG. 24 is a second schematic diagram of the composition and structure of the light field image encoder.
  • the light field image encoder 300 proposed in this embodiment of the present application may further include a second processor 304 and a second memory storing instructions executable by the second processor 304.
  • the above-mentioned second processor 304 is configured to acquire a microlens image through a light field camera, and generate a sub-aperture image according to the microlens image; perform down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image; generate an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and perform encoding processing based on the image pseudo-sequence to generate a code stream.
  • each functional module in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
  • An embodiment of the present application provides a light field image encoder. The light field image encoder acquires a microlens image through a light field camera and generates a sub-aperture image according to the microlens image; performs down-sampling on the sub-aperture image to obtain an initial sub-aperture image; generates an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and performs encoding processing based on the image pseudo-sequence to generate a code stream.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images, while downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image; in this way, only part of the light field image data needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the light field image compression efficiency.
  • this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the efficiency of compression processing can be significantly improved; at the same time, the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, thereby further improving the image quality.
  • Embodiments of the present application provide a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the method described in the foregoing embodiments is implemented.
  • the program instructions corresponding to the light field image processing method in this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a U disk; when the program instructions are read or executed by an electronic device, the following steps are included: parsing the code stream to obtain an initial sub-aperture image;
  • the initial sub-aperture image is input into a super-resolution reconstruction network, and the reconstructed sub-aperture image is output; wherein, the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than the spatial resolution of the initial sub-aperture image. rate and angular resolution;
  • the reconstructed sub-aperture image is input into the quality enhancement network, and the target sub-aperture image is output.
  • the program instructions corresponding to the light field image processing method in this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a U disk; when the program instructions are read or executed by an electronic device, the following steps are included: acquiring a microlens image through a light field camera, and generating a sub-aperture image according to the microlens image; performing down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image; generating an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image;
  • An encoding process is performed based on the image pseudo-sequence to generate a code stream.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the embodiments of the present application provide a light field image processing method, an encoder, a decoder and a storage medium.
  • the light field image decoder parses a code stream to obtain an initial sub-aperture image; inputs the initial sub-aperture image into a super-resolution reconstruction network and outputs a reconstructed sub-aperture image, wherein the spatial resolution and angular resolution of the reconstructed sub-aperture image are both greater than those of the initial sub-aperture image; and inputs the reconstructed sub-aperture image into the quality enhancement network and outputs the target sub-aperture image.
  • the light field image encoder acquires a microlens image through a light field camera and generates a sub-aperture image according to the microlens image; performs down-sampling processing on the sub-aperture image to obtain an initial sub-aperture image; generates an image pseudo-sequence corresponding to the sub-aperture image based on a preset arrangement order and the initial sub-aperture image; and performs encoding processing based on the image pseudo-sequence to generate a code stream.
  • a super-resolution reconstruction network can be used at the decoding end to perform spatial and angular super-resolution reconstruction on low-resolution sub-aperture images, while downsampling can be used at the encoding end to reduce the spatial resolution and angular resolution of the sub-aperture image; in this way, only part of the light field image data needs to be encoded and decoded, which effectively reduces the transmitted code stream data, greatly improves the encoding and decoding efficiency, and further improves the light field image compression efficiency. It can be seen that this application adopts the design of the super-resolution reconstruction network, which can simultaneously improve the spatial resolution and angular resolution of the light field image.
  • the present application can also use a quality enhancement network to improve the image quality of the result output by the super-resolution reconstruction network, thereby further improving the image quality.
  • before compressing and encoding the light field image, the light field image encoder can perform spatial and angular downsampling processing on the light field image to obtain a low-resolution light field image, which reduces the amount of image data that needs to be encoded.
  • the light field image decoder can use the super-resolution reconstruction network to perform spatial and angular upsampling on the low-resolution light field image, and then construct a high-resolution light field image. , so that the transmission code stream can be reduced, and the encoding and decoding efficiency can be greatly improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

一种光场图像的处理方法、编码器、解码器及存储介质,该光场图像的处理方法包括:光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。

Description

光场图像的处理方法、编码器、解码器及存储介质 技术领域
本申请实施例涉及光场图像的编解码技术,尤其涉及一种光场图像的处理方法、编码器、解码器及存储介质。
背景技术
通过普通相机阵列采集或者通过光场相机采集的光场图像尺寸较大,因此,往往需要通过对光场图像进行压缩来节省存储空间。目前,针对光场图像的压缩方案主要可以包括基于图像的直接压缩方法和基于伪视频序列的间接压缩方法。
目前的编解码标准主要应用于普通图像的处理,因此直接使用目前的编解码标准处理光场图像的效果并不理想;同时,将光场图像转化为子孔径图像之后再进行处理的间接压缩方法,会大大增加计算复杂度,且处理精度并不高。
可见,针对光场图像的压缩方案,无法在现有的编解码标准的技术上进行高效率、高精度的处理。
发明内容
本申请实施例提供一种光场图像的处理方法、编码器、解码器及存储介质,可以减少传输码流,大大提高编解码效率。
本申请实施例的技术方案是这样实现的:
第一方面,本申请实施例提供了一种光场图像的处理方法,应用于光场图像解码器,所述方法包括:
解析码流,获得初始子孔径图像;
将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,所述重建后子孔径图像的空间分辨率和角度分辨率均大于所述初始子孔径图像的空间分辨率和角度分辨率;
将所述重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
第二方面,本申请实施例提供了一种光场图像的处理方法,应用于光场图像编码器,所述方法包括:
通过光场相机采集获得微透镜图像,并根据所述微透镜图像生成子孔径图像;
对所述子孔径图像进行下采样处理,获得初始子孔径图像;
基于预设排列顺序和所述初始子孔径图像,生成所述子孔径图像对应的图像伪序列;
基于所述图像伪序列进行编码处理,生成码流。
第三方面,本申请实施例提供了一种光场图像解码器,所述光场图像解码器包括:解析部分和第一获取部分,
所述解析部分,配置为解析码流,获得初始子孔径图像;
所述第一获取部分,配置为将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,所述重建后子孔径图像的空间分辨率和角度分辨率均大于所述初始子孔径图像的空间分辨率和角度分辨率;以及将所述重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
第四方面,本申请实施例提供了一种光场图像解码器,所述光场图像解码器包括第一处理器、存储有所述第一处理器可执行指令的第一存储器,当所述指令被所述第一处理器执行时,实现如上所述的光场图像的处理方法。
第五方面,本申请实施例提供了一种光场图像编码器,所述光场图像编码器包括:第二获取部分和生成部分,
所述第二获取部分,配置为通过光场相机采集获得微透镜图像;
所述生成部分,配置为根据所述微透镜图像生成子孔径图像;
所述第二获取部分,还配置为对所述子孔径图像进行下采样处理,获得初始子孔径图像;
所述生成部分,还配置为基于预设排列顺序和所述初始子孔径图像,生成所述子孔径图像对应的图像伪序列;以及基于所述图像伪序列进行编码处理,生成码流。
第六方面,本申请实施例提供了一种光场图像编码器,光场图像编码器包括第二处理器、存储有所述第二处理器可执行指令的第二存储器,当所述指令被所述第二处理器执行时,实现如上所述的光场图像的处理方法。
第七方面,本申请实施例提供了一种计算机可读存储介质,其上存储有程序,应用于光场图像解码器和光场图像编码器中,所述程序被第一处理器执行时,实现如第一方面所述的光场图像的处理方法,所述程序被第二处理器执行时,实现如第二方面所述的光场图像的处理方法。
本申请实施例提供了一种光场图像的处理方法、编码器、解码器及存储介质,光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。综上所述,在本申请中,光场图像编码器在压缩编码光场图像之前,可以对光场图像进行空间和角度的下采样处理,获得低分辨率的光场图像,能够降低待编码的数据量,相应地,在解码后,光场图像解码器可以采用超分辨率重建网络进行对低分辨率的光场图像进行空间和角度的上采样,进而构建出高分辨率的光场图像,从而可以减少传输码流,大大提高编解码效率。
附图说明
图1为光场成像原理示意图;
图2为直接压缩方法的示意图;
图3为间接压缩方法的示意图;
图4为光场图像的处理方法的实现流程示意图一;
图5为预设排列顺序的示意图一;
图6为预设排列顺序的示意图二;
图7为预设排列顺序的示意图三;
图8为预设排列顺序的示意图四;
图9为提取初始EPI集合的示意图;
图10为解码端的处理流程示意图;
图11为超分辨重建网络的结构示意图;
图12为分支模块的结构示意图;
图13为ResDB模块的结构示意图;
图14为质量增强网络的结构示意图;
图15为光场图像的处理方法的实现流程示意图二;
图16为微透镜图像的局部放大图;
图17为初步提取的示意图;
图18为子孔径图像阵列;
图19为实现光场图像处理的光场图像编码器和光场图像解码器的示意图;
图20为光场图像处理方法的结构示意图;
图21为光场图像解码器的组成结构示意图一;
图22为光场图像解码器的组成结构示意图二;
图23为光场图像编码器的组成结构示意图一;
图24为光场图像编码器的组成结构示意图二。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。可以理解的是,此处所描述的具体实施例仅仅用于解释相关申请,而非对该申请的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与有关申请相关的部分。
光场是空间中任意位置任意方向的光线的总和,光场图像可以通过以下两种方式获取:通过普通相机阵列采集或者通过光场相机采集。通过普通相机阵列可获取到以二维图像为元素的二维数组,其中每幅图像代表不同视角下的光信息;而通过光场相机获取的是一幅以宏像素块为单元的尺寸较大的图像,同一宏像素块中的不同像素代表了同一物点在不同视角下的光信息。
光场相机的机身和传统数码相机差不多，但内部结构大有不同。传统相机以主镜头捕捉光线，再聚焦在镜头后的胶片或感光器上，所有光线的总和形成相片上的小点，显示影像。而光场相机在主镜头及感光器之间，设置有一个布满9万个微型镜片的显微镜阵列，每个小镜阵列接收由主镜头而来的光线后，传送到感光器前，析出聚焦光线及将光线资料转换，以数码方式记下。光场相机内置软件操作“已扩大光场”，追踪每条光线在不同距离的影像上的落点，经数码重新对焦后，便能拍出完美照片。
而且,光场相机一反传统,减低镜头孔径大小及景深,以小镜阵列控制额外光线,展露每个影像的景深,再将微小的次影像投射到感光器上,所有聚焦影像周围的朦胧光圈变为“清晰”,保持旧有相机的大孔径所带来的增加光度、减少拍照时间及起粒的情况,不用牺牲景深及影像清晰度。
图1为光场成像原理示意图,如图1所示,与传统相机相比,光场相机需要在普通的主镜头前加入一个微透镜阵列来捕获场景中的光线,并将捕获得到物体的光场图像进行一系列算法的处理才能得以呈现。成像传感器的输出被称为微透镜图像(lenslet Image),由微透镜产生的许多微小图像组成。空间域(x,y)的微透镜图像经过处理可以进一步转换成角度域(s,t)的多个子孔径图像(Sub Aperture Image,SAI),对应于后续应用的不同视角视图。
由于光场图像尺寸较大,为了节省存储空间,通常会对光场图像进行压缩。目前,针对光场图像的压缩方案主要可以包括基于图像的直接压缩方法和基于伪视频序列的间接压缩方法。
基于图像的直接压缩方法是对传感器输出的微透镜图像直接采用现有的图像压缩方案,如联合图像专家组(Joint Photographic Experts Group,JPEG)、高效率视频编码(High Efficiency Video Coding,HEVC)、多功能视频编码(Versatile Video Coding,VVC)等。
图2为直接压缩方法的示意图,如图2所示,用光场相机采集得到光场数据,通过传感器输出为微透镜图像,然后将微透镜图像作为普通自然图像直接送入现有的图像编码标准如JPEG、HEVC、VVC等中进行压缩,并将比特流传输至解码端。在解码端用相对应的解码方法进行解码,重建出光场图像,即微透镜图像。
由于光场相机中微透镜阵列的存在,光场微透镜图像产生了类似圆圈的伪影,与传统自然图像具有不同的特性,而现有的图像压缩方案主要是为自然图像设计的,并不能够很好的实现对光场图像的高效压缩。
基于伪视频序列的间接压缩方法是将光场相机捕获的光场图像处理为多个子孔径图像,然后将这些子孔径图像可以按照不同的排序方式,例如旋转顺序、光栅扫描顺序、之字形和U形扫描顺序等排列为伪序列,利用现有的编码标准对其进行压缩编码。
图3为间接压缩方法的示意图,如图3所示,可以先将传感器的输出转化为子孔径图像后再进行压缩,具体地,基于微透镜的光场相机捕获转换的光场微透镜图像可以处理为多个子孔径图像,而这些子孔径图像可以按照不同的排序方式,例如旋转顺序、光栅扫描顺序、之字形和U形扫描顺序等排列为伪序列,利用现有的编码标准对其进行压缩编码,并将比特流传输至解码端。解码端进行相对应的解码后得到重建的光场伪序列,之后按照编码端的排列方式重排回光场子孔径图像阵列。
然而,在将光场图像重现排列组成伪序列进行压缩的过程中,需要获取光场图像准确的几何信息,以便提取多视点图像,会大大增加计算复杂度,且对于光场图像内容中的纹理复杂区域的预测精度并不高。
可见，现有的针对光场图像的压缩方案，在处理光场图像时，无法在现有的编解码标准的技术上进行高效率、高精度的编解码处理。而随着光场相机的普及，如何压缩由微透镜阵列获取的光场图像且达到较高的压缩率和较好的压缩性能是当前需要解决的问题。
虽然基于空间相关性的压缩算法可以通过探索光场图像的自相关性来提升编码效率,但是很多压缩算法对于光场图像内容中的纹理复杂区域的预测精度并不高。此外,由于光场相机的微透镜结构的特殊性,排列十分紧密,因此相邻的两张子孔径图像之间只存在很微小的水平视差或垂直视差,此外,由于光场图像中包含三维场景的四维信息,从光场图像中提取的视点图像之间存在很强的空间相关性。但是,很多算法并没有充分探索视点图像间的强相关性,没有达到尽可能去除光场图像冗余的目的,来提升光场图像编码效率。
为了解决上述技术问题,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。综上所述,在本申请中,光场图像编码器在压缩编码光场图像之前,可以对光场图像进行空间和角度的下采样处理,获得低分辨率的光场图像,能够降低待编码的数据量,相应地,在解码后,光场图像解码器可以采用超分辨率重建网络进行对低分辨率的光场图像进行空间和角度的上采样,进而构建出高分辨率的光场图像,从而可以减少传输码流,大大提高编解码效率。
本申请的实施例提出了一个基于光场子孔径图像的、端到端的低码率光场图像压缩方案,主要可以包括光场图像的预处理,光场图像数据的编码,比特流传输,光场图像数据的解码,光场图像的后处理。在具体的实施过程中,主要可以包括低分辨率子孔径图像伪序列的构建、低分辨率子孔径图像伪序列的编解码以及解码后的低分辨率子孔径图像的超分辨率重建。
进一步地,在本申请中,设计了一个基于EPI思想的、用于光场图像的空间和角度超分辨重建的网络结构,该超分辨率重建网络整体采用分支-融合结构,可以有效地对低分辨率子孔径图像进行空间和角度的超分辨重建。同时,还设计了一个质量增强网络Enhance-net,用于对超分辨率重建网络所输出的重建后子孔径图像进行图像质量的增强。通过实验数据表明,本申请提出的光场图像的处理方法,在解码端通过超分辨率重建网络对光场图像的超分辨有明显的提升效果,因此可以在编码端通过空间分辨率和角度分辨率的下采样处理减小传输的数据量。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。
在一实施例中,本申请实施例提供了一种光场图像的处理方法,该光场图像的处理方法应用于光场图像解码器中,图4为光场图像的处理方法的实现流程示意图一,如图4所示,在本申请的实施例中,光场图像解码器处理光场图像的方法可以包括以下步骤:
步骤101、解析码流,获得初始子孔径图像。
在本申请的实施例中,光场图像解码器可以先解析码流,从而可以获得初始子孔径图像,其中,初始子孔径图像为低分辨率的光场子孔径图像。
具体地,在编码侧,通过光场相机采集的一帧微透镜图像可以被处理为多个子孔径图像,在对多个子孔径图像分别进行下采样处理之后,获得对应的初始子孔径图像,其中,由于初始子孔径图像是下采样之后获得的,因此初始子孔径图像是分辨率较低的子孔径图像。
可以理解的是,在本申请的实施例中,光场图像解码器通过解析码流,可以获得图像伪序列和预设排列顺序。
具体地,在编码侧,通过下采样获得初始子孔径图像之后,可以按照预设排列顺序对初始子孔径图像进行重新排序,从而可以获得对应的图像伪序列。
进一步地,在本申请的实施例中,光场图像解码器在获取初始子孔径图像时,可以利用解析码流所确定的图像伪序列和预设排列顺序,逆变换获得初始子孔径图像。也就是说,在本申请的实施例中,光场图像解码器可以解析所述码流,然后获得图像伪序列和预设排列顺序;接着,可以基于预设排列顺序和图像伪序列,最后便可以生成初始子孔径图像。
需要说明的是,在本申请的实施例中,预设排列顺序可以包括多种排列顺序中的任意一种,图5为预设排列顺序的示意图一,图6为预设排列顺序的示意图二,图7为预设排列顺序的示意图三,图8为预设排列顺序的示意图四,如图5、6、7、8所示,排列顺序可以为旋转顺序、光栅扫描顺序、Z字形和U形扫描顺序中的任意一种,示例性的,可以选择光栅扫描顺序对多个子孔径图像进行排 列,从而获得对应的图像伪序列。
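上述按预设排列顺序将子孔径图像排成伪序列、再在解码端按同一顺序逆变换还原的过程，可以用如下示意性的Python片段说明（仅为按光栅扫描顺序的一个假设性示例，函数名为说明而设，并非本申请的实际实现）：

```python
import numpy as np

def to_pseudo_sequence(sai_grid):
    """按光栅扫描顺序（逐行、自左向右）将 (s, t, H, W) 的
    子孔径图像阵列展平成 (s*t, H, W) 的图像伪序列。"""
    s, t, h, w = sai_grid.shape
    return sai_grid.reshape(s * t, h, w)

def from_pseudo_sequence(seq, s, t):
    """解码端的逆变换：按相同顺序还原出 (s, t, H, W) 阵列。"""
    n, h, w = seq.shape
    assert n == s * t
    return seq.reshape(s, t, h, w)

grid = np.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)
seq = to_pseudo_sequence(grid)
restored = from_pseudo_sequence(seq, 2, 3)
assert (restored == grid).all()
```

其他排列顺序（旋转、Z字形、U形扫描）只需把这里的逐行索引映射替换为相应的遍历次序即可。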
步骤102、将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率。
在本申请的实施例中,光场图像解码器在解析码流,获得初始子孔径图像之后,可以将初始子孔径图像输入至超分辨率重建网络中,从而可以输出重建后子孔径图像。具体地,重建后子孔径图像为高分辨率子孔径图像,即重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率。
需要说明的是,在本申请的实施例中,超分辨重建网络可以为分支融合超分辨网络(Branch Fusion Super Resolution Net,BFSRNet)模型,主要通过上采样处理进行图像分辨率的提升,可以使得输出图像的分辨率高于输入图像的分辨率。
也就是说,本申请实施例的核心是设计一个分支融合的神经网络模型(即超分辨率重建网络)来对图像的空间分辨率和角度分辨率进行提升,即空间和角度上超分辨。
进一步地,在本申请的实施例中,光场图像解码器在将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像时,具体可以先基于初始子孔径图像进行提取处理,获得核极线平面图(Epipolar Plane Image,EPI)集合;然后可以对初始EPI集合进行上采样处理和特征提取,获得目标EPI集合;其中,目标EPI集合中图像的分辨率大于初始EPI集合中图像的分辨率;最后可以对目标EPI集合进行融合处理,进而获得重建后子孔径图像。
可以理解的是,在本申请的实施例中,超分辨重建网络可以为一个基于EPI的光场图像超分辨网络结构,能够通过对不同维度的EPI图像进行超分辨来达到对整个光场图像超分辨的效果。
具体地,在本申请的实施例中,初始EPI集合可以为至少一个任意方向对应的EPI集合。
进一步地,在本申请的实施例中,在基于初始子孔径图像进行提取处理,获得初始EPI集合时,光场图像解码器可以先对初始子孔径图像进行排序处理,获得立体图像集合;然后可以按照至少一个方向对立体图像集合进行提取处理,获得至少一个初始EPI集合;其中,一个方向对应一个初始EPI集合。
示例性的,在本申请中,图9为提取初始EPI集合的示意图,如图9所示,(a)即为初始子孔径图像排列后的示意图,即基于预设排列顺序所获得的低分辨率的初始子孔径图像,接着,光场图像解码器可以按照旋转顺序、光栅扫描顺序、Z字形以及U形扫描顺序等任意一种扫描方法将初始子孔径图像排列重叠起来,形成一个立体合集,即获得立体图像集合,如(b)所示。如(c)所示,若沿y轴向上选取第m行(0≤m≤y)作为切点,对立体图像集合进行切片操作,即提取立体图像集合中所有图像同等高度上的像素,会得到一系列具有线性特性的图像,这些图像即为EPI图像,如(d)所示。每一行都会对应一幅EPI图像,因此所有行切片操作后的EPI图像组成一个初始EPI集合。同理,若沿x轴和n轴也进行相同切片操作,可以得到相应轴上的各自的初始EPI集合。其中,一个轴即为一个方向,一个方向(一个轴)对应一个初始EPI集合。
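图9所描述的“提取所有图像同等高度上的像素”的切片操作，可以用如下示意性的Python片段表达（numpy实现仅为说明用途的假设性示例）：

```python
import numpy as np

# 将 n 张子孔径图像（每张 H x W）堆叠成形状为 (n, H, W) 的立体图像集合，
# 再在固定的图像行 m 处切片：得到的 (n, W) 切片即为一幅 EPI 图像。
n, H, W = 9, 6, 8
volume = np.random.rand(n, H, W)

def epi_set_along_y(vol):
    """每一行对应一幅 EPI：返回 H 幅 EPI，每幅形状为 (n, W)。"""
    return [vol[:, m, :] for m in range(vol.shape[1])]

epis = epi_set_along_y(volume)
assert len(epis) == H
assert epis[0].shape == (n, W)
# 每幅 EPI 汇集了所有视角图像在同一高度上的像素
assert (epis[3] == volume[:, 3, :]).all()
```

沿其他轴（x轴、n轴）做切片时，只需改变被固定的维度，即可得到各自方向的初始EPI集合。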
进一步地,在本申请的实施例中,光场图像解码器在对初始EPI集合进行上采样处理和特征提取,获得目标EPI集合时,可以先解析码流,获得采样参数;然后按照采样参数对EPI集合进行上采样处理,获得采样后EPI集合;接着可以利用一个或多个卷积层对采样后EPI集合进行特征提取,获得初始EPI集合对应的特征图像;最后,可以基于采样后EPI集合和特征图像,构建目标EPI集合。
需要说明的是，在本申请的实施例中，采样参数可以包括用于进行空间分辨率的上采样所对应的采样倍数和用于进行角度分辨率的上采样所对应的采样倍数。
具体地,在本申请的实施例中,在对一个方向所对应的初始EPI集合进行处理时,可以先通过上采样处理提升初始EPI集合中的每一个EPI图像的分辨率,然后可以通过卷积计算来对采样后的EPI集合进行浅层特征提取和深层特征提取,获得初始EPI集合对应的特征图像。最后,可以通过跳跃连接上采样处理后得到的采样后EPI集合和特征图像,获得超分辨的EPI图像集合,即构建高分辨率的目标EPI集合。其中,目标EPI集合中图像的分辨率大于初始EPI集合中图像的分辨率。
可以理解的是,在本申请的实施例中,基于超分辨率重建网络,光场图像解码器可以按照上述方法依次完成对每一个方向对应的初始EPI集合的超分辨率,最终构建出每一个方向对应的目标EPI集合。
进一步地,在本申请的实施例中,光场图像解码器在对目标EPI集合进行融合处理,获得重建后子孔径图像时,可以对至少一个EPI集合对应的至少一个目标EPI集合进行加权平均融合,获得重建后子孔径图像。
也就是说,在本申请中,光场图像解码器完成对全部目标EPI集合的构建之后,可以对利用每一个方向所对应的目标EPI集合进行融合处理,最终可以完成对子孔径图像的重建。
具体地,在本申请中,可以使用加权平均的线性方法融合每一个方向所对应的目标EPI集合,从而获得重建后子孔径图像。可见,光场图像解码器可以采用简单的加权平均方式进行融合处理,融合处理后可以获得超分辨率重建网络的最终输出结果,即重建后子孔径图像。
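上述对各方向目标EPI集合的加权平均融合，可以用如下示意性的Python片段说明（权重取值与函数名均为说明用途的假设）：

```python
import numpy as np

def fuse_branches(outputs, weights=None):
    """对若干支路的输出（形状相同）做加权平均融合。"""
    outputs = np.stack(outputs)            # (支路数, ...)
    if weights is None:                    # 缺省为简单平均
        weights = np.full(len(outputs), 1.0 / len(outputs))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # 归一化，使权重之和为 1
    return np.tensordot(weights, outputs, axes=1)

a = np.full((2, 2), 1.0)
b = np.full((2, 2), 4.0)
c = np.full((2, 2), 7.0)
fused = fuse_branches([a, b, c])
assert np.allclose(fused, 4.0)             # (1+4+7)/3
```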
步骤103、将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
在本申请的实施例中,光场图像解码器在将初始子孔径图像输入至超分辨率重建网络中,输出重建后子孔径图像之后,可以继续将重建后子孔径图像输入至质量增强网络中,从而可以输出目标子孔径图像。其中,目标子孔径图像的图像质量高于重建后子孔径图像的图像质量。
进一步地,在本申请的实施例中,在利用超分辨重建网络进行分辨率的提升之后,还可以利用质量增强网络进一步对图像的质量进行增强处理。其中,质量增强网络可以对每一帧重建后子孔径图像的质量进行增强,也可以针对部分帧的重建后子孔径图像质量进行增强。也就是说,质量增强网络并不固定。
也就是说,在本申请中,质量增强网络(Quality Enhancement Net,QENet),主要用于对重建后子孔径图像中的至少一帧图像进行质量增强。
本实施例提供了一种光场图像的处理方法,光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
基于上述实施例，在本申请的再一实施例中，图10为解码端的处理流程示意图，如图10所示，在解码端，解析码流获得的初始子孔径图像，输入至超分辨率重建网络之后，可以输出重建后子孔径图像，接着输入至质量增强网络之后，可以输出目标子孔径图像。其中，重建后子孔径图像的分辨率高于初始子孔径图像的分辨率，目标子孔径图像的质量高于重建后子孔径图像的质量。
可以理解的是,在本申请的实施例中,利用上述EPI思想,光场图像解码器设计了低分辨率光场图像超分辨重建网络EPI-SREH-Net。所设计的网络能够同时实现在空间上和角度上的光场图像超分辨。基于整体的神经网络结构所包括的超分辨率重建网络SR-net和质量增强网络Enhance-net,光场图像解码器实现超分辨率的具体流程如下:
1、将获取到的解码后的低分辨率子孔径光场图像,即初始子孔径图像分别送入三个支路的超分辨网络SR-net,每个支路的输出都是高分辨率的子孔径图像。
2、将三个支路SR-net的输出采用加权平均的线性方法融合。
3、将融合后的高分辨率子孔径图像送入质量增强的网络Enhance-net,得到质量更高的光场子孔径图像,即目标子孔径图像。
具体地,在本申请中,超分辨重建网络(比如分支融合超分辨网络模型)主要实现在不同维度上同时对初始子孔径图像分别进行超分辨,并通过加权平均的方式融合得到超分辨的输出结果。质量增强网络(比如质量增强网络模型)主要是对超分辨率重建网络的输出结果进行图像质量的提升。
进一步地,在本申请的实施例中,超分辨重建网络作为实现超分辨率功能的核心,具体可以包括至少一个分支模块和一个融合模块。
可以理解地,在本申请的实施例中,图11为超分辨重建网络的结构示意图,如图11所示,超分辨重建网络的整体框架可以包括第一分支(用B1_SRNet表示)模块、第二分支(用B2_SRNet表示)模块、第三分支(用B3_SRNet表示)模块以及融合(用Fusion表示)模块。其中,第一分支模块、第二分支模块和第三分支模块等这三个分支模块即考虑了立体图像集中的三个方向,每一个分支模块可以看成是对立体图像集中不同方向的初始EPI集合的操作。
需要说明的是，在本申请中，三个分支模块采用相似的网络结构，仅有三维卷积模块中的ConvTranspose3d层的参数不同；这里，ConvTranspose3d层可称为转置3D卷积层、或者也可称为3D解卷积层、3D反卷积层等。
进一步地,在本申请的实施例中,三个分支模块在完成各自的超分辨后,可以将输出的结果再次输入融合模块中。具体地,在本申请中,通过融合模块可以对三个分支模块的输出进行加权平均处理,最终可以得到重建后子孔径图像。
也就是说,本申请实施例可以采用的是简单的加权平均方式进行融合,融合后得到分支融合超分辨网络的最终输出结果。
需要说明的是,在本申请的实施例中,图12为分支模块的结构示意图,如图12所示,超分辨网络首先是通过上采样模块(即使用简单的上采样算子)将当前维度的初始EPI图像集合的分辨率进行提升,之后经过卷积计算模块,包括:两个Conv2d层的浅层特征提取和一系列ResDB模块(即ResDB 1模块、…、ResDB d模块、…、ResDB D模块等)的深层特征提取,这里每一个ResDB模块自身使用残差学习的方式,多个ResDB模块的输出通过连接(Concat)层进行特征拼接,然后使用1×1的Conv2d层来降低特征通道数。另外,残差重建也是使用一个Conv2d层,通过跳跃连接上采样模块得到的图像,进而得到超分辨的EPI图像集合。最后,还需要使用ConvTranspose3d层的3D反卷积来对立体图像集的三个维度分辨率进行提升,再使用Conv3d层完成立体图像集在图像分辨率和帧率上的超分辨率重建。其中,ConvTranspose3d层还包括有带泄露修正线性单元(Leaky Rectified Linear Unit,Leaky ReLU)函数。
进一步地,在本申请的实施例中,图13为ResDB模块的结构示意图,如图13所示,可以由三个带有激活函数的Conv2d层和一个1*1的Conv2d层组成。这里,每一个ResDB模块内部都采用密集连接,将这三个Conv2d层的输出通过Concat进行拼接,再由1*1的Conv2d层降维。而且ResDB模块与ResDB模块之间使用跳跃连接,即将上一个块(即ResDB d-1模块)的输出与当前块(即ResDB d模块)的输出进行叠加,然后将和值作为下一个块(即ResDB d+1模块)的输入。
需要注意的是,激活函数可以是线性整流函数(Rectified Linear Unit,ReLU),又称修正线性单元,是一种人工神经网络中常用的激活函数,通常指代以斜坡函数及其变种为代表的非线性函数。另外,Leaky ReLU函数是经典(以及广泛使用的)的ReLu函数的变体。由于ReLU函数的输入值为负的时候,输出始终为0,其一阶导数也始终为0;为了解决ReLU函数这个缺点,在ReLU函数的负半区间引入一个泄露(Leaky)值,即称为Leaky ReLU函数。
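ReLU与Leaky ReLU的区别可以用如下简单的Python片段说明（泄露系数取0.01仅为示例取值）：

```python
import numpy as np

def relu(x):
    # 输入为负时输出恒为 0
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # 负半区间引入一个小的泄露斜率，而不是恒为 0
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, 0.0, 3.0])
assert (relu(x) == np.array([0.0, 0.0, 3.0])).all()
assert np.allclose(leaky_relu(x), np.array([-0.02, 0.0, 3.0]))
```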
由此可见，在本申请中，对于解析码流获得的图像伪序列X_{s′,t′}(x′,y′,n)，n=s′×t′，对应的初始子孔径图像LF_L(x′,y′,s′,t′)，解析码流获得的采样参数包括空间分辨率采样倍数α和角度分辨率采样倍数β。
在使用基于EPI思想提出的超分辨率重建网络提升初始子孔径图像的空间分辨率和角度分辨率时，首先需要基于初始子孔径图像生成每一个方向所对应的初始EPI集合，然后经过上采样层，按照采样参数将初始EPI集合中的单个EPI图像的空间分辨率和角度分辨率进行提升；接着，可以经过两个卷积层的浅层特征提取和一系列密集连接的ResDB卷积块的深层特征提取，以及一个2d卷积层的残差图重建，然后与上采样后的图像相加，得到分辨率提升的EPI图像；进一步地，可以使用一个3D反卷积将三个维度的分辨率特征进行提升；最后采用3D卷积完成超分辨率光场图像的重建。
示例性的，以其中一个处理X轴方向的支路为例，通过超分辨网络进行分辨率提升的具体数据流程为：先解析获得低分辨率的图像伪序列X_{s′,t′}(x′,y′,n)，沿x轴维度做切片，得到x′个EPI图像组成的初始EPI集合{E_i(y′,n)}，i≤x′。然后，利用SR-net网络，对初始EPI集合中的每个EPI图像分别做α倍的超分辨，最终得到这部分网络的输出{E_i^{SR}(αy′,αn)}，i≤x′。最后，可以将x′个输出重新堆叠起来，得到T_{s′,t′}(x′,αy′,αn)。依次完成每一个轴方向的每一个支路的处理之后，对每一个支路输出的结果进行加权平均融合处理，最终可以得到超分辨率网络的最终输出Y_{s,t}(x,y,n)，其中，x=αx′，y=αy′，s=βs′，t=βt′。
进一步地，在利用超分辨重建网络进行初始子孔径图像分辨率的提升，获得重建后子孔径图像之后，重建后子孔径图像的图像质量还需要进一步的提升，这时候就添加了质量增强网络。其中，质量增强网络可以逐帧对图像质量进行增强，也可以针对部分帧的图像质量进行增强。也就是说，质量增强网络并不固定，通常而言，质量增强网络为QENet模型，其中，QENet模型可以采用现有的任意一种图像质量增强网络模型，比如超分辨卷积神经网络(Super-Resolution Convolutional Neural Network,SRCNN)模型、伪影消除卷积神经网络(Artifacts Reduction Convolutional Neural Network,ARCNN)模型、超分辨率超深网络(Very Deep convolutional networks for Super-Resolution,VDSR)模型、视频超分辨率递归反投影网络(Recurrent Back-Projection Network for Video Super-Resolution,RBPN)模型和基于增强可变形卷积网络的视频重建(Video Restoration with Enhanced Deformable Convolutional Networks,EDVR)模型等。由于光场图像编码器设计复杂度的需要，建议选择效果不错且复杂度较低的网络，本申请实施例选择ARCNN模型较为合适。
图14为质量增强网络的结构示意图,如图14所示,质量增强网络可以由四个卷积层组成,除去最后一层用于重建的卷积层外,其余三层卷积后都用PReLU函数激活。
进一步地，在本申请的实施例中，在进行图像质量增强处理时，可以将超分辨率重建网络输出的重建后子孔径图像逐个作为输入送入质量增强网络，经过三个带有激活函数的卷积层和一个用于图像重建的2D卷积层，最终可以得到质量增强后的子孔径图像，即目标子孔径图像。由此可见，在本申请中，质量增强网络的输入是每个高分辨率的重建后子孔径图像，质量增强网络的输出是质量增强后的高分辨率的目标子孔径图像。
本实施例提供了一种光场图像的处理方法,光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
在另一实施例中,本申请实施例提供了一种光场图像的处理方法,该光场图像的处理方法应用于光场图像编码器中,图15为光场图像的处理方法的实现流程示意图二,如图15所示,在本申请的实施例中,光场图像编码器处理光场图像的方法可以包括以下步骤:
步骤201、通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像。
在本申请的实施例中,光场图像编码器可以先通过光场相机采集获得微透镜图像,然后可以根据微透镜图像生成子孔径图像。
进一步地,在本申请的实施例中,光场相机可以通过配置的显微镜阵列进行采集,从而获得光场图像,即微透镜图像。
需要说明的是,在本申请的实施例中,为了更加适用于现有的编解码标准,光场图像编码器在通过光场相机采集获得微透镜图像之后,并不会直接使用现有的编解码标准技术对微透镜图像进行压缩处理,而是可以先对微透镜图像进行转化,获得对应的子孔径图像。
可以理解的是,在本申请的实施例中,在将微透镜图像转换成子孔径图像时,可以先提取微透镜图像中心区域亮度较高的部分,然后将所有相同位置的像素全部提出并进行重新排列,从而可以获得子孔径图像。
图16为微透镜图像的局部放大图,图17为初步提取的示意图,如图16和17所示,微透镜图像的亮度是由中心向四周逐渐变暗的,这是因为微透镜图像的边缘部分由于光线的汇聚作用强而导致接收的光线较少,因此只需要提取微透镜图像中心区域亮度较高的部分。
进一步地,在本申请中,可以将提取出的中心区域的图像按照微透镜排列次序进行排序,其中,Ai为各个图像的中心像素,Bi、Ci、Di分别为三个位置的像素,i为1,2,……。接着,将微透镜图像所有相同位置的像素全部提出并进行重排列操作,即可得到所需视点的子孔径图像。也就是说,子孔径图像提取过程主要包括微透镜图像中心点提取、微透镜图像中心点排序,从而生成子孔径图像。
进一步地,在本申请的实施例中,将通过光场相机采集获得的微透镜图像所转换获得的子孔径图像是光场每一个角度形成的图像,可以具有角度分辨率和空间分辨率。其中,子孔径图像的空间分辨率可以为子孔径图像的像素数,即可以表征光场相机微透镜阵列中微透镜的个数;子孔径图像的角度分辨率可以为子孔径图像的数量。
需要说明的是，在本申请的实施例中，基于微透镜图像生成的子孔径图像可以为一系列的2D图像阵列LF(x,y,s,t)，其中一共有(s×t)张子孔径图像，即角度分辨率为(s×t)，每张子孔径图像大小为(x×y)，即空间分辨率为(x×y)。图18为子孔径图像阵列，如图18所示，需要注意的是，在子孔径图像阵列中，相较于中间部分，四个角落的子孔径图像更暗，且越靠近边缘部分图像畸变和模糊现象越严重。因此一般的压缩过程中都会删去这些边缘角落的子孔径图像。
示例性的，在本申请中，在进行子孔径图像的提取时，假设微透镜阵列有300行400列，也就是有12万个微透镜，每个微透镜下面有30×30个像素，也就是直径大约为30个像素，那么可以提取30×30张子孔径图像，每张子孔径图像为300×400像素，中心角度的子孔径图像可以按顺序提取每一个微透镜中心位置的像素值，按顺序重新拼成一幅新的300×400的图像，就是子孔径图像。
也就是说,提取(0,0)位置的微透镜中心的像素值,也就是子孔径图像(0,0)位置的像素值,(0,1)位置的微透镜中心的像素值,就是子孔径图像(0,1)位置的像素值,以此按顺序类推,直到所有的微透镜遍历完。
和提取中心子孔径图像原理一样,只需要和之前一样按顺序遍历所有微透镜,根据微透镜中心坐标点,推测某一个角度的坐标,提取像素值,重新拼成一幅新的图像,即为该角度的子孔径图像。
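上述按微透镜内相同位置像素重排生成子孔径图像的过程，可以用如下示意性的Python片段说明（假设每个微透镜恰好覆盖p×p个像素、且不做畸变校正，函数与参数均为说明用途的假设）：

```python
import numpy as np

def extract_sai(lenslet, rows, cols, p, du, dv):
    """提取角度偏移为 (du, dv) 的子孔径图像。

    lenslet: 形状为 (rows*p, cols*p) 的微透镜图像，每个 p x p 块对应一个微透镜。
    依次取出每个微透镜内偏移 (du, dv) 处的像素，按微透镜排列次序
    重新拼成一幅 rows x cols 的子孔径图像。
    """
    sai = np.empty((rows, cols), dtype=lenslet.dtype)
    for r in range(rows):
        for c in range(cols):
            sai[r, c] = lenslet[r * p + du, c * p + dv]
    return sai

rows, cols, p = 3, 4, 5
lenslet = np.arange(rows * p * cols * p).reshape(rows * p, cols * p)
# du = dv = p // 2 即取每个微透镜中心像素，得到中心视角的子孔径图像
center = extract_sai(lenslet, rows, cols, p, p // 2, p // 2)
assert center.shape == (rows, cols)
assert center[0, 0] == lenslet[2, 2]
```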
步骤202、对子孔径图像进行下采样处理,获得初始子孔径图像。
在本申请的实施例中,光场图像编码器在通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像之后,可以先对子孔径图像进行下采样处理,从而可以获得采样后子孔径图像。其中,采样后子孔径图像的分辨率小于下采样之前的子孔径图像的分辨率。
需要说明的是,在本申请的实施例中,光场图像编码器在对子孔径图像进行下采样处理时,可以按照采样参数分别对子孔径图像的空间分辨率和角度分辨率进行下采样处理,最终便可以完成初始子孔径图像的构建。
可以理解的是，在本申请的实施例中，由于分别进行空间分辨率的下采样和角度分辨率的下采样，因此采样参数可以包括用于进行空间分辨率的下采样所对应的采样倍数和用于进行角度分辨率的下采样所对应的采样倍数。
也就是说,在本申请中,光场图像编码器可以按照对应的采样倍数分别对子孔径图像进行角度分辨率下采样和空间分辨率下采样,从而可以获得初始子孔径图像。
可以理解的是,在本申请中,经过下采样处理后的初始子孔径图像为低分辨率子孔径图像。
示例性的，在本申请中，对于子孔径图像LF(x,y,s,t)，可以对其空间分辨率下采样α倍，对其角度分辨率下采样β倍（α、β构成采样参数），从而可以构建低分辨率的子孔径图像，即初始子孔径图像LF_L(x′,y′,s′,t′)，其中，x=αx′，y=αy′，s=βs′，t=βt′。
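上述空间与角度下采样可以用如下示意性的Python片段说明（此处用简单的抽样代替实际可采用的bicubic等下采样方法，仅为说明四个维度的变化）：

```python
import numpy as np

def downsample_lf(lf, alpha, beta):
    """抽样式下采样示意：lf 形状为 (x, y, s, t)；
    空间维度缩小 alpha 倍，角度维度缩小 beta 倍。"""
    return lf[::alpha, ::alpha, ::beta, ::beta]

lf = np.random.rand(8, 8, 4, 4)          # x=8, y=8, s=4, t=4
lf_low = downsample_lf(lf, alpha=2, beta=2)
assert lf_low.shape == (4, 4, 2, 2)       # x'=x/α, y'=y/α, s'=s/β, t'=t/β
```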
步骤203、基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列。
在本申请的实施例中,光场图像编码器在对子孔径图像进行下采样处理,获得初始子孔径图像之后,便可以基于预设排列顺序和初始子孔径图像,进一步生成子孔径图像对应的图像伪序列。
需要说明的是,在本申请的实施例中,光场图像编码器在获得低分辨率的初始子孔径图像之后,可以按照一定的顺序对初始子孔径图像进行排序,即基于预设排列顺序生成对应的图像伪序列。
可以理解的是,在本申请的实施例中,预设排列顺序可以为多种排列顺序中的任意一种,例如,预设排列顺序可以为旋转顺序、光栅扫描顺序、Z字形和U形扫描顺序等中的任意一种顺序。
进一步地，在本申请的实施例中，光场图像编码器在按照预设排列顺序对初始子孔径图像进行重新排序处理之后，可以确定对应的图像伪序列V_L(x′,y′,n)。其中，V_L(x′,y′,n)=LF_L(x′,y′,s′,t′)，n=s′×t′。
步骤204、基于图像伪序列进行编码处理,生成码流。
在本申请的实施例中,光场图像编码器在基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列之后,便可以基于图像伪序列进行处理,从而生成对应的二进制码流。
进一步地，在本申请的实施例中，光场图像编码器可以使用现有的编码标准进行编码处理，从而可以将低分辨率的初始子孔径图像所对应的图像伪序列写入码流中，便可以完成对光场图像的压缩处理。
可以理解的是,在本申请的实施例中,高效的编解码器可以选择HEVC或VVC等现有的主流编码标准的编解码器。
进一步地,在本申请的实施例中,在基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列之后,即步骤203之后,光场图像编码器还可以将预设排列顺序所对应的排序参数写入码流中。其中,排序参数用于指示生成图像伪序列的预设排列顺序。
示例性的,在本申请中,光场图像编码器如果使用旋转顺序对初始子孔径图像进行排序,可以在将排序参数设置为0之后,将其写入码流中;光场图像编码器如果使用光栅扫描顺序对初始子孔 径图像进行排序,可以在将排序参数设置为1之后,将其写入码流中;光场图像编码器如果使用Z字形对初始子孔径图像进行排序,可以在将排序参数设置为2之后,将其写入码流中;光场图像编码器如果使用U形扫描顺序对初始子孔径图像进行排序,可以在将排序参数设置为4之后,将其写入码流中。
进一步地,在本申请的实施例中,在对子孔径图像进行下采样处理,获得初始子孔径图像之后,即步骤202之后,光场图像编码器还可以将采样参数写入码流,传输至解码侧。
通过上述步骤201至步骤204所提出的光场图像的处理方法,能够通过下采样处理降低子孔径图像的角度分辨率和空间分辨率,获得低分辨率的初始子孔径图像,由于在解码侧能够实现子孔径图像的超分辨率重建,因此,光场图像编码器可以直接将初始子孔径对应的图像伪序列写入码流,传输至光场图像解码器。
本实施例提供了一种光场图像的处理方法,光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
基于上述实施例,在本申请的再一实施例中,图19为实现光场图像处理的光场图像编码器和光场图像解码器的示意图,如图19所示,光场图像编码器可以包括预处理模块和编码模块,光场图像解码器可以包括解码模块和后处理模块。
图20为光场图像处理方法的结构示意图,如图20所示,在编码端,针对输入的光场相机采集的微透镜图像,预处理模块可以将微透镜图像转换为子孔径图像,然后按照采样参数对子孔径图像进行下采样处理,得到低分辨率的初始子孔径图像,接着,可以按照预设排列顺序将初始子孔径重新排序,输出图像伪序列;然后通过编码模块进行编码处理,将图像伪序列、采样参数以及预设排列顺序都写进码流中,其中,码流为二进制比特流,例如1010001011101010。编码模块在将该码流传输至解码端之后,解码模块可以解析码流,获得图像伪序列、采样参数以及预设排列顺序,然后利用图像伪序列和预设排列顺序得到低分辨率的初始子孔径图像,接着,后处理模块通过超分辨率重建网络对初始子孔径图像进行超分辨率重建,获得高分辨率的重建后子孔径图像,并使用质量增强网络进一步对图像质量进行提升,获得高质量的目标子孔径图像。具体地,后处理模块可以利用本申请实施例中的超分辨率网络实现超分辨率重建。也就是说,超分辨率网络的作用就是将解码后的低分辨率的子孔径图像超分辨重建为高分辨率的子孔径图像。
可见,在本申请中,正是由于光场相机不同于传统成像设备,不但可以记录空间场景中的光线强度,而且还能记录其方向信息。由于微透镜结构的特殊性,排列十分紧密,因此相邻的两张子孔径图像之间只存在很微小的水平视差或垂直视差,具有很强的相关性。同时每张子孔径图像本身也存在一定程度上的空间冗余。考虑到相邻子孔径图像之间存在的强相关性和每张子孔径图像本身的空间相关性,本申请提出的光场图像的处理方法,在编码端不需要对全部的光场图像进行压缩编码,只需要压缩部分光场图像的低分辨率图像,在解码端,可以利用子孔径图像之间的相关性重建出未编码的部分,从而节省码流,提高编码效率。
进一步地,在本申请的实施例中,编码端的预处理模块所使用的预处理方式具体可以为采用了下采样来降低光场图像的分辨率,其中,预处理方法并不固定,比如下采样,颜色空间转换等。解码端的后处理模块则对应采用了超分辨重建网络还原光场图像。由此可见,通过采用下采样的预处理方式,光场图像的处理流程主要包括低分辨率子孔径图像伪序列的构建、低分辨率子孔径图像伪序列的编解码以及解码后的低分辨率子孔径图像的超分辨率重建。
需要说明的是,在本申请的实施例中,对于光场图像,首先将需要压缩的光场子孔径图像在空间和角度上分别按照采样参数进行数倍的下采样,接着将其转换成YUV420格式,构建低分辨率子孔径图像伪序列。之后利用现有的编解码标准进行伪序列的编码解码。最后在解码端基于所设计的超分辨率重建网络重建出与原始子孔径图像分辨率相同的全部子孔径图像。
可以理解的是,本申请提出的光场图像的处理方法,可以应用于低码率的光场图像压缩方案。目前的光场图像大多是高分辨率的,如果直接对整个光场图像进行压缩,那么将会导致编码效率低,码流量大。而本申请所提出的光场图像的处理方法正是一种低码率的压缩方案,能够有效地解决上述问题。
进一步地,在本申请的实施例中,光场图像解码器在进行超分辨率重建网络的构建时,可以先确定出超分辨重建网络所对应的第一网络参数,然后基于第一网络参数构建超分辨重建网络。
可以理解的是,在本申请中,光场图像解码器可以采用多种方式来确定超分辨重建网络对应的第一网络参数。
示例性的,在本申请中,光场图像解码器可以先获取第一训练数据;其中,第一训练数据包括低分辨率图像和对应的高分辨率图像;然后通过第一训练数据进行模型训练,最终便可以确定第一网络参数。
示例性的,在本申请中,光场图像解码器可以解析码流,直接获得第一网络参数。也就是说,在编码端,光场图像编码器可以将第一网络参数写入码流中,传输至解码端。
需要说明的是,第一训练数据可以包括多组图像,每组图像由一帧低分辨率图像和对应的一帧高分辨率图像组成,第一训练数据用以进行模型参数的训练,以得到超分辨率重建网络的第一网络参数。
也就是说,针对超分辨率重建网络的第一网络参数,一方面可以是根据第一训练数据进行模型参数训练得到的;另一方面也可以是由光场图像编码器进行模型参数训练,然后将训练后的第一网络参数写入码流,由光场图像解码器通过解析码流直接获取第一网络参数;本申请实施例不作任何限定。
进一步地,在本申请的实施例中,光场图像解码器在进行质量增强网络的构建时,可以先确定出质量增强网络所对应的第二网络参数,然后基于第二网络参数构建质量增强网络。
可以理解的是,在本申请中,光场图像解码器可以采用多种方式来确定质量增强网络对应的第二网络参数。
示例性的，在本申请中，光场图像解码器可以先获取第二训练数据；其中，第二训练数据包括低质量图像和对应的高质量图像；然后通过第二训练数据进行模型训练，最终便可以确定第二网络参数。
示例性的,在本申请中,光场图像解码器可以解析码流,直接获得第二网络参数。也就是说,在编码端,光场图像编码器可以将第二网络参数写入码流中,传输至解码端。
需要说明的是,第二训练数据可以包括多组图像,每组图像由一帧低质量图像和对应的一帧高质量图像组成,第二训练数据用以进行模型参数的训练,以得到质量增强网络的第二网络参数。
也就是说,针对质量增强网络的第二网络参数,一方面可以是根据第二训练数据进行模型参数训练得到的;另一方面也可以是由光场图像编码器进行模型参数训练,然后将训练后的第二网络参数写入码流,由光场图像解码器通过解析码流直接获取第二网络参数;本申请实施例不作任何限定。
简言之,本申请实施例主要是解决目前的光场图像压缩过程中存在的编码效率低的问题;可以通过预处理时的降采样(即下采样处理),然后在后处理时恢复重构的方式,有效解决目前编解码效率偏低的问题。
需要说明的是,在本申请的实施例中,目前在低分辨率子孔径图像伪序列的构建中,低分辨率下采样采用了bicubic方法,这里的方法并不固定,只要能实现下采样效果即可。解码后的低分辨率子孔径图像的超分辨率重建也不仅限于本申请实施例所设计的网络结构,其他有同样功能的网络结构都可替换,只是在重建性能上可能有所差距。
另外,本申请实施例中超分辨率重建网络的网络结构可以有所更改。具体来说,分支融合超分辨网络模型的三个支路可以进行适当的删减,用以满足不同场景不同计算能力的需求。质量增强网络的网络结构在实际应用中通常使用的是ARCNN模型,但是并不局限于此,只要能满足对图像质量增强效果即可。而所有的更改都有可能对最终的图像质量产生差距。
进一步地,在本申请的实施例中,光场图像的超分辨重建网络的应用不仅局限于解码后的伪序列的后处理,也可以应用于光场图像编码器的帧间或帧内预测部分来提高预测精度。
可见,本申请实施例所提出的光场图像的处理方法大大提高了编码效率。具体地,在压缩编码前对光场图像在空间和角度上进行分辨率的下采样,大大降低了需要编码的图像数据量;在解码后采用超分辨率重建网络进行相对应的上采样,恢复出高分辨率的光场图像。总体上看,可以明显降低码率,大大提高编码效率,减少传输码流。
另外,本申请实施例所采用的质量增强网络的设计,大大提高了图像质量。因此,将所提出的超分辨率重建网络和质量增强网络应用于光场图像的压缩的处理过程中,可以明显的提升压缩光场图像的质量,而且在光场图像超分辨的提升上有明显效果。
也就是说,本申请实施例中所提出的光场图像的处理方法,可以利用超分辨率重建网络和质量增强网络来同时实现视光场图像超分辨和质量提升的两种效果。
可以理解的是，在本申请的实施例中，由于解码侧可以使用超分辨重建网络对低分辨的子孔径图像进行超分辨率重建，因此，在编码侧，光场图像编码器将微透镜图像转换为子孔径图像之后，通过下采样处理获得低分辨率的子孔径图像，便可以使用现有的编解码标准技术进行编码处理，能够很好地适应于现有的编解码框架，无需对光场图像编解码器的结构进行修改。
本实施例提供了一种光场图像的处理方法,光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
基于上述实施例，在本申请的又一实施例中，超分辨率重建网络和质量增强网络是在一台配备Nvidia GTX 1080Ti GPU的PC上、基于PyTorch平台实现的。实验采用的训练集和测试集来自于真实光场数据集EPFL、Lytro Illum和合成光场数据集HCI。
实验在空间和角度上均实现了2倍超分辨结果,也就是说,采样参数中的空间分辨率的采样倍数和角度分辨率的采样倍数均设置为2,即输入光场图像的分辨率为x×y×s×t,输出光场图像的分辨率为2x×2y×2s×2t。实验采用峰值信噪比(Peak Signal to Noise Ratio,PSNR)和结构相似性(Structural SIMilarity,SSIM)作为评价指标。不同数据集的图像分辨率不同,将直接获取到的光场子孔径图像作为高分辨率图像,使用Matlab平台将其空间角度2倍下采样得到的低分辨率子孔径图像作为网络的输入,具体如下:
EPFL数据集:输入的光场图像分辨率为217×312×4×4,则输出分辨率为434×624×8×8。
Lytro Illum数据集:输入分辨率187×270×4×4,输出分辨率为374×540×8×8。
HCI数据集:输入分辨率256×256×4×4,输出分辨率为512×512×8×8。
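上述三个数据集的输入输出分辨率关系（α=β=2）可以用如下简单的Python片段验证：

```python
def sr_output_shape(x, y, s, t, alpha=2, beta=2):
    # 空间分辨率放大 alpha 倍，角度分辨率放大 beta 倍
    return (alpha * x, alpha * y, beta * s, beta * t)

assert sr_output_shape(217, 312, 4, 4) == (434, 624, 8, 8)   # EPFL
assert sr_output_shape(187, 270, 4, 4) == (374, 540, 8, 8)   # Lytro Illum
assert sr_output_shape(256, 256, 4, 4) == (512, 512, 8, 8)   # HCI
```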
具体的实验结果如表1所示,与现有的如LFCNN、LFSR等一些算法相比,本申请提出的光场图像的处理方法在PSNR和SSIM表现上都有明显的提升。
表1（原文以图片形式给出：本申请方法与LFCNN、LFSR等算法在各数据集上的PSNR与SSIM对比结果）
这是由于目前大多数的网络只是单纯的考虑了光场图像超分辨的一个方面,而本申请所提出的光场图像的处理方法,由于在解码端所使用的超分辨率重建网络能够同时对光场图像的空间和角度两个维度的分辨率进行提升,因此基于本申请所提出的EPI-SREH-Net在光场图像超分辨上有明显的效果提升。
也就是说,在本申请中,将本申请所提出的光场图像的处理方法应用于子孔径图像的低码率光场压缩方案中,明显的提升了光场图像的编码性能。以Lytro Illum光场数据集为例,编码传输一个完整的光场图像,在未采用低码率光场压缩方案前,光场图像编码器需要编码64帧540P的伪序列,使用低码率压缩方案后,只需要编码16帧270P的伪序列即可,大大提高了编码效率,减少了传输码流。
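上述码流节省效果可以粗略估算如下（假设540P与270P按16:9分别取960×540与480×270，帧宽为本示例的假设值，并非原文给出）：

```python
# 未采用低码率方案：编码 64 帧 540P 的伪序列
full = 64 * 960 * 540      # 需要编码的像素总数
# 采用低码率方案：只需编码 16 帧 270P 的伪序列
low = 16 * 480 * 270
# 帧数减少 4 倍、每帧像素减少 4 倍，合计约 16 倍
assert full // low == 16
```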
综上所述,本申请的实施例提出了一个基于光场子孔径图像的、端到端的低码率光场图像压缩方案,主要可以包括光场图像的预处理,光场图像数据的编码,比特流传输,光场图像数据的解码,光场图像的后处理。在具体的实施过程中,主要可以包括低分辨率子孔径图像伪序列的构建、低分辨率子孔径图像伪序列的编解码以及解码后的低分辨率子孔径图像的超分辨率重建。
进一步地,在本申请中,设计了一个基于EPI思想的、用于光场图像的空间和角度超分辨重建的网络结构,该超分辨率重建网络整体采用分支-融合结构,可以有效地对低分辨率子孔径图像进行空间和角度的超分辨重建。同时,还设计了一个质量增强网络Enhance-net,用于对超分辨率重建网络所输出的重建后子孔径图像进行图像质量的增强。通过实验数据表明,本申请提出的光场图像的处理方法,在解码端通过超分辨率重建网络对光场图像的超分辨有明显的提升效果,因此可以在编码端通过空间分辨率和角度分辨率的下采样处理减小传输的数据量。
本实施例提供了一种光场图像的处理方法,光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
基于上述实施例，在本申请的另一实施例中，图21为光场图像解码器的组成结构示意图一，如图21所示，本申请实施例提出的光场图像解码器200可以包括解析部分201、第一获取部分202、确定部分203以及构建部分204。
所述解析部分201,配置为解析码流,获得初始子孔径图像;
所述第一获取部分202,配置为将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,所述重建后子孔径图像的空间分辨率和角度分辨率均大于所述初始子孔径图像的空间分辨率和角度分辨率;以及将所述重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
进一步地,在本申请的实施例中,所述解析部分201,具体配置为解析所述码流,获得图像伪序列和预设排列顺序;基于所述预设排列顺序和所述图像伪序列,生成所述初始子孔径图像。
进一步地,在本申请的实施例中,所述第一获取部分202,具体配置为基于所述初始子孔径图像进行提取处理,获得初始核极线平面图EPI集合;对所述初始EPI集合进行上采样处理和特征提取,获得目标EPI集合;其中,目标EPI集合中图像的分辨率大于所述初始EPI集合中图像的分辨率;对所述目标EPI集合进行融合处理,获得所述重建后子孔径图像。
进一步地,在本申请的实施例中,第一获取部分202,还具体配置为对所述初始子孔径图像进行排序处理,获得立体图像集合;按照至少一个方向对所述立体图像集合进行提取处理,获得至少一个初始EPI集合;其中,一个方向对应一个初始EPI集合。
进一步地，在本申请的实施例中，第一获取部分202，还具体配置为解析所述码流，获得采样参数；按照所述采样参数对所述EPI集合进行上采样处理，获得采样后EPI集合；利用一个或多个卷积层对所述采样后EPI集合进行特征提取，获得所述初始EPI集合对应的特征图像；基于所述采样后EPI集合和所述特征图像，构建所述目标EPI集合。
进一步地,在本申请的实施例中,第一获取部分202,还具体配置为对至少一个EPI集合对应的至少一个目标EPI集合进行加权平均融合,获得所述重建后子孔径图像。
进一步地,在本申请的实施例中,所述确定部分203,配置为确定所述超分辨重建网络对应的第一网络参数;
所述构建部分204,配置为基于所述第一网络参数构建所述超分辨重建网络。
进一步地,在本申请的实施例中,所述确定部分203,具体配置为获取第一训练数据;其中,所述第一训练数据包括低分辨率图像和对应的高分辨率图像;通过所述第一训练数据进行模型训练,确定所述第一网络参数。
进一步地,在本申请的实施例中,所述确定部分203,还具体配置为解析所述码流,获得所述第一网络参数。
进一步地,在本申请的实施例中,所述确定部分203,还配置为确定所述质量增强网络对应的第二网络参数;
所述构建部分204,还配置为基于所述第二网络参数构建所述质量增强网络。
进一步地,在本申请的实施例中,所述确定部分203,还具体配置为获取第二训练数据;其中,所述第二训练数据包括低质量图像和对应的高质量图像;通过所述第二训练数据进行模型训练,确定所述第二网络参数。
进一步地,在本申请的实施例中,所述确定部分203,还具体配置为解析所述码流,获得所述第二网络参数。
图22为光场图像解码器的组成结构示意图二,如图22所示,本申请实施例提出的光场图像解码器200还可以包括第一处理器205、存储有第一处理器205可执行指令的第一存储器206、第一通信接口207,和用于连接第一处理器205、第一存储器206以及第一通信接口207的第一总线208。
进一步地,在本申请的实施例中,上述第一处理器205,用于解析码流,获得初始子孔径图像;将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,所述重建后子孔径图像的空间分辨率和角度分辨率均大于所述初始子孔径图像的空间分辨率和角度分辨率;将所述重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
另外,在本实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施例提供了一种光场图像解码器,该光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
基于上述实施例，在本申请的再一实施例中，图23为光场图像编码器的组成结构示意图一，如图23所示，本申请实施例提出的光场图像编码器300包括：第二获取部分301，生成部分302，编码部分303。
所述第二获取部分301,配置为通过光场相机采集获得微透镜图像;
所述生成部分302,配置为根据所述微透镜图像生成子孔径图像;
所述第二获取部分301,还配置为对所述子孔径图像进行下采样处理,获得初始子孔径图像;
所述生成部分302,还配置为基于预设排列顺序和所述初始子孔径图像,生成所述子孔径图像对应的图像伪序列;以及基于所述图像伪序列进行编码处理,生成码流。
进一步地,在本申请的实施例中,所述编码部分303,配置为所述基于预设排列顺序和所述初始子孔径图像,生成所述子孔径图像对应的图像伪序列之后,将排序参数写入码流;其中,所述排序参数用于对所述预设排列顺序进行指示。
进一步地,在本申请的实施例中,所述第二获取部分301,具体配置为按照采样参数分别对所述子孔径图像的空间分辨率和角度分辨率进行下采样处理,以完成所述初始子孔径图像的构建。
进一步地,在本申请的实施例中,所述编码部分303,还配置为所述对所述子孔径图像进行下采样处理,获得初始子孔径图像之后,将所述采样参数写入码流。
图24为光场图像编码器的组成结构示意图二,如图24所示,本申请实施例提出的光场图像编码器300还可以包括第二处理器304、存储有第二处理器304可执行指令的第二存储器305、第二通信接口306,和用于连接第二处理器304、第二存储器305以及第二通信接口306的第二总线307。
进一步地,在本申请的实施例中,上述第二处理器304,用于通过光场相机采集获得微透镜图像,并根据所述微透镜图像生成子孔径图像;对所述子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和所述初始子孔径图像,生成所述子孔径图像对应的图像伪序列;基于所述图像伪序列进行编码处理,生成码流。
另外,在本实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施例提供了一种光场图像编码器,该光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。
本申请实施例提供计算机可读存储介质和计算机可读存储介质,其上存储有程序,该程序被处理器执行时实现如上述实施例所述的方法。
具体来讲,本实施例中的一种光场图像的处理方法对应的程序指令可以被存储在光盘,硬盘,U盘等存储介质上,当存储介质中的与一种光场图像的处理方法对应的程序指令被一电子设备读取或被执行时,包括如下步骤:
解析码流,获得初始子孔径图像;
将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,所述重建后子孔径图像的空间分辨率和角度分辨率均大于所述初始子孔径图像的空间分辨率和角度分辨率;
将所述重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
具体来讲，本实施例中的一种光场图像的处理方法对应的程序指令可以被存储在光盘，硬盘，U盘等存储介质上，当存储介质中的与一种光场图像的处理方法对应的程序指令被一电子设备读取或被执行时，还包括如下步骤：
通过光场相机采集获得微透镜图像,并根据所述微透镜图像生成子孔径图像;
对所述子孔径图像进行下采样处理,获得初始子孔径图像;
基于预设排列顺序和所述初始子孔径图像,生成所述子孔径图像对应的图像伪序列;
基于所述图像伪序列进行编码处理,生成码流。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用硬件实施例、软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的实现流程示意图和/或方框图来描述的。应理解可由计算机程序指令实现流程示意图和/或方框图中的每一流程和/或方框、以及实现流程示意图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在实现流程示意图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在实现流程示意图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在实现流程示意图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述,仅为本申请的较佳实施例而已,并非用于限定本申请的保护范围。
工业实用性
本申请实施例提供了一种光场图像的处理方法、编码器、解码器及存储介质,光场图像解码器解析码流,获得初始子孔径图像;将初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,重建后子孔径图像的空间分辨率和角度分辨率均大于初始子孔径图像的空间分辨率和角度分辨率;将重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。光场图像编码器通过光场相机采集获得微透镜图像,并根据微透镜图像生成子孔径图像;对子孔径图像进行下采样处理,获得初始子孔径图像;基于预设排列顺序和初始子孔径图像,生成子孔径图像对应的图像伪序列;基于图像伪序列进行编码处理,生成码流。也就是说,在本申请的实施例中,由于在解码端可以使用超分辨率重建网络对低分辨的子孔径图像进行空间和角度上的超分辨率重建,因此,在编码端可以使用下采样处理降低子孔径图像的空间分辨率和角度分辨率,从而可以只对部分光场图像进行编解码处理,有效减少了传输的码流数据,大大提高了编解码效率,进而提升了光场图像的压缩效率。由此可见,本申请采用了超分辨率重建网络的设计,能够实现同时对光场图像的空间分辨率和角度分辨率进行提升处理,因此与在将超分辨率重建网络应用于光场压缩的处理过程时,可以明显提升压缩处理的效率,同时,本申请还可以采用质量增强网络对超分辨率重建网络输出的结果进行图像质量的提升,因此可以提高图像质量。综上所述,在本申请中,光场图像编码器在压缩编码光场图像之前,可以对光场图像进行空间和角度的下采样处理,获得低分辨率的光场图像,能够降低待编码的数据量,相应地,在解码后,光场图像解码器可以采用超分辨率重建网络进行对低分辨率的光场图像进行空间和角度的上采样,进而构建出高分辨率的光场图像,从而可以减少传输码流,大大提高编解码效率。

Claims (21)

  1. 一种光场图像的处理方法,应用于光场图像解码器,所述方法包括:
    解析码流,获得初始子孔径图像;
    将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像;其中,所述重建后子孔径图像的空间分辨率和角度分辨率均大于所述初始子孔径图像的空间分辨率和角度分辨率;
    将所述重建后子孔径图像输入至质量增强网络中,输出目标子孔径图像。
  2. 根据权利要求1所述的方法,其中,所述解析码流,获得初始子孔径图像,包括:
    解析所述码流,获得图像伪序列和预设排列顺序;
    基于所述预设排列顺序和所述图像伪序列,生成所述初始子孔径图像。
  3. 根据权利要求1所述的方法,其中,所述将所述初始子孔径图像输入至超分辨重建网络中,输出重建后子孔径图像,包括:
    基于所述初始子孔径图像进行提取处理,获得初始核极线平面图EPI集合;
    对所述初始EPI集合进行上采样处理和特征提取,获得目标EPI集合;其中,目标EPI集合中图像的分辨率大于所述初始EPI集合中图像的分辨率;
    对所述目标EPI集合进行融合处理,获得所述重建后子孔径图像。
  4. 根据权利要求3所述的方法,其中,所述基于所述初始子孔径图像进行提取处理,获得初始EPI集合,包括:
    对所述初始子孔径图像进行排序处理,获得立体图像集合;
    按照至少一个方向对所述立体图像集合进行提取处理,获得至少一个初始EPI集合;其中,一个方向对应一个初始EPI集合。
  5. 根据权利要求3所述的方法,其中,所述对所述初始EPI集合进行上采样处理和特征提取,获得目标EPI集合,包括:
    解析所述码流,获得采样参数;
    按照所述采样参数对所述EPI集合进行上采样处理,获得采样后EPI集合;
    利用一个或多个卷积层对所述采样后EPI集合进行特征提取,获得所述初始EPI集合对应的特征图像;
    基于所述采样后EPI集合和所述特征图像,构建所述目标EPI集合。
  6. 根据权利要求4所述的方法,其中,所述对所述目标EPI集合进行融合处理,获得所述重建后子孔径图像,包括:
    对至少一个EPI集合对应的至少一个目标EPI集合进行加权平均融合,获得所述重建后子孔径图像。
  7. The method according to claim 1, wherein the method further comprises:
    determining a first network parameter corresponding to the super-resolution reconstruction network; and
    constructing the super-resolution reconstruction network based on the first network parameter.
  8. The method according to claim 7, wherein determining the first network parameter corresponding to the super-resolution reconstruction network comprises:
    acquiring first training data; wherein the first training data comprises low-resolution images and corresponding high-resolution images; and
    performing model training with the first training data to determine the first network parameter.
  9. The method according to claim 7, wherein determining the first network parameter corresponding to the super-resolution reconstruction network comprises:
    parsing the bitstream to obtain the first network parameter.
  10. The method according to claim 1, wherein the method further comprises:
    determining a second network parameter corresponding to the quality enhancement network; and
    constructing the quality enhancement network based on the second network parameter.
  11. The method according to claim 10, wherein determining the second network parameter corresponding to the quality enhancement network comprises:
    acquiring second training data; wherein the second training data comprises low-quality images and corresponding high-quality images; and
    performing model training with the second training data to determine the second network parameter.
  12. The method according to claim 10, wherein determining the second network parameter corresponding to the quality enhancement network comprises:
    parsing the bitstream to obtain the second network parameter.
  13. A light field image processing method, applied to a light field image encoder, the method comprising:
    capturing a micro-lens image with a light field camera, and generating a sub-aperture image from the micro-lens image;
    performing downsampling processing on the sub-aperture image to obtain an initial sub-aperture image;
    generating, based on a preset arrangement order and the initial sub-aperture image, an image pseudo-sequence corresponding to the sub-aperture image; and
    performing encoding processing based on the image pseudo-sequence to generate a bitstream.
  14. The method according to claim 13, wherein after generating the image pseudo-sequence corresponding to the sub-aperture image based on the preset arrangement order and the initial sub-aperture image, the method further comprises:
    writing a sorting parameter into the bitstream; wherein the sorting parameter is used to indicate the preset arrangement order.
  15. The method according to claim 13, wherein performing the downsampling processing on the sub-aperture image to obtain the initial sub-aperture image comprises:
    performing downsampling processing on the spatial resolution and the angular resolution of the sub-aperture image respectively according to a sampling parameter, so as to complete construction of the initial sub-aperture image.
  16. The method according to claim 15, wherein after performing the downsampling processing on the sub-aperture image to obtain the initial sub-aperture image, the method further comprises:
    writing the sampling parameter into the bitstream.
  17. A light field image decoder, comprising a parsing part and a first acquisition part, wherein
    the parsing part is configured to parse a bitstream to obtain an initial sub-aperture image; and
    the first acquisition part is configured to input the initial sub-aperture image into a super-resolution reconstruction network and output a reconstructed sub-aperture image, wherein both a spatial resolution and an angular resolution of the reconstructed sub-aperture image are greater than a spatial resolution and an angular resolution of the initial sub-aperture image; and to input the reconstructed sub-aperture image into a quality enhancement network and output a target sub-aperture image.
  18. A light field image decoder, comprising a first processor and a first memory storing instructions executable by the first processor, wherein the instructions, when executed by the first processor, implement the method according to any one of claims 1 to 12.
  19. A light field image encoder, comprising a second acquisition part and a generation part, wherein
    the second acquisition part is configured to capture a micro-lens image with a light field camera;
    the generation part is configured to generate a sub-aperture image from the micro-lens image;
    the second acquisition part is further configured to perform downsampling processing on the sub-aperture image to obtain an initial sub-aperture image; and
    the generation part is further configured to generate, based on a preset arrangement order and the initial sub-aperture image, an image pseudo-sequence corresponding to the sub-aperture image, and to perform encoding processing based on the image pseudo-sequence to generate a bitstream.
  20. A light field image encoder, comprising a second processor and a second memory storing instructions executable by the second processor, wherein the instructions, when executed by the second processor, implement the method according to any one of claims 13 to 16.
  21. A computer-readable storage medium, storing a program applied to a light field image decoder and a light field image encoder, wherein the program, when executed by a first processor, implements the method according to any one of claims 1 to 12, and when executed by a second processor, implements the method according to any one of claims 13 to 16.
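As a rough illustration of the EPI-based reconstruction steps in claims 3 to 6, the sketch below extracts horizontal and vertical epipolar plane images from a stack of sub-aperture images and fuses two reconstructions by weighted averaging. The 4D layout (v, u, h, w), the slicing conventions, and the equal fusion weights are all assumptions made for the example; the upsampling and convolutional super-resolution stages themselves are omitted.

```python
import numpy as np

def extract_epis(lf, direction="horizontal"):
    """Slice a stack of sub-aperture images, shaped (v, u, h, w), into
    epipolar plane images. Under the assumed layout: for a fixed vertical
    view index v and image row y, the horizontal EPI is the (u, w) slice;
    for a fixed horizontal view index u and column x, the vertical EPI is
    the (v, h) slice."""
    v, u, h, w = lf.shape
    if direction == "horizontal":
        # one (u, w) EPI per (v, y) pair
        return lf.transpose(0, 2, 1, 3).reshape(v * h, u, w)
    # one (v, h) EPI per (u, x) pair
    return lf.transpose(1, 3, 0, 2).reshape(u * w, v, h)

def fuse(recon_a, recon_b, weight_a=0.5):
    """Weighted-average fusion of two reconstructions of the same light
    field, e.g. one per EPI direction; equal weights are an assumption."""
    return weight_a * recon_a + (1.0 - weight_a) * recon_b
```

In the full method, each directional EPI set would first be upsampled according to the sampling parameter and refined by convolutional layers, and only then would the per-direction reconstructions be fused back into the reconstructed sub-aperture images.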
PCT/CN2020/103177 2020-07-21 2020-07-21 Light field image processing method, encoder, decoder, and storage medium WO2022016350A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP20946182.1A EP4156685A4 (en) 2020-07-21 2020-07-21 LIGHT FIELD IMAGE PROCESSING METHODS, LIGHT FIELD IMAGE ENCODERS AND DECODERS AND STORAGE MEDIA
CN202080104551.6A CN116210219A (zh) Light field image processing method, encoder, decoder, and storage medium
PCT/CN2020/103177 WO2022016350A1 (zh) Light field image processing method, encoder, decoder, and storage medium
US18/079,174 US20230106939A1 (en) 2020-07-21 2022-12-12 Light field image processing method, light field image encoder and decoder, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/103177 WO2022016350A1 (zh) Light field image processing method, encoder, decoder, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/079,174 Continuation US20230106939A1 (en) 2020-07-21 2022-12-12 Light field image processing method, light field image encoder and decoder, and storage medium

Publications (1)

Publication Number Publication Date
WO2022016350A1 true WO2022016350A1 (zh) 2022-01-27

Family

ID=79729965

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103177 WO2022016350A1 (zh) Light field image processing method, encoder, decoder, and storage medium

Country Status (4)

Country Link
US (1) US20230106939A1 (zh)
EP (1) EP4156685A4 (zh)
CN (1) CN116210219A (zh)
WO (1) WO2022016350A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230344974A1 (en) * 2022-04-25 2023-10-26 Rovi Guides, Inc. Learning-based light field compression for tensor display
CN117475088B (zh) * 2023-12-25 2024-03-19 Zhejiang Youzhong New Material Technology Co., Ltd. Epipolar-plane-attention-based light field reconstruction model training method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018050725A1 (en) * 2016-09-19 2018-03-22 Thomson Licensing A method and a device for reconstructing a point cloud representative of a scene using light-field data
CN109447919A (zh) * 2018-11-08 2019-03-08 University of Electronic Science and Technology of China Light field super-resolution reconstruction method combining multi-view and semantic texture features
CN110191344A (zh) * 2019-06-06 2019-08-30 Tianjin University Intelligent light field image coding method
CN110191359A (zh) * 2019-05-16 2019-08-30 Huaqiao University Light field image compression method based on key sub-aperture image selection
CN110599400A (zh) * 2019-08-19 2019-12-20 Xi'an University of Technology EPI-based light field image super-resolution method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MA XIAOHUI; ZENG HUANQIANG; CHEN JING; ZHU JIANQING: "Light Field Image Compression Based on Multi-view Pseudo Sequence", SIGNAL PROCESSING, vol. 35, no. 3, 1 March 2019 (2019-03-01), pages 378 - 385, XP055888100, ISSN: 1003-0530, DOI: 10.16798/j.issn.1003-0530.2019.03.008 *
See also references of EP4156685A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897680A (zh) * 2022-04-14 2022-08-12 Anqing Normal University Angular super-resolution method fusing light field sub-aperture images and macro-pixel images
CN114926339A (zh) * 2022-05-30 2022-08-19 Beijing Zhuohe Technology Co., Ltd. Deep-learning-based light field multi-view image super-resolution reconstruction method and system
CN114926339B (zh) * 2022-05-30 2023-02-03 Beijing Zhuohe Technology Co., Ltd. Deep-learning-based light field multi-view image super-resolution reconstruction method and system

Also Published As

Publication number Publication date
US20230106939A1 (en) 2023-04-06
EP4156685A1 (en) 2023-03-29
CN116210219A (zh) 2023-06-02
EP4156685A4 (en) 2023-06-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20946182; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020946182; Country of ref document: EP; Effective date: 20221223)
NENP Non-entry into the national phase (Ref country code: DE)