WO2020062074A1 - Reconstruction of distorted images by means of a convolutional neural network - Google Patents

Reconstruction of distorted images by means of a convolutional neural network

Info

Publication number
WO2020062074A1
Authority
WO
WIPO (PCT)
Prior art keywords
filter
cnn
image
side information
distorted image
Application number
PCT/CN2018/108441
Other languages
English (en)
Inventor
Jiabao YAO
Xiaoyang Wu
Xiaodan Song
Li Wang
Original Assignee
Hangzhou Hikvision Digital Technology Co., Ltd.
Application filed by Hangzhou Hikvision Digital Technology Co., Ltd.
Priority to PCT/CN2018/108441
Publication of WO2020062074A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Definitions

  • This disclosure relates to reconstructing distorted images.
  • A convolutional neural network (CNN) uses artificial neurons that respond to part of the surrounding cells within the coverage area, to improve performance for large-scale image processing.
  • the present disclosure describes reconstructing distorted images.
  • a computer-implemented method implemented by a video codec includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN) ; and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  • a first feature combinable with any of the following features, where the image data represents a portion of the at least one distorted image.
  • A second feature, combinable with any of the previous or following features, where the plurality of filter types comprises a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, and a CNN type.
  • a third feature combinable with any of the previous or following features, where the selected type of filter is the CNN type.
  • a fourth feature combinable with any of the previous or following features, where the type of filter is selected further based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  • a fifth feature combinable with any of the previous or following features, further comprising: generating controlling coefficients to adjust weights or biases of the filter by using a second CNN.
  • a sixth feature combinable with any of the previous or following features, where the controlling coefficients adjust more than one convolution kernel in a same channel with a same value.
  • a seventh feature combinable with any of the previous or following features, where the controlling coefficients adjust different convolution kernels in a same channel with different values.
  • An eighth feature, combinable with any of the previous or following features, where the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  • a ninth feature combinable with any of the previous or following features, where controlling coefficients are generated based on a preconfigured computation boundary or a target quality factor.
  • a tenth feature combinable with any of the previous or following features, where the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
  • An eleventh feature combinable with any of the previous or following features, where the image data comprises data for at least one of a luminance component or a color component.
  • a computer-implemented method includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, a layer path of a convolutional neural network (CNN) filter based on the image data, wherein the layer path of the CNN filter is selected by using a first CNN; and generating, by the at least one processor, a reconstructed image corresponding to the at least one distorted image by using the selected layer path of the CNN filter.
  • a first feature combinable with any of the following features, where the image data represents a portion of the at least one distorted image.
  • a third feature combinable with any of the previous or following features, where the layer path is selected based on a preconfigured computation boundary or a target quality factor.
  • a fourth feature combinable with any of the previous or following features, further comprising: generating controlling coefficients to adjust weights or biases of the CNN filter by using a second CNN.
  • a fifth feature combinable with any of the previous or following features, where the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  • a sixth feature combinable with any of the previous or following features, where the image data comprises data for at least one of a luminance component or a color component.
  • a computer-readable medium storing computer instructions, that when executed by one or more hardware processors, cause the one or more hardware processors to perform operations including: receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN); and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  • the previously described implementation is implementable using a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer-implemented system comprising a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method and the instructions stored on the non-transitory, computer-readable medium.
  • FIG. 1 is an example communication system 100 that reconstructs distorted images, according to an implementation.
  • FIG. 2 is a flow diagram illustrating an example process for reconstructing images, according to an implementation.
  • FIG. 3 is a schematic diagram illustrating using a CNN to select a filter to reconstruct a distorted image, according to an implementation.
  • FIG. 4 is a schematic diagram illustrating using a CNN to generate coefficients of a filter used to reconstruct a distorted image, according to an implementation.
  • FIG. 5 includes schematic diagrams that illustrate different levels of controlling coefficients, according to an implementation.
  • FIG. 6 is a flowchart illustrating an example method for reconstructing an image, according to an implementation.
  • FIG. 7 is a block diagram of an example computer system used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures, as described in the instant disclosure, according to an implementation.
  • FIG. 8 is a schematic diagram illustrating an example structure of an electronic circuit that reconstructs images as described in the present disclosure, according to an implementation.
  • FIG. 9 is a schematic diagram illustrating the construct of an example CNN filter that reconstructs an image, according to an implementation.
  • FIG. 10 is a flowchart illustrating another example method for reconstructing an image, according to an implementation.
  • A CNN reduces complexity because it can reduce or avoid complex pre-processing of the image and can directly use the original image for end-to-end learning. Furthermore, traditional neural networks are fully connected, meaning that neurons from the input layer to the hidden layers are all connected. Such configurations result in a large number of parameters, complicating the training process and consuming a large amount of computation resources. By using local connections and weight sharing, a CNN saves computation resources and improves computation efficiency.
  • an image de-distortion filter can be used to post-process the distorted images to reconstruct the images by, for example, restoring the pixel intensity offset and reducing visual loss.
  • Examples of image de-distortion filters include deblocking (DBK) filters.
  • In adaptive filters, filter coefficients and structures can be adjusted based on statistical information of local image regions.
  • Examples of adaptive filters include the Sample Adaptive Offset (SAO) filter and the Adaptive Loop Filter (ALF).
  • For these adaptive filters, different filter parameters (e.g., filter coefficients) corresponding to different local statistical information are included in the software code, so that they can be chosen based on the local statistical information of different images. As a result, the size and complexity of the software code for these adaptive filters increases.
  • deep learning technology can be applied to image processing.
  • Using CNN filters to reconstruct distorted images can provide a significant improvement in the quality of the reconstructed images.
  • Such an approach can reduce or avoid image pre-processing and manually designed filter coefficients. It learns image distortion characteristics and compensation methods through data-driven training, and is simpler to use, more general, and more accurate. The improvement is more pronounced when applied to image and video compression that incorporates multiple distortions.
  • Some CNN filters, e.g., the Variable-filter-size Residue-learning Convolutional Neural Network (VRCNN), have been used for this purpose. However, conventional CNN filters such as VRCNN only provide a high-dimensional filter and do not include a decision-making filtering process for different texture inputs. This may limit the generalization ability of these filters and cause performance loss.
  • the process of reconstructing distorted images can be improved by incorporating a CNN based decision-making filtering process.
  • the decision making filtering process can be used to select filters, determine filter coefficients, or a combination thereof.
  • a side information guide map can be generated based on distorted parameters, and used as an input to the decision making filtering process. FIGS. 1-8 and associated descriptions provide additional details of these implementations.
  • FIG. 1 is an example communication system 100 that reconstructs distorted images, according to an implementation.
  • the example communication system 100 includes an electronic device 102 and a server 120, that are communicatively coupled with a network 110.
  • the server 120 represents an application, a set of applications, software, software modules, hardware, or any combination thereof that can be configured to provide trained CNN models.
  • the server 120 can perform training of the CNN models based on a set of training data.
  • the server 120 can receive data from different electronic devices that perform image processing to construct additional sets of training data.
  • the electronic device 102 represents an electronic device that can reconstruct distorted images.
  • the electronic device 102 can be a video codec or include a video codec.
  • the video codec can perform an image reconstruction process during encoding and decoding operations of video images.
  • the electronic device 102 can be a graphics-processing unit (GPU) or include a GPU that reconstructs distorted image data.
  • the electronic device 102 can use a first CNN to select a filter based on the distorted image.
  • the electronic device 102 can use a second CNN to generate controlling coefficients based on the distorted image, and use the controlling coefficients to adjust the weights of the filter.
  • the electronic device 102 can use the filter to reconstruct the distorted image.
  • the electronic device 102 can generate a side information guide map and use the side information guide map as additional input to the first and the second CNN. In some cases, the electronic device 102 can also receive parameters of trained CNN models from the server 120. FIGS. 2-8 and associated descriptions provide additional details of these implementations.
  • the electronic device 102 may include, without limitation, any of the following: endpoint, computing device, mobile device, mobile electronic device, user device, mobile station, subscriber station, portable electronic device, mobile communications device, wireless modem, wireless terminal, or other electronic device.
  • an endpoint may include an IoT (Internet of Things) device, EoT (Enterprise of Things) device, cellular phone, personal data assistant (PDA) , smart phone, laptop, tablet, personal computer (PC) , pager, portable computer, portable gaming device, wearable electronic device, health/medical/fitness device, camera, vehicle, or other mobile communications devices having components for communicating voice or data via a wireless communication network.
  • the wireless communication network may include a wireless link over at least one of a licensed spectrum and an unlicensed spectrum.
  • the term “mobile device” can also refer to any hardware or software component that can terminate a communication session for a user.
  • the terms “user equipment, ” “UE, ” “user equipment device, ” “user agent, ” “UA, ” “user device, ” and “mobile device” can be used interchangeably herein.
  • the example communication system 100 includes the network 110.
  • the network 110 represents an application, set of applications, software, software modules, hardware, or combination thereof, that can be configured to transmit data messages between the entities in the system 100.
  • the network 110 can include a wireless network, a wireline network, the Internet, or a combination thereof.
  • the network 110 can include one or a plurality of radio access networks (RANs) , core networks (CNs) , and the Internet.
  • the RANs may comprise one or more radio access technologies.
  • the radio access technologies may be Global System for Mobile communication (GSM), Interim Standard 95 (IS-95), Universal Mobile Telecommunications System (UMTS), CDMA2000 (Code Division Multiple Access), Evolved Universal Mobile Telecommunications System (E-UMTS), Long Term Evolution (LTE), LTE-Advanced, the fifth generation (5G), or any other radio access technologies.
  • the core networks may be evolved packet cores (EPCs) .
  • FIG. 1 While elements of FIG. 1 are shown as including various component parts, portions, or modules that implement the various features and functionality, nevertheless, these elements may instead include a number of sub-modules, third-party services, components, libraries, and such, as appropriate. Furthermore, the features and functionality of various components can be combined into fewer components, as appropriate.
  • FIG. 2 is a flow diagram illustrating an example process 200 for reconstructing images, according to an implementation.
  • the process 200 can be performed by an electronic device that reconstructs images, e.g., the electronic device 102 as illustrated in FIG. 1.
  • process 200 may be performed, for example, by any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
  • various steps of process 200 can be run in parallel, in combination, in loops, or in any order.
  • a prediction process is performed.
  • input data is compared with a predicted frame to generate residual data.
  • the predicted frame is generated based on one or more prediction models.
  • the prediction models include temporal prediction models and spatial prediction models.
  • the temporal prediction models provide inter-frame predictions.
  • the spatial prediction models provide intra-frame predictions.
  • the prediction process can be performed continuously.
  • the input data represents the raw image data of the n-th frame.
  • a predicted frame for the n-th frame can be generated by applying the prediction models on a reference frame of the previous frame, i.e., the (n-1) -th frame.
  • the reference frame can be generated based on information of multiple previous frames, e.g., the (n-1) -th frame, the (n-2) -th frame, etc.
  • the input data of the n-th frame is compared with the predicted frame for the n-th frame to generate residual data of the n-th frame.
  • the residual data of the n-th frame is processed through other steps in FIG. 2, e.g., transformation, quantization, and entropy encoding to generate output data of the n-th frame.
  • the residual data is also processed through reverse quantization and transformation and a filtering process to generate a reference frame of the n-th frame.
  • the reference frame of the n-th frame will be used to generate the predicted frame for the next frame, i.e., the (n+1) -th frame, which is used for encoding the (n+1) -th frame.
  • the input data can be provided in different formats.
  • the input data can be provided in the YCrCb format.
  • the input data includes the luminance component Y, and the color components Cr and Cb.
  • Other formats e.g., RGB, can also be used.
  • the residual data is transformed.
  • transformation techniques that can be used in this step include the Karhunen-Loeve Transform (KLT), Discrete Cosine Transform (DCT), and Discrete Wavelet Transform (DWT), among others.
  • the transformed data is quantized.
  • quantization techniques that can be used in this step include scalar quantizers and vector quantizers, among others.
  • the transformed data can have a large range of values. Quantization can reduce the range of values, thereby obtaining a better compression effect.
  • the quantization is a main factor that causes image distortion.
  • the quantization process is configured by one or more quantization parameters, which can be used to configure the degrees of quantization. The following equation represents an example calculation of the quantization process:
  • QP represents a quantization parameter.
  • QP can be an integer value between 0 and 51.
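  • The quantization equation itself is not reproduced in this extract. As a hedged illustration only (not necessarily the patent's exact formula), a typical scalar quantization in video coding maps QP to a step size that roughly doubles every 6 QP values and rounds the transformed coefficients, which is why a larger QP introduces more distortion:

      import numpy as np

      def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
          # Illustrative scalar quantization: larger QP -> coarser step -> more distortion.
          # The QP-to-step mapping below is an H.264/HEVC-style assumption for illustration.
          q_step = 2.0 ** ((qp - 4) / 6.0)
          return np.round(coeffs / q_step)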
  • entropy encoding is performed on the quantized data.
  • the inputs to the entropy encoder can include the quantized data generated at step 206.
  • the inputs to the entropy encoder can also include motion vectors, headers of the raw image of the current frame, supplementary information, side information, and any combinations thereof.
  • the entropy encoder converts these inputs to an encoded frame, and outputs the encoded frame as output data for storage or transmission. In some cases, reordering can also be performed on the quantized data prior to the entropy encoding.
  • the quantized data is also used to generate a reference frame for the next frame.
  • a reverse quantization and transformation process is performed on the quantized data to generate image data of a distorted image.
  • the reverse quantization and transformation process is implemented using techniques that correspond to those used in the transformation and quantization process. For example, if the quantization process uses vector scaling, the reverse quantization process uses vector rescaling; if the transformation process uses DCT, the reverse transformation process uses the Inverse DCT (IDCT).
  • the quantized data after being reverse quantized and transformed, can be added to the predicted frame used in the prediction step (e.g., step 202) to generate the image data of the distorted image.
  • the distorted image can include multiple portions.
  • the distorted image can be divided into multiple pieces. Accordingly, multiple groups of image data can be generated. Each group represents one or more portions of the distorted images.
  • each group of image data can be processed separately in the following steps.
  • the distorted image discussed in the subsequent steps refers to the group of image data corresponding to the portion of the distorted image that is processed together.
  • one group of image data can represent the entire distorted image and can be processed together.
  • the distorted image discussed in the subsequent steps refers to the image data corresponding to the entire distorted image.
  • image data of more than one distorted image can be processed together in selecting the filter and generating controlling coefficients. In these cases, the distorted image discussed in the subsequent steps refers to the image data corresponding to multiple distorted images.
  • the image data of the distorted image can be grouped into different components. These components can include luminance components and color components.
  • the distorted image can be represented in a YCrCb format, and the image data can have the Y, Cr, and Cb components. In these cases, different components of the image data can be processed separately or jointly.
  • the quantization parameter is used as input to generate a side information guide map.
  • side information of an image refers to prior knowledge that can be used to assist the processing of one or more images.
  • conventional CNN filters such as VRCNN may rely on side information of the sensor that produces the image to determine suitable filter coefficients for the image, and thus their performance may suffer if such side information is not available.
  • a side information guide map can be generated based on the quantization parameter used in the quantization step. Because the quantization parameter indicates the extent of distortion introduced in the encoding process, using the quantization parameter to generate a side information guide map can provide additional input to the CNN to improve performance.
  • the side information guide map can be generated by two steps: obtaining distortion parameters and normalization.
  • the distortion parameters can be generated based on the quantization parameter discussed previously.
  • the distortion parameters can also be generated based on other information that indicates the extent of image distortion, such as block size and sampling parameters used in the encoding process, or the resolution restoration factor in a high-resolution image restoration process.
  • the side information guide map can have the same dimension (e.g., width and height) as the portion of the distorted image represented by the image data.
  • Each pixel on the side information guide map can be represented by a distortion parameter that indicates the degree of distortion for the corresponding pixel in the distorted image.
  • the distortion parameter can be obtained based on the side information parameters discussed previously, e.g., the quantization parameters or the resolution restoration factor.
  • the information regarding the degree of distortion may not be readily available.
  • an image may be subjected to multiple digital image processing such as scaling and compression, which may not be known by the electronic device that performs the image reconstruction process.
  • non-reference image evaluation techniques can be used to determine the distortion parameter for the side information guide map.
  • fuzzy degree evaluation can be used as such a non-reference image evaluation. The following equation represents an example fuzzy degree evaluation technique:
  • f (x, y) represents the value of the pixel in the coordinate (x, y)
  • D (f) can represent the fuzzy degree in some circumstances.
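  • The fuzzy degree equation itself is not reproduced in this extract. As a hedged sketch only, a no-reference blur measure of the kind described, built from the pixel values f (x, y), can be computed as follows (the helper name is hypothetical):

      import numpy as np

      def fuzzy_degree(f: np.ndarray) -> float:
          # Illustrative no-reference measure based on local gray-level variation:
          # sharper images have stronger local gradients, so a smaller value suggests more blur.
          f = f.astype(np.float64)
          dx = np.diff(f, axis=1)  # horizontal differences f(x, y+1) - f(x, y)
          dy = np.diff(f, axis=0)  # vertical differences
          return float(np.mean(dx ** 2) + np.mean(dy ** 2))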
  • the distortion parameters can be normalized so that the range of values of each distortion parameter in the guide map is consistent with the range of values of the corresponding pixel in the distorted image.
  • the range of values for the quantization parameters is [QP_MIN, QP_MAX]
  • the range of values for pixels in the distorted image is [PIXEL_MIN, PIXEL_MAX] .
  • the normalized quantization parameters norm (x) can be obtained using the following equation:
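  • The normalization equation is not reproduced in this extract. Under the ranges stated above, the natural min-max remapping (an assumption consistent with those definitions, not necessarily the patent's exact formula) is:

      norm (x) = (x - QP_MIN) / (QP_MAX - QP_MIN) * (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN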
  • a first CNN is used to select a type of a filter.
  • different filters can be used as candidate filters for reconstructing the images.
  • the type of the filter selected by the first CNN can be a non-CNN type, such as DBK, SAO, ALF, or a CNN type.
  • a CNN filter can outperform the non-CNN filters.
  • a non-CNN filter can outperform the CNN filters.
  • Using a CNN to select an appropriate filter can improve the performance of the image reconstruction process.
  • the first CNN can be used to select more than one filter.
  • the candidate filters can include more than one type of CNN filter. These CNN filters can have different constructs (e.g., layers) or filter parameters (e.g., weights or biases).
  • the candidate filters may not include non-CNN filters, e.g., non-CNN filters may not be included for image reconstruction that is not part of a video coding process.
  • the first CNN can be used to select one or more CNN filters among the different CNN filters
  • FIG. 3 is a schematic diagram 300 illustrating using a CNN to select a filter to reconstruct a distorted image, according to an implementation.
  • the diagram 300 includes a first CNN 310 and a first trained CNN model 320.
  • the first CNN 310 represents a CNN that is configured to select a filter to reconstruct a distorted image.
  • the first CNN 310 includes an input layer 312, a hidden layer 314, and an output layer 316.
  • the first CNN 310 can be implemented by using any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
  • the input layer 312 is configured to receive input data to the first CNN 310.
  • the input data can include the distorted image as discussed previously. In some cases, if the side information guide map can be generated as discussed previously, the side information guide map can also be included as the input data. If the side information guide map is not available, e.g., due to a lack of information regarding the degree of distortion, the first CNN 310 can proceed without the side information guide map.
  • features of the distorted image can also be used as input to the first CNN 310.
  • these features include linear features or non-linear features.
  • Linear features, also referred to as textural features, include features generated by linear transformations, e.g., gradient features.
  • Non-linear features include features generated by non-linear transformations, e.g., frequency features generated by Fourier transformation such as Fast Fourier Transform (FFT) , wavelet features generated by Wavelet Transform (WT) , or features generated by non-linear activations.
  • the input layer 312 can perform a channel merging operation and a convolution filtering operation.
  • the channel merging operation combines the side information guide map with the distorted image for each channel to generate a combined input data, represented as I.
  • the convolution filtering operation performs convolutional filtering on the combined input data I, as illustrated in the following equation: F 1 (I) = g (W 1 * I + B 1 ), where:
  • W 1 represents the weighting coefficients of the convolutional filter used in the input layer 312
  • B 1 represents the bias coefficients of the convolutional filter
  • g () represents a non-linear mapping function
  • F 1 (I) represents the output of the input layer 312.
  • each convolutional filter can have a kernel of size c1 x f1 x f1, where c1 represents the number of channels in the input data, and f1 represents the spatial size of each kernel.
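  • As a hedged sketch only (not the patent's implementation; the class and parameter names such as InputLayer and guide_map are illustrative assumptions), the channel merging and input-layer convolution described above can be expressed in PyTorch as:

      import torch
      import torch.nn as nn

      class InputLayer(nn.Module):
          # Sketch of the described input layer: channel merging followed by a
          # convolution and a non-linear mapping, i.e., F1(I) = g(W1 * I + B1).
          def __init__(self, image_channels: int = 1, num_filters: int = 64, f1: int = 5):
              super().__init__()
              # one extra channel for the side information guide map merged with the image
              self.conv = nn.Conv2d(image_channels + 1, num_filters, kernel_size=f1, padding=f1 // 2)
              self.g = nn.ReLU()  # example non-linear mapping g()

          def forward(self, distorted: torch.Tensor, guide_map: torch.Tensor) -> torch.Tensor:
              # distorted: (N, image_channels, H, W); guide_map: (N, 1, H, W)
              merged = torch.cat([distorted, guide_map], dim=1)  # channel merging operation
              return self.g(self.conv(merged))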
  • the hidden layer 314 performs additional high-dimensional mapping on the image segmentation of sparse representations extracted by the input layer 312.
  • the hidden layer 314 includes one or more convolution layers.
  • F i (I) = g (W i * F i-1 (I) + B i ), i ∈ {2, 3, ..., N}
  • F i (I) represents the output of the i-th convolutional layer
  • W i represents the weighting coefficients of the convolutional filter used in the i-th convolutional layer
  • B i represents the bias coefficients of the convolutional filter used in the i-th convolutional layer
  • g () represents a non-linear mapping function
  • each convolutional filter can have a kernel of size c2 x f2 x f2, where c2 represents the number of channels of the input data to the convolutional layer, and f2 represents the spatial size of each kernel.
  • the output layer 316 processes the high-dimensional image output from the hidden layer 314 and generates the filter selection decisions.
  • the filter decisions can be a CNN filter, or a non-CNN filter, such as a DBK, ALF, or SAO filter. In some cases, more than one type of filter can be selected.
  • the output layer 316 can further perform a convolutional operation on the distorted image to generate the reconstruction image.
  • the output layer 316 can include one reconstruction layer.
  • the operation of the reconstruction layer can be represented by the following equation: F (I) = W N * F N-1 (I) + B N , where:
  • F (I) represents the output of the reconstruction layer
  • F N-1 (I) represents the output of the hidden layer 314
  • W N represents the weighting coefficients of the convolutional filter used in the reconstruction layer
  • B N represents the bias coefficients of the convolutional filter used in the reconstruction layer.
  • each convolutional filter can have a kernel of size cN x fN x fN, where cN represents the number of channels of the input data to the reconstruction layer, and fN represents the spatial size of each kernel.
  • the first CNN 310 uses the first trained CNN model 320 to make filter selection decisions.
  • the parameters related to the network structure of the first CNN 310 including e.g., the number of convolutional layers, concatenation of convolutional layers, the number of convolution filters per convolutional layer, and the size of the convolution kernel, can be fixed, while the filter coefficients, e.g., the weighting coefficients and bias coefficients, can be configured based on the first trained CNN model 320.
  • the parameter set of the first CNN 310 can be stored on the electronic device that performs the image reconstruction process.
  • the parameter set can be downloaded or updated from a server that performs training based on data collected from multiple electronic devices.
  • training for the first CNN 310 can be performed in the following steps:
  • a side information guide map is generated for a large number of undistorted images based on different noise sources from the natural image.
  • Corresponding distorted images can also be generated to form a training set.
  • the parameters of the CNN are initialized as θ 0 , and the training-related high-level parameters such as the learning rate and the weight-updating algorithm are set.
  • the loss function can be adjusted to improve the converging process.
  • the third and fourth steps (forward calculation and backward calculation) are repeated until the loss function converges, at which point the final parameter θ final is output.
  • the final parameter θ final is represented by the first trained CNN model 320 and used to configure the first CNN 310.
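  • As a hedged sketch only (the optimizer, loss, and all names are illustrative assumptions, not the patent's exact procedure), the training steps described above can be written as a standard iteration of forward and backward calculations:

      import torch
      import torch.nn as nn
      from torch.utils.data import DataLoader

      def train_selection_cnn(model: nn.Module, loader: DataLoader, epochs: int = 10, lr: float = 1e-4):
          # Iterate forward and backward passes over (merged input, target) pairs, where
          # the merged input combines the distorted image and its side information guide
          # map, until the loss converges; the final parameters form the trained CNN model.
          optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # example weight-updating algorithm
          loss_fn = nn.CrossEntropyLoss()  # example loss for selecting a filter type
          for _ in range(epochs):
              for merged_input, target_filter_type in loader:
                  optimizer.zero_grad()
                  loss = loss_fn(model(merged_input), target_filter_type)  # forward calculation
                  loss.backward()                                          # backward calculation
                  optimizer.step()
          return model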
  • If a non-CNN filter (e.g., SAO, ALF, or DBK) is selected, the process 200 proceeds from 222 to 230, where the selected non-CNN filter is used to generate the reference frame.
  • If a CNN filter is selected, the process 200 proceeds from 222 to 224, where filter coefficients of the CNN filter are generated.
  • a second CNN is used to determine a set of controlling coefficients based on the distorted image, the side information guide map, or a combination thereof.
  • the controlling coefficients can be used to adjust the configured weights of the CNN filter to generate the filter coefficients for the CNN filter to be used in the filtering process.
  • FIG. 4 is a schematic diagram 400 illustrating using a CNN to generate coefficients of a filter used to reconstruct a distorted image, according to an implementation.
  • the diagram 400 includes a second CNN 410 and a second trained CNN model 420.
  • the second CNN 410 represents a CNN that is configured to generate controlling coefficients that can be used to adjust the configured filter weights or filter biases of the filter.
  • the second CNN 410 includes an input layer 412, a hidden layer 414, and an output layer 416.
  • the second CNN 410 can be implemented by using any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
  • the input layer 412 is configured to receive input data to the second CNN 410.
  • the input data can include the distorted image, as discussed previously.
  • if the side information guide map can be generated as discussed previously, the side information guide map can also be included as the input data. If the side information guide map is not available, e.g., due to a lack of information regarding the degree of distortion, the second CNN 410 can proceed without the side information guide map.
  • features of the distorted image can also be used as input to the second CNN 410. Examples of these features include linear features or non-linear features discussed previously.
  • a preconfigured computation boundary or a target quality factor can also be used as input to the second CNN 410 to generate the controlling coefficients.
  • the target quality factor can be a Peak Signal to Noise Ratio (PSNR) value. The following equation represents an example PSNR value calculation:
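  • The equation is not reproduced in this extract; the standard PSNR definition is PSNR = 10 * log10 (MAX^2 / MSE), where MAX is the maximum possible pixel value (e.g., 255 for 8-bit images) and MSE is the mean squared error between the reconstructed image and the original image.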
  • the input layer 412 can perform a channel merging operation and a convolution filtering operation.
  • the channel merging operation combines the side information guide map with the distorted image for each channel to generate a combined input data, represented as I.
  • the convolution filtering operation performs convolutional filtering on the combined input data I, as illustrated in the following equation: F 1 (I) = g (W 1 * I + B 1 ), where:
  • W 1 represents the weighting coefficients of the convolutional filter used in the input layer 412
  • B 1 represents the bias coefficients of the convolutional filter
  • g () represents a non-linear mapping function
  • F 1 (I) represents the output of the input layer 412.
  • each convolutional filter can have a kernel of size c1 x f1 x f1, where c1 represents the number of channels in the input data, and f1 represents the spatial size of each kernel.
  • the input data can also be extracted from the feature map of a convolutional layer in the CNN filter.
  • the hidden layer 414 performs additional high-dimensional mapping on the image segmentation of sparse representations extracted by the input layer 412.
  • the hidden layer 414 includes one or more convolution layers.
  • F i (I) = g (W i * F i-1 (I) + B i ), i ∈ {2, 3, ..., N}
  • F i (I) represents the output of the i-th convolutional layer
  • W i represents the weighting coefficients of the convolutional filter used in the i-th convolutional layer
  • B i represents the bias coefficients of the convolutional filter used in the i-th convolutional layer
  • g () represents a non-linear mapping function
  • the outputted controlling coefficients are used to adjust the configured CNN filter weights.
  • the adjustment can be performed by a multiplication operation.
  • the controlling coefficients can be multiplied with the configured filter weights to generate the coefficients of the CNN filter.
  • Other operations e.g., addition or convolution, can also be used to perform the adjustment.
  • the coefficients can be controlled at a channel level or a pixel level.
  • FIG. 5 includes schematic diagrams 510 and 520 that illustrate different levels of controlling coefficients, according to an implementation.
  • the schematic diagram 510 illustrates controlling coefficients at a pixel level, according to an implementation.
  • F i-j represents the j-th feature map of the i-th convolutional layer.
  • Cf i-j represents a set of controlling coefficients that are applied to different convolution kernels in the j-th feature map of the i-th convolutional layer.
  • F i-1, F i-2, and F i-3 represent the first, second, and third feature map of the i-th convolution layer, respectively.
  • Cf i-1, Cf i-2 , and Cf i-3 represent the sets of controlling coefficients for the convolution kernels in these feature maps, respectively.
  • the dimension of the controlling coefficients is the same as the dimension of the feature map. Accordingly, different controlling coefficients in the set of controlling coefficients can be applied to different convolution kernels in the same channel of the CNN filter.
  • the schematic diagram 520 illustrates controlling coefficients at a channel level, according to an implementation.
  • the same controlling coefficient is applied to different convolution kernels in the feature map.
  • S i-1 represents the controlling coefficient that is applied to all the convolution kernels in the first feature map of the i-th convolution layer.
  • S i-2 and S i-3 represent the controlling coefficients that are applied to all the convolution kernels in the second and third feature maps of the i-th convolution layer, respectively.
  • the level of the controlling coefficients can be configured to be the same for the entire CNN filter, or configured differently for different feature maps of the CNN filter, as illustrated in the sketch below.
  • each convolutional filter can have a kernel of size c2 x f2 x f2, where c2 represents the number of channels of the input data to the convolutional layer, and f2 represents the spatial size of each kernel.
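  • As a hedged sketch only (function names and tensor shapes are illustrative assumptions), channel-level and pixel-level adjustment by controlling coefficients can be expressed as:

      import torch

      def adjust_channel_level(weights: torch.Tensor, coeffs: torch.Tensor) -> torch.Tensor:
          # Channel level (diagram 520): one coefficient per output channel scales every
          # kernel weight of that channel. weights: (out_ch, in_ch, k, k); coeffs: (out_ch,).
          return weights * coeffs.view(-1, 1, 1, 1)

      def adjust_pixel_level(feature_map: torch.Tensor, coeffs: torch.Tensor) -> torch.Tensor:
          # Pixel level (diagram 510): coefficients with the same dimension as the feature
          # map, so different positions in the same channel are adjusted with different values.
          return feature_map * coeffs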
  • the output layer 416 processes the high-dimensional image output from the hidden layer 414 and generates the controlling coefficients.
  • the second CNN 410 uses the second trained CNN model 420 to generate the controlling coefficients. Similar to the first CNN 310, the parameters related to the network structure, the filter coefficients, or any combination thereof can be configured based on the second trained CNN model 420.
  • the parameter set of the second CNN 410 can be stored on the electronic device that performs the image reconstruction process. Alternatively or additionally, the parameter set can be downloaded or updated from a server.
  • training for the second CNN 410 can be performed by constructing a training set, initializing the parameter θ 0 , and performing forward and backward calculation in an iteration process.
  • the training for the second CNN 410 can be performed jointly with or separately from the first CNN 310.
  • the loss function for the controlling coefficient training can be represented as follows:
  • I n represents the input data based on the combination of the side information guide map and the distorted image.
  • F (I n , θ i ) represents the reconstructed image corresponding to the current parameter θ i , and X n represents the undistorted image.
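  • The loss equation itself is not reproduced in this extract. A common choice consistent with these definitions (an assumption, not necessarily the patent's exact formula) is the mean squared error over the N training samples:

      L (θ i ) = (1/N) * Σ n || F (I n , θ i ) - X n ||^2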
  • the distorted image is reconstructed using the selected filter, the generated filter coefficients, or a combination thereof.
  • the reconstructed image is used as the reference frame for the prediction operation of the next frame.
  • the controlling coefficients generated by the second CNN can also be used to simplify the reconstruction process. For example, if a control coefficient for a particular layer in the CNN filter is below a configured threshold, the particular layer can be skipped in the reconstruction operation. This approach can increase the processing speed and save computation resources.
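  • As a hedged sketch only (the threshold value, names, and the assumption that each layer preserves the tensor shape are illustrative), the layer-skipping behavior described above can be expressed as:

      import torch
      import torch.nn as nn

      def apply_with_layer_skipping(layers, coeffs, x: torch.Tensor, threshold: float = 0.05) -> torch.Tensor:
          # layers: list of nn.Module; coeffs: one controlling coefficient per layer.
          # A layer whose coefficient is below the configured threshold is omitted,
          # increasing processing speed and saving computation resources.
          for layer, c in zip(layers, coeffs):
              if c < threshold:
                  continue  # skip this convolutional layer in the reconstruction
              x = c * layer(x)  # otherwise apply the layer, scaled by its coefficient
          return x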
  • In some cases, the quantization and prediction steps discussed previously can be skipped.
  • In these cases, the input image may be the distorted image that is used as the input to steps 220, 224, and 230 discussed previously.
  • While the process 200 illustrates an example encoding process, other image reconstruction processes can also use CNNs to select the filter type and generate controlling coefficients for CNN filters in a similar fashion.
  • In a decoding process, distorted images are generated based on encoded image data, and reconstructed images are generated based on the distorted images and reference frames.
  • a first CNN can be used to select a filter type used for the reconstruction process, and a second CNN can be used to generate controlling coefficients that are used to adjust filter weights of the reconstruction filter.
  • Image restoration can also be performed on distorted images that are generated or received in other processes.
  • FIG. 6 is a flowchart illustrating an example method 600 for reconstructing an image, according to an implementation.
  • the method 600 can be implemented by an electronic device shown in FIG. 1.
  • the method 600 can also be implemented using additional, fewer, or different entities.
  • the method 600 can also be implemented using additional, fewer, or different operations, which can be performed in the order shown or in a different order. In some instances, an operation or a group of operations can be iterated or repeated, for example, for a specified number of iterations or until a terminating condition is reached.
  • the example method 600 begins at 602, where image data of at least one distorted image is received.
  • at least one type of filter is selected from a plurality of filter types based on the image data.
  • the filter type selection is performed using a first CNN.
  • the plurality of filter types can include a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, or a CNN type.
  • the CNN type can be selected.
  • the filter is selected further based on a side information guide map of the distorted image or features of the distorted image.
  • controlling coefficients that adjust weights or biases of the reconstruction filter are generated by using a second CNN.
  • the second CNN uses the distorted image, the side information guide map of the distorted image, features of the distorted image, a preconfigured computation boundary, a target quality factor, or any combinations thereof to generate the controlling coefficients.
  • steps 604 and 606 can be performed together or separately.
  • a filter of the selected type is used to generate a reconstructed image corresponding to the distorted image.
  • the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
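  • As a hedged sketch only (all module and variable names are illustrative assumptions, and a single-image batch is assumed), the overall flow of method 600 can be expressed as:

      import torch
      import torch.nn as nn

      def reconstruct(image_data: torch.Tensor, guide_map: torch.Tensor,
                      selector_cnn: nn.Module, coeff_cnn: nn.Module,
                      cnn_filter: nn.Module, non_cnn_filters: dict) -> torch.Tensor:
          # Select a filter type with the first CNN; for the CNN type, generate
          # controlling coefficients with the second CNN before reconstructing.
          merged = torch.cat([image_data, guide_map], dim=1)
          filter_type = int(selector_cnn(merged).argmax(dim=1))  # e.g., 0=DBK, 1=SAO, 2=ALF, 3=CNN
          if filter_type == 3:
              coeffs = coeff_cnn(merged)              # controlling coefficients for the CNN filter
              return cnn_filter(image_data, coeffs)
          return non_cnn_filters[filter_type](image_data)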
  • FIG. 9 is a schematic diagram 900 illustrating the construct of an example CNN filter 910 that reconstructs an image, according to an implementation. As illustrated, the CNN filter 910 includes 4 layers (represented horizontally) and 3 scale levels (represented vertically); additional layers and scale levels can be included in the CNN filter 910. Each layer includes one or more features at each scale level, denoted as features 911, 921, 931 for layer 1; features 912, 922, 932 for layer 2; features 913, 923, 933 for layer 3; and features 914, 924, 934 for layer 4, respectively. As illustrated, the arrows represent convolutional (regular or strided) operations.
  • concatenations 941, 942, 943 represent concatenation operations at scale level 1; concatenations 951, 952, 953, 954, 955 represent concatenation operations at scale level 2; and concatenations 961, 962, 963, 964, 965, 966 represent concatenation operations at scale level 3.
  • the diagram 900 also includes a distorted image 902.
  • the distorted image 902 is a blurred image of a cat.
  • Different layer paths in the CNN filter 910 can be taken to process the image data of the distorted image 902 to obtain a reconstructed image.
  • the CNN filter 910 includes layer paths 920 and 930.
  • the layer path 920 takes the image data, applies a convolutional operation with features 911, then applies another convolutional operation with features 912 and a concatenation operation at 941 (concatenating the convolutional operation outputs of features 911 and 912), applies another convolutional operation at 913 and another concatenation operation at 942, and then applies another convolutional operation at 914 and another concatenation operation at 943 to generate a reconstructed image as an output.
  • the layer path 930 takes concatenation operations 951, 952, 953, 954, 955, and 956 to generate a reconstructed image as an output. As illustrated, these concatenation operations concatenate the outputs of convolutional operations with features at scale level 1 and scale level 2. Therefore, while the layer path 930 may generate a different and better image than the layer path 920, the layer path 930 involves more computations and therefore can be more time and resource consuming.
  • a CNN can be used to select the layer path to process the image data of the distorted image.
  • the CNN can take input of the image data of the at least one distorted image, a side information guide map of the at least one distorted image, features of the at least one distorted image, or any combinations thereof, and output a selected layer path.
  • the input of the CNN can further include a preconfigured computation boundary or a target quality factor.
  • a different CNN can take input of the image data of the at least one distorted image, a side information guide map of the at least one distorted image, features of the at least one distorted image, a preconfigured computation boundary, a target quality factor or any combinations thereof, and generate controlling coefficients to adjust the weights or biases of the filtering operations (e.g., the convolutional operations) on the selected layer path.
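  • As a hedged sketch only (the names and the representation of layer paths as callable sub-networks are illustrative assumptions), selecting a layer path with a CNN and then running the selected path can be expressed as:

      import torch
      import torch.nn as nn

      def reconstruct_with_path_selection(image_data: torch.Tensor, guide_map: torch.Tensor,
                                          path_selector: nn.Module, layer_paths: list) -> torch.Tensor:
          # The selector CNN trades reconstruction quality against computation, e.g.,
          # preferring a cheaper path such as 920 under a tight computation boundary.
          merged = torch.cat([image_data, guide_map], dim=1)
          path_index = int(path_selector(merged).argmax(dim=1))
          selected_path = layer_paths[path_index]  # each entry is a callable sub-network
          return selected_path(image_data)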
  • FIG. 10 is a flowchart illustrating another example method 1000 for reconstructing an image, according to an implementation.
  • the method 1000 can be implemented by an electronic device shown in FIG. 1.
  • the method 1000 can also be implemented using additional, fewer, or different entities.
  • the method 1000 can also be implemented using additional, fewer, or different operations, which can be performed in the order shown or in a different order. In some instances, an operation or a group of operations can be iterated or repeated, for example, for a specified number of iterations or until a terminating condition is reached.
  • the example method 1000 begins at 1002, where image data of at least one distorted image is received.
  • a layer path of a convolutional neural network (CNN) filter is selected based on the image data, where the layer path of the CNN filter is selected by using a first CNN.
  • the selected layer path of the CNN filter is used to generate a reconstructed image corresponding to the distorted image.
  • FIG. 7 is a block diagram of an example computer system 700 used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures, as described in the instant disclosure, according to an implementation.
  • the computer system 700 or more than one computer system 700, can be used to implement the electronic device that reconstructs the image, and the server that trains and provides the CNN models.
  • the illustrated computer 702 is intended to encompass any computing device, such as a server, desktop computer, laptop/notebook computer, wireless data port, smart phone, personal data assistant (PDA) , tablet computing device, one or more processors within these devices, or any other suitable processing device, including physical or virtual instances (or both) of the computing device. Additionally, the computer 702 may comprise a computer that includes an input device, such as a keypad, keyboard, touch screen, or other device that can accept user information, and an output device that conveys information associated with the operation of the computer 702, including digital data, visual, or audio information (or a combination of information) , or a graphical user interface (GUI) .
  • the computer 702 can serve in a role as a client, network component, a server, a database or other persistency, or any other component (or a combination of roles) of a computer system for performing the subject matter described in the instant disclosure.
  • the illustrated computer 702 is communicably coupled with a network 730.
  • one or more components of the computer 702 may be configured to operate within environments, including cloud-computing-based, local, global, or other environment (or a combination of environments) .
  • the computer 702 is an electronic computing device operable to receive, transmit, process, store, or manage data and information associated with the described subject matter. According to some implementations, the computer 702 may also include, or be communicably coupled with, an application server, e-mail server, web server, caching server, streaming data server, or other server (or a combination of servers) .
  • the computer 702 can receive requests over network 730 from a client application (for example, executing on another computer 702) and respond to the received requests by processing the received requests using an appropriate software application (s) .
  • requests may also be sent to the computer 702 from internal users (for example, from a command console or by other appropriate access methods) , external or third-parties, other automated applications, as well as any other appropriate entities, individuals, systems, or computers.
  • Each of the components of the computer 702 can communicate using a system bus 703.
  • any or all of the components of the computer 702, hardware or software (or a combination of both hardware and software) may interface with each other or the interface 704 (or a combination of both) , over the system bus 703 using an application programming interface (API) 712 or a service layer 713 (or a combination of the API 712 and service layer 713) .
  • the API 712 may include specifications for routines, data structures, and object classes.
  • the API 712 may be either computer-language independent or dependent and refer to a complete interface, a single function, or even a set of APIs.
  • the service layer 713 provides software services to the computer 702 or other components (whether or not illustrated) that are communicably coupled to the computer 702.
  • the functionality of the computer 702 may be accessible for all service consumers using this service layer.
  • Software services, such as those provided by the service layer 713 provide reusable, defined functionalities through a defined interface.
  • the interface may be software written in JAVA, C++, or other suitable language providing data in extensible markup language (XML) format or other suitable formats.
  • alternative implementations may illustrate the API 712 or the service layer 713 as stand-alone components in relation to other components of the computer 702 or other components (whether or not illustrated) that are communicably coupled to the computer 702.
  • any or all parts of the API 712 or the service layer 713 may be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of this disclosure.
  • the computer 702 includes an interface 704. Although illustrated as a single interface 704 in FIG. 7, two or more interfaces 704 may be used according to particular needs, desires, or particular implementations of the computer 702.
  • the interface 704 is used by the computer 702 for communicating with other systems that are connected to the network 730 (whether illustrated or not) in a distributed environment.
  • the interface 704 includes logic encoded in software or hardware (or a combination of software and hardware) and is operable to communicate with the network 730. More specifically, the interface 704 may include software supporting one or more communication protocols associated with communications such that the network 730 or interface’s hardware is operable to communicate physical signals within and outside of the illustrated computer 702.
  • the computer 702 includes a processor 705. Although illustrated as a single processor 705 in FIG. 7, two or more processors may be used according to particular needs, desires, or particular implementations of the computer 702. Generally, the processor 705 executes instructions and manipulates data to perform the operations of the computer 702 and any algorithms, methods, functions, processes, flows, and procedures as described in the instant disclosure.
  • the computer 702 also includes a database 706 that can hold data for the computer 702 or other components (or a combination of both) that can be connected to the network 730 (whether illustrated or not) .
  • database 706 can be an in-memory, conventional, or other type of database storing data consistent with this disclosure.
  • database 706 can be a combination of two or more different database types (for example, a hybrid in-memory and conventional database) according to particular needs, desires, or particular implementations of the computer 702 and the described functionality.
  • two or more databases can be used according to particular needs, desires, or particular implementations of the computer 702 and the described functionality.
  • database 706 is illustrated as an integral component of the computer 702, in alternative implementations, database 706 can be external to the computer 702.
  • the computer 702 also includes a memory 707 that can hold data for the computer 702 or other components (or a combination of both) that can be connected to the network 730 (whether illustrated or not) .
  • memory 707 can be Random Access Memory (RAM) , Read-Only Memory (ROM) , optical, magnetic, and the like, storing data consistent with this disclosure.
  • memory 707 can be a combination of two or more different types of memory (for example, a combination of RAM and magnetic storage) according to particular needs, desires, or particular implementations of the computer 702 and the described functionality.
  • Although illustrated as a single memory 707 in FIG. 7, two or more memories 707 can be used according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. While memory 707 is illustrated as an integral component of the computer 702, in alternative implementations, memory 707 can be external to the computer 702.
  • the application 708 is an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer 702, particularly with respect to functionality described in this disclosure.
  • application 708 can serve as one or more components, modules, or applications.
  • the application 708 may be implemented as multiple applications 708 on the computer 702.
  • the application 708 can be external to the computer 702.
  • the computer 702 can also include a power supply 714.
  • the power supply 714 can include a rechargeable or non-rechargeable battery that can be configured to be either user- or non-user-replaceable.
  • the power supply 714 can include power-conversion or management circuits (including recharging, standby, or other power management functionality) .
  • the power supply 714 can include a power plug to allow the computer 702 to be plugged into a wall socket or other power source to, for example, power the computer 702 or recharge a rechargeable battery.
  • there may be any number of computers 702 associated with, or external to, a computer system containing computer 702, each computer 702 communicating over network 730.
  • the terms “client, ” “user, ” and other appropriate terminology may be used interchangeably, as appropriate, without departing from the scope of this disclosure.
  • this disclosure contemplates that many users may use one computer 702, or that one user may use multiple computers 702.
  • FIG. 8 is a schematic diagram illustrating an example structure of an electronic circuit 800 that reconstructs images as described in the present disclosure, according to an implementation.
  • the electronic circuit 800 can be a component or a functional block of a codec, e.g., a video codec.
  • the electronic circuit 800 can also be a component or a functional block of a graphic processing unit.
  • the electronic circuit 800 includes a receiving circuit 802, a filter selection circuit 804, a filter coefficient determination circuit 806, a storage circuit 808, and a processing circuit 810 that is coupled to, or capable of communicating with, the receiving circuit 802, the filter selection circuit 804, the filter coefficient determination circuit 806, and the storage circuit 808.
  • the electronic circuit 800 can further include one or more circuits for performing any one or a combination of steps described in the present disclosure. In some implementations, some or all of these component circuits can be combined into fewer components.
  • the receiving circuit 802 is configured to receive image data that represents a distorted image.
  • the filter selection circuit 804 is configured to select a type of filter from a plurality of filter types based on the image data by using a first CNN.
  • the processing circuit 810 is configured to use a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  • the filter coefficient determination circuit 806 is configured to generate controlling coefficients to adjust weights of the filter by using a second CNN.
  • the storage circuit 808 is configured to store training models used by the first CNN and the second CNN (an illustrative sketch of this filtering pipeline follows the list below).
  • Implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Implementations of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory, computer-readable computer-storage medium for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded in/on an artificially generated propagated signal, for example, a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • the computer-storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of computer-storage mediums.
  • the terms “real-time, ” “real time, ” “real (fast) time (RFT) , ” “near (ly) real-time (NRT) , ” “quasi real-time, ” or similar terms mean that an action and a response are temporally proximate such that an individual perceives the action and the response occurring substantially simultaneously.
  • the time difference for a response to display (or for an initiation of a display) of data following the individual’s action to access the data may be less than 1 ms, less than 1 sec., or less than 5 secs.
  • the term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example, a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can also be or further include special purpose logic circuitry, for example, a Central Processing Unit (CPU) , a Field Programmable Gate Array (FPGA) , or an Application-specific Integrated Circuit (ASIC) .
  • the data processing apparatus or special purpose logic circuitry may be hardware- or software-based (or a combination of both hardware- and software-based) .
  • the apparatus can optionally include code that creates an execution environment for computer programs, for example, code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of execution environments.
  • the present disclosure contemplates the use of data processing apparatuses with or without conventional operating systems, for example LINUX, UNIX, WINDOWS, MAC OS, ANDROID, IOS, or any other suitable conventional operating system.
  • a computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, for example, one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, for example, files that store one or more modules, sub-programs, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. While portions of the programs illustrated in the various figures are shown as individual modules that implement the various features and functionality through various objects, methods, or other processes, the programs may instead include a number of sub-modules, third-party services, components, libraries, and such, as appropriate. Conversely, the features and functionality of various components can be combined into single components, as appropriate. Thresholds used to make computational determinations can be statically, dynamically, or both statically and dynamically determined.
  • the methods, processes, or logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
  • the methods, processes, or logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, for example, a CPU, an FPGA, or an ASIC.
  • Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors, both, or any other kind of CPU.
  • a CPU will receive instructions and data from a ROM or a Random Access Memory (RAM) , or both.
  • the essential elements of a computer are a CPU, for performing or executing instructions, and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example, magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, for example, a mobile telephone, a Personal Digital Assistant (PDA) , a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, for example, a Universal Serial Bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data includes non-volatile memory, media and memory devices, including by way of example, semiconductor memory devices, for example, Erasable Programmable Read-Only Memory (EPROM) , Electrically Erasable Programmable Read-Only Memory (EEPROM) , and flash memory devices; magnetic disks, for example, internal hard disks or removable disks; magneto-optical disks; and CD-ROM, DVD+/-R, DVD-RAM, and DVD-ROM disks.
  • the memory may store various objects or data, including caches, classes, frameworks, applications, backup data, jobs, web pages, web page templates, database tables, repositories storing dynamic information, and any other appropriate information including any parameters, variables, algorithms, instructions, rules, constraints, or references thereto. Additionally, the memory may include any other appropriate data, such as logs, policies, security or access data, reporting files, as well as others.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • implementations of the subject matter described in this specification can be implemented on a computer having a display device, for example, a Cathode Ray Tube (CRT) , Liquid Crystal Display (LCD) , Light Emitting Diode (LED) , or plasma monitor, for displaying information to the user and a keyboard and a pointing device, for example, a mouse, trackball, or trackpad by which the user can provide input to the computer.
  • Input may also be provided to the computer using a touchscreen, such as a tablet computer surface with pressure sensitivity or a multi-touch screen.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s client device in response to requests received from the web browser.
  • the term Graphical User Interface (GUI) may be used in the singular or the plural to describe one or more graphical user interfaces and each of the displays of a particular graphical user interface. Therefore, a GUI may represent any graphical user interface, including but not limited to, a web browser, a touch screen, or a Command Line Interface (CLI) that processes information and efficiently presents the information results to the user.
  • a GUI may include a plurality of User Interface (UI) elements, some or all associated with a web browser, such as interactive fields, pull-down lists, and buttons. These and other UI elements may be related to or represent the functions of the web browser.
  • Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, for example, as a data server, or that includes a middleware component, for example, an application server, or that includes a front-end component, for example, a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of wireline or wireless digital data communication (or a combination of data communication) , for example, a communication network.
  • Examples of communication networks include a Local Area Network (LAN) , a Radio Access Network (RAN) , a Metropolitan Area Network (MAN) , a Wide Area Network (WAN) , Worldwide Interoperability for Microwave Access (WIMAX) , a Wireless Local Area Network (WLAN) using, for example, 802.11 a/b/g/n or 802.20 (or a combination of 802.11x and 802.20 or other protocols consistent with this disclosure) , all or a portion of the Internet, or any other communication system or systems at one or more locations (or a combination of communication networks) .
  • the network may communicate with, for example, Internet Protocol (IP) packets, Frame Relay frames, Asynchronous Transfer Mode (ATM) cells, voice, video, data, or other suitable information (or a combination of communication types) between network addresses.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • any claimed implementation is considered to be applicable to at least a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer system comprising a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method or the instructions stored on the non-transitory, computer-readable medium.
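
For orientation only, the sketch below lays out the circuit-800 processing order (receive image data, select a filter type with a first CNN, derive controlling coefficients with a second CNN, apply the filter) as a small Python module. It is a minimal illustration under stated assumptions, not an implementation of this disclosure: the two CNNs are replaced by trivial stand-in functions, the applied filter is a plain normalized smoothing kernel rather than a DBK, SAO, ALF, or CNN filter, and every function and variable name is hypothetical.

```python
# Minimal, hypothetical sketch of the filtering pipeline attributed above to
# electronic circuit 800. The two "CNNs" are stubbed with trivial stand-in
# functions, and all names here (FILTER_TYPES, first_cnn_select_filter, etc.)
# are illustrative assumptions, not identifiers from this disclosure.
import numpy as np

FILTER_TYPES = ("DBK", "SAO", "ALF", "CNN")  # candidate filter types


def first_cnn_select_filter(image: np.ndarray) -> str:
    """Stand-in for the first CNN: map image data to one of the filter types."""
    score = float(image.var())  # toy feature; a trained CNN would infer this
    return FILTER_TYPES[int(score * 10) % len(FILTER_TYPES)]


def second_cnn_coefficients(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Stand-in for the second CNN: emit controlling coefficients (filter weights)."""
    kernel = np.ones((size, size), dtype=np.float64)
    return kernel / kernel.sum()  # normalized smoothing weights


def apply_filter(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the weight kernel over the distorted image (edge-padded weighted average)."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = float(np.sum(window * kernel))
    return out


def reconstruct(distorted: np.ndarray) -> np.ndarray:
    filter_type = first_cnn_select_filter(distorted)  # filter selection circuit 804
    weights = second_cnn_coefficients(distorted)      # filter coefficient determination circuit 806
    print(f"selected filter type: {filter_type}")
    return apply_filter(distorted, weights)            # processing circuit 810


if __name__ == "__main__":
    # receiving circuit 802: here, a toy 8x8 "distorted" block of samples
    distorted_block = np.random.default_rng(0).random((8, 8))
    print(reconstruct(distorted_block))
```

Running the module prints the selected filter type and the filtered 8x8 block; only the order of operations, mirroring circuits 802, 804, 806, and 810, is meant to carry over.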

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a computer-implemented method for reconstructing digital images. The method comprises: receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, the type of filter being selected by using a first convolutional neural network (CNN); and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
PCT/CN2018/108441 2018-09-28 2018-09-28 Reconstruction d'images déformées, au moyen d'un réseau neuronal convolutif WO2020062074A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/108441 WO2020062074A1 (fr) 2018-09-28 2018-09-28 Reconstruction d'images déformées, au moyen d'un réseau neuronal convolutif

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/108441 WO2020062074A1 (fr) 2018-09-28 2018-09-28 Reconstruction d'images déformées, au moyen d'un réseau neuronal convolutif

Publications (1)

Publication Number Publication Date
WO2020062074A1 true WO2020062074A1 (fr) 2020-04-02

Family

ID=69953262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108441 WO2020062074A1 (fr) 2018-09-28 2018-09-28 Reconstruction d'images déformées, au moyen d'un réseau neuronal convolutif

Country Status (1)

Country Link
WO (1) WO2020062074A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022067806A1 (fr) * 2020-09-30 2022-04-07 Oppo广东移动通信有限公司 Procédés de codage et de décodage vidéo, codeur, décodeur et support de stockage
CN114339221A (zh) * 2020-09-30 2022-04-12 脸萌有限公司 用于视频编解码的基于卷积神经网络的滤波器
WO2022245640A3 (fr) * 2021-05-18 2023-01-05 Tencent America LLC Apprentissage du facteur de qualité de substitution pour filtre de boucle basé sur un réseau neuronal adaptatif de qualité
US11792438B2 (en) 2020-10-02 2023-10-17 Lemon Inc. Using neural network filtering in video coding
WO2024077575A1 (fr) * 2022-10-13 2024-04-18 Oppo广东移动通信有限公司 Procédé de filtrage en boucle basé sur un réseau neuronal, procédé et appareil de codage vidéo, procédé et appareil de décodage vidéo, et système

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100329362A1 (en) * 2009-06-30 2010-12-30 Samsung Electronics Co., Ltd. Video encoding and decoding apparatus and method using adaptive in-loop filter
US20130113880A1 (en) * 2011-11-08 2013-05-09 Jie Zhao High Efficiency Video Coding (HEVC) Adaptive Loop Filter
CN104350752A (zh) * 2012-01-17 2015-02-11 华为技术有限公司 用于高性能视频编码中的无损编码模式的环内滤波
CN108447036A (zh) * 2018-03-23 2018-08-24 北京大学 一种基于卷积神经网络的低光照图像增强方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100329362A1 (en) * 2009-06-30 2010-12-30 Samsung Electronics Co., Ltd. Video encoding and decoding apparatus and method using adaptive in-loop filter
US20130113880A1 (en) * 2011-11-08 2013-05-09 Jie Zhao High Efficiency Video Coding (HEVC) Adaptive Loop Filter
CN104350752A (zh) * 2012-01-17 2015-02-11 华为技术有限公司 用于高性能视频编码中的无损编码模式的环内滤波
CN108447036A (zh) * 2018-03-23 2018-08-24 北京大学 一种基于卷积神经网络的低光照图像增强方法

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022067806A1 (fr) * 2020-09-30 2022-04-07 Oppo广东移动通信有限公司 Procédés de codage et de décodage vidéo, codeur, décodeur et support de stockage
CN114339221A (zh) * 2020-09-30 2022-04-12 脸萌有限公司 用于视频编解码的基于卷积神经网络的滤波器
CN114339221B (zh) * 2020-09-30 2024-06-07 脸萌有限公司 用于视频编解码的基于卷积神经网络的滤波器
US11792438B2 (en) 2020-10-02 2023-10-17 Lemon Inc. Using neural network filtering in video coding
WO2022245640A3 (fr) * 2021-05-18 2023-01-05 Tencent America LLC Apprentissage du facteur de qualité de substitution pour filtre de boucle basé sur un réseau neuronal adaptatif de qualité
WO2024077575A1 (fr) * 2022-10-13 2024-04-18 Oppo广东移动通信有限公司 Procédé de filtrage en boucle basé sur un réseau neuronal, procédé et appareil de codage vidéo, procédé et appareil de décodage vidéo, et système

Similar Documents

Publication Publication Date Title
WO2020062074A1 (fr) Reconstruction d'images déformées, au moyen d'un réseau neuronal convolutif
US10880551B2 (en) Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (VQA)
US11606560B2 (en) Image encoding and decoding, video encoding and decoding: methods, systems and training methods
Mentzer et al. Conditional probability models for deep image compression
KR102332476B1 (ko) 신경망을 이용한 타일 이미지 압축
US20200092556A1 (en) Efficient Use of Quantization Parameters in Machine-Learning Models for Video Coding
Cai et al. Efficient variable rate image compression with multi-scale decomposition network
EP3828811A1 (fr) Dispositif électronique, son procédé de commande et système
US11956447B2 (en) Using rate distortion cost as a loss function for deep learning
JP7345650B2 (ja) 代替エンドツーエンドビデオコーディング
US11881003B2 (en) Image compression and decoding, video compression and decoding: training methods and training systems
CN107113426B (zh) 使用广义图形参数执行基于图形的变换的方法和设备
Jeong et al. An overhead-free region-based JPEG framework for task-driven image compression
EP4387233A1 (fr) Procédé de codage et de décodage vidéo, codeur, décodeur et support de stockage
Petrov et al. Intra frame compression and video restoration based on conditional markov processes theory
Xu et al. Perceptual rate-distortion optimized image compression based on block compressive sensing
CN102948147A (zh) 基于变换系数直方图的视频速率控制
Xie et al. Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT
Nami et al. Lightweight Multitask Learning for Robust JND Prediction using Latent Space and Reconstructed Frames
US11683515B2 (en) Video compression with adaptive iterative intra-prediction
US11750847B2 (en) Quality-adaptive neural network-based loop filter with smooth quality control by meta-learning
Wang et al. Adaptive CNN-Based Image Compression Model for Improved Remote Desktop Experience
EP4231643A1 (fr) Procédé de compression d'image et appareil de mise en uvre associé
US20220383554A1 (en) Substitutional quality factor learning for quality-adaptive neural network-based loop filter
Jiang et al. Compressed vision information restoration based on cloud prior and local prior

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18934919

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18934919

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 18934919

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.12.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18934919

Country of ref document: EP

Kind code of ref document: A1