EP3846475A1 - Preprocessing of image data - Google Patents
Preprocessing of image data
- Publication number
- EP3846475A1 (application EP20199344.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- preprocessing
- network
- preprocessing network
- image
- image data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
Leaf classifications (H04N — pictorial communication, e.g. television; G06N — computing arrangements based on specific computational models; G03G — electrography):
- H04N21/23439 — Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements, for generating different versions
- H04N19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a block or macroblock
- G06N3/045 — Neural network architectures; combinations of networks
- G06N3/088 — Learning methods; non-supervised learning, e.g. competitive learning
- H04N19/103 — Adaptive coding; selection of coding mode or of prediction mode
- H04N19/12 — Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/124 — Quantisation
- H04N19/126 — Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
- H04N19/136 — Incoming video signal characteristics or properties
- H04N19/147 — Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/154 — Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
- H04N19/172 — The coding unit being a picture, frame or field
- H04N19/177 — The coding unit being a group of pictures [GOP]
- H04N19/182 — The coding unit being a pixel
- H04N19/184 — The coding unit being bits, e.g. of the compressed video stream
- H04N19/44 — Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/61 — Transform coding in combination with predictive coding
- H04N19/65 — Coding using error resilience
- H04N19/85 — Pre-processing or post-processing specially adapted for video compression
- H04N19/86 — Pre-/post-processing involving reduction of coding artifacts, e.g. of blockiness
- H04N21/2662 — Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
- G03G15/5004 — Power supply control, e.g. power-saving mode, automatic power turn-off
Definitions
- the present disclosure concerns computer-implemented methods of preprocessing image data prior to encoding with an external encoder.
- the disclosure is particularly, but not exclusively, applicable where the image data is video data.
- Bitrate control: when a set of images or video is sent over a dedicated IP packet-switched or circuit-switched connection, a range of streaming and encoding recipes must be selected in order to ensure the best possible use of the available bandwidth.
- to achieve this: (i) the image or video encoder must be tuned to provide for some bitrate control mechanism; and (ii) the streaming server must provide for the means to control or switch the stream when the bandwidth of the connection does not suffice for the transmitted data.
- Methods for tackling bitrate control include: constant bitrate (CBR) encoding, variable bitrate (VBR) encoding, or solutions based on a video buffer verifier (VBV) model [9]-[12], such as QVBR, CABR, capped-CRF, etc.
- This quality loss is measured with a range of quality metrics, ranging from low-level signal-to-noise ratio metrics, all the way to complex mixtures of expert metrics that capture higher-level elements of human visual attention and perception.
- One such metric that is now well-recognised by the video community and the Video Quality Experts Group (VQEG) is the Video Multi-method Assessment Fusion (VMAF), proposed by Netflix.
- VMAF: Video Multi-method Assessment Fusion
- JND: Just-Noticeable Difference
- the first type of approaches consists of solutions attempting device-based enhancement, i.e. advancing the state-of-the-art in intelligent video upscaling at the video player when the content has been "crudely" downscaled using a linear filter like the bicubic, or variants of the Lanczos or other polyphase filters [9]-[12] and adaptive filters [13]-[15].
- such approaches include SoC solutions embedded within the latest 8K televisions. While there have been some advances in this domain [13]-[15], this category of solutions is limited by the stringent complexity constraints and power-consumption limitations of consumer electronics.
- because the received content at the client is already distorted by the compression (quite often severely so), there are theoretical limits to the level of picture detail that can be recovered by client-side upscaling.
- a second family of approaches consists of the development of bespoke image and video encoders, typically based on deep neural networks [16]-[20]. This deviates from encoding, stream-packaging and stream-transport standards and creates bespoke formats, so has the disadvantage of requiring bespoke transport mechanisms and bespoke decoders in the client devices.
- because video encoding has been developed over several decades, most opportunities for improving gain in different situations have already been taken, thereby making the current state-of-the-art in spatio-temporal prediction and encoding very difficult to outperform with neural-network solutions that are designed from scratch and learn from data.
- the third family of methods comprises perceptual optimisation of existing standards-based encoders by using perceptual metrics during encoding.
- the challenges are that:
- the present disclosure seeks to solve or mitigate some or all of these above-mentioned problems.
- aspects of the present disclosure seek to provide improved image and video encoding and decoding methods, and in particular methods that can be used in combination with existing image and video codec frameworks.
- a computer-implemented method of preprocessing image data prior to encoding with an external encoder, using a preprocessing network comprising a set of inter-connected learnable weights, the method comprising:
- the representation space can be partitioned within a single model, reducing the need to train multiple models for every possible encoder setting, and reducing the need to redesign and/or reconfigure the preprocessing model for a new encoder and/or a new standard, e.g. from HEVC to VP9 encoding.
- the methods described herein include a preprocessing model that exploits knowledge of encoding parameters and characteristics to tune the parameters and/or operation of the preprocessing model. This enables the preprocessing of the image data to be performed optimally in order to make the external encoder (which may be a standards-based encoder) operate as efficiently as possible, by exploiting the knowledge of the characteristics and configuration settings of the encoder.
- a visual quality of the subsequently encoded and decoded image data may be improved for a given encoding bitrate, and/or an encoding bitrate to achieve a given visual quality may be reduced. Fidelity of the subsequently encoded and decoded image data to the original image data may also be improved through use of the methods described herein.
- the described methods include technical solutions that are learnable based on data and can utilize a standard image/video encoder with a predetermined encoding recipe for bitrate, quantization and temporal prediction parameters, and fidelity parameters.
- An overall technical question addressed can be abstracted as: how to optimally preprocess (or "precode") the pixel stream of a video into a (typically) smaller pixel stream, in order to make standards-based encoders as efficient (and fast) as possible? This question may be especially relevant where the client device can upscale the content with its existing linear filters, and/or where perceptual quality is measured with the latest advances in perceptual quality metrics from the literature, e.g., using VMAF or similar metrics.
- the one or more configuration settings comprise at least one of a bitrate, a quantization, and a target fidelity of encoding performed by the external encoder.
- the weights of the preprocessing network are trained using end-to-end back-propagation of errors.
- the errors are calculated using a cost function indicative of an estimated image error associated with encoding the output pixel representation using the external encoder configured according to the one or more configuration settings (or configured with settings similar to the one or more configuration settings).
- the cost function is indicative of an estimate of at least one of: an image noise of the output of decoding the encoded output pixel representation; a bitrate to encode the output pixel representation; and a perceived quality of the output of decoding the encoded output pixel representation.
- the preprocessing network may be used to reduce noise in the final displayed image(s), reduce the bitrate to encode the output pixel representation, and/or improve the visual quality of the final displayed image(s).
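A multi-term cost of this kind can be sketched in a few lines. This is a minimal illustration, not the disclosure's exact formulation: it assumes an MSE distortion term on the decoded output and a neighbouring-pixel-difference term as a crude stand-in for encoding rate, with hypothetical weights `w_dist` and `w_rate`:

```python
import numpy as np

def precode_cost(original, decoded, precoded, w_dist=1.0, w_rate=0.01):
    """Hypothetical training cost: MSE distortion of the decoded output
    plus high-frequency energy of the precoded pixels as a rate proxy.
    The terms and weights are illustrative, not from the disclosure."""
    distortion = np.mean((original - decoded) ** 2)
    # Neighbouring-pixel differences approximate the residual detail
    # the external encoder would spend bits on.
    rate_proxy = np.mean(np.abs(np.diff(precoded, axis=-1)))
    return w_dist * distortion + w_rate * rate_proxy
```

In an end-to-end training loop, gradients of such a cost would be back-propagated through a differentiable proxy of the encode/decode step to the preprocessing weights.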
- the estimated image error is indicative of the similarity of the output of decoding the encoded output pixel representation to the received image data, based on at least one reference-based quality metric, the at least one reference-based quality metric comprising at least one of: an elementwise loss function, such as mean squared error (MSE); a structural similarity index metric (SSIM); and a visual information fidelity metric (VIF).
- the preprocessing network may be used to improve the fidelity of the final displayed image(s) relative to the original input image(s).
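For concreteness, the first two reference-based metrics above can be sketched as follows; the SSIM here is a simplified single-window variant computed over the whole image (production SSIM slides a local, typically Gaussian, window):

```python
import numpy as np

def mse(x, y):
    """Elementwise mean squared error between reference and test images."""
    return np.mean((x - y) ** 2)

def global_ssim(x, y, L=255.0):
    """Single-window SSIM with the standard stabilising constants
    c1 = (0.01*L)^2 and c2 = (0.03*L)^2; L is the dynamic range."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))
```

A perfect reconstruction gives `mse == 0` and `global_ssim == 1`; training against such metrics pushes the post-decoded output towards the original input.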
- the weights of the preprocessing network are trained in a manner that balances perceptual quality of the post-decoded output with fidelity to the original image.
- the cost function is formulated using an adversarial learning framework, in which the preprocessing network is encouraged to generate output pixel representations that reside on the natural image manifold.
- the preprocessing network is trained to produce images which lie on the natural image manifold, and/or to avoid producing images which do not lie on the natural image manifold (and which may look artificial or unrealistic). This facilitates an improvement in user perception of the subsequently displayed image(s).
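One common instantiation of such an adversarial framework (assumed here for illustration; the disclosure does not fix the exact loss) is the non-saturating GAN objective, where a discriminator separates natural images from preprocessed outputs and the preprocessing network is rewarded when its outputs are judged natural:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_losses(d_real_logits, d_fake_logits):
    """Non-saturating GAN losses (an assumed instantiation).
    d_real_logits: discriminator scores on natural images.
    d_fake_logits: discriminator scores on preprocessed outputs."""
    eps = 1e-12  # numerical floor inside the logs
    d_loss = -np.mean(np.log(sigmoid(d_real_logits) + eps)
                      + np.log(1.0 - sigmoid(d_fake_logits) + eps))
    # The preprocessing "generator" minimises this, i.e. it is pushed
    # towards outputs the discriminator scores as natural.
    g_loss = -np.mean(np.log(sigmoid(d_fake_logits) + eps))
    return d_loss, g_loss
```

The generator loss is typically combined with the fidelity terms above, so realism does not come at the expense of faithfulness to the source.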
- the weights of the preprocessing network are trained using training image data, prior to deployment of the preprocessing network, based on a random initialisation or a prior training phase.
- the weights of the preprocessing network are trained using image data obtained during deployment of the preprocessing network.
- the weights of the preprocessing network may be adjusted and/or reconfigured after the initial training phase, using additional image data. This can enable the preprocessing network to adapt to new encoder settings, new external encoders, and/or new types of image content, thereby improving the flexibility of the preprocessing network.
- the resolution of the received image data is different to the resolution of the output pixel representation.
- the resolution of the output pixel representation may be lower than the resolution of the received image data.
- the external encoder can operate more efficiently by processing a lower resolution image.
- the parameters used when downscaling/upscaling can be chosen to provide different desired results, for example to improve accuracy (i.e. how similarly the recovered images are to the original).
- the downscaling/upscaling process may be designed to be in accordance with downscaling/upscaling performed by the external encoder, so that the downscaled/upscaled images can be encoded by the external encoder without essential information being lost.
- the preprocessing network comprises an artificial neural network including multiple layers having a convolutional architecture, with each layer being configured to receive the output of one or more previous layers.
- the outputs of each layer of the preprocessing network are passed through a non-linear activation function, the parametric rectified linear unit (pReLU).
- Other non-linear functions may be used in other embodiments.
- the preprocessing network comprises a dilation operator configured to expand a receptive field of a convolutional operation of a given layer of the preprocessing network. Increasing the receptive field allows for integration of larger global context.
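The effect of dilation on the receptive field can be seen in a toy 1-D example: a dilated kernel of length k spans `dilation*(k-1)+1` input samples instead of k. The sketch below (illustrative only, using a hand-written loop rather than a deep-learning framework) applies one dilated convolution followed by a pReLU activation:

```python
import numpy as np

def prelu(x, alpha=0.2):
    """Parametric rectified linear unit; alpha is learned in practice."""
    return np.where(x > 0, x, alpha * x)

def dilated_conv1d(signal, kernel, dilation=2, alpha=0.2):
    """Valid-mode 1-D convolution with dilated taps, then pReLU.
    A length-k kernel sees dilation*(k-1)+1 consecutive samples."""
    k = len(kernel)
    span = dilation * (k - 1) + 1
    out = np.empty(len(signal) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * signal[i + j * dilation] for j in range(k))
    return prelu(out, alpha)
```

With dilation 2, each output already integrates context 3 samples wide from a 2-tap kernel; stacking layers with growing dilation rates grows the receptive field exponentially while the weight count grows only linearly.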
- the weights of the preprocessing network are trained using a regularisation method that controls the capacity of the preprocessing network, the regularisation method comprising using hard or soft constraints and/or a normalisation technique on the weights that reduces a generalisation error.
- the one or more images are downscaled using one or more filters.
- a filter of the one or more filters may be an edge-detection filter.
- a filter of the one or more filters is a blur filter.
- the blur filter may be a Gaussian blur filter, for example.
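A minimal sketch of this step, under assumed parameters (2x subsampling, a small separable Gaussian kernel; the disclosure does not prescribe these values): blur first to suppress aliasing, then subsample.

```python
import numpy as np

def gaussian_kernel1d(sigma=1.0, radius=2):
    """Normalised 1-D Gaussian taps over [-radius, radius]."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_and_downscale(img, factor=2, sigma=1.0):
    """Separable Gaussian blur (anti-aliasing) followed by simple
    subsampling by `factor`. Illustrative only."""
    k = gaussian_kernel1d(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, blurred)
    return blurred[::factor, ::factor]
```

In the context of the method, the preprocessing network would refine (or learn to replace) such fixed filtering, while the client's existing linear upscaler inverts the resolution change after decoding.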
- the output pixel representation is encoded using the external encoder.
- the external encoder may be configured according to the one or more configuration settings that are input to the preprocessing network, or may be configured according to configuration settings that are similar to, but not identical to, the configuration settings that are input to the preprocessing network.
- the encoded pixel representation is output for transmission, for example to a decoder, for subsequent decoding and display of the image data. In alternative embodiments, the encoded pixel representation is output for storage.
- a computer-implemented method of preprocessing one or multiple images into output pixel representations that can subsequently be encoded with any external still-image or video encoder comprises a set of weights inter-connected in a network (termed as "preprocessing network") that ingests: (i) the input pixels from the single or plurality of images; (ii) the external encoder configuration settings corresponding to bitrate, quantization or target fidelity of the encoding. If these encoding configuration settings are not known precisely, then approximations can be provided. These settings can be average settings for an entire video, or can be provided per scene, per individual frame, or even per segment of an individual frame or image.
- the preprocessing network is configured to convert input pixels of each frame to output pixel representations by applying the network weights on the input pixels and accumulating the result of the output product and summation between weights and subsets of input pixels.
- the network weights, as well as offset or bias terms used for sets of one or more weights, are conditioned on the aforementioned bitrate, quantization or fidelity settings.
- the weights are updated via a training process that uses end-to-end back-propagation of errors computed on the outputs to each group of weights, biases and offsets based on the network connections.
- the output errors are computed via a cost function that estimates the image or video frame error after encoding and decoding the output pixel representation of the preprocessing network with the aforementioned external encoder using bitrate, quantization or fidelity settings close to, or identical, to the ones used as inputs to the network.
- the utilized cost function may comprise multiple terms that, for the output after decoding, express at least one of: image or video frame noise estimates; functions that estimate the rate to encode the image or video frame; estimates or functions expressing the perceived quality of the output from human viewers.
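A linear weighted combination of such cost terms can be sketched as follows; the weights and term values are entirely illustrative and not taken from the disclosure, which also permits non-linear combinations.

```python
def combined_cost(distortion, rate, perceptual, w_d=1.0, w_r=0.05, w_p=0.5):
    """Weighted linear combination of cost terms (noise/distortion, rate,
    perceived quality). The weights here are illustrative placeholders."""
    return w_d * distortion + w_r * rate + w_p * perceptual

# Example: a small distortion increase traded against a large rate saving.
print(combined_cost(distortion=2.0, rate=100.0, perceptual=0.3))  # 2 + 5 + 0.15 = 7.15
print(combined_cost(distortion=2.5, rate=40.0, perceptual=0.3))   # 2.5 + 2 + 0.15 = 4.65
```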
- the preprocessing network and cost-function components are trained or refined for any number of iterations prior to deployment (offline) based on training data or, optionally, have their training fine-tuned for any number of iterations based on data obtained during the preprocessing network and encoder-decoder operation during deployment (online).
- the disclosed preprocessing network can optionally increase or decrease the resolution of the pixel data in accordance with a given upscaling or downscaling ratio.
- the ratio can be an integer or fractional number, and includes a ratio of 1 (unity), which corresponds to no resolution change. For example, a ratio of 2/3 with an input image resolution of 1080p (1920x1080 pixels, with each pixel comprising 3 color values) would correspond to an output image of 720p resolution (1280x720 pixels).
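The resolution arithmetic for integer or fractional ratios can be checked with a short Python sketch using exact rational arithmetic:

```python
from fractions import Fraction

def scaled_resolution(width, height, ratio):
    """Apply an integer or fractional scaling ratio to a resolution.
    A ratio of 1 (unity) leaves the resolution unchanged."""
    r = Fraction(ratio)
    return (int(width * r), int(height * r))

# The 1080p example from the text, with a ratio of 2/3:
print(scaled_resolution(1920, 1080, Fraction(2, 3)))  # (1280, 720)
print(scaled_resolution(1920, 1080, 1))               # (1920, 1080)
```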
- the network can optionally be structured in a cascaded structure of layers of activations. Each activation in each layer can be connected to any subset (or the entirety) of activations of the next layer, or a subsequent layer by a function determined by the layer weights.
- the network can optionally comprise a single or multiple layers of a convolutional architecture, with each layer taking the outputs of the previous layer and filtering them via the mathematical operation of convolution.
- some or all the outputs of each layer can optionally be passed through a non-linear parametric linear rectifier function (pReLU) or other non-linear functions that include, but are not limited to, variations of the sigmoid function or any variation of functions that produce values based on threshold criteria.
- some or all of the convolutional layers of the preprocessing architecture can include implementations of dilation operators that expand the receptive field of the convolutional operation per layer.
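The effect of dilation on the receptive field can be illustrated with a minimal 1D convolution sketch (real layers operate on 2D pixel arrays; this 1D version, written in cross-correlation form, only shows how the kernel taps spread apart):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D convolution (cross-correlation form) with dilation.
    With dilation d, kernel taps are spaced d samples apart, so a kernel of
    size k covers an effective receptive field of k + (k - 1) * (d - 1)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field per output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[start + j * dilation] for j in range(k)))
    return out

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4] (wider field)
```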
- the training of the preprocessing network weights can be done with the addition of regularization methods that control the network capacity, via hard or soft constraints or normalization techniques on the layer weights or activations that reduces the generalization error but not the training error.
- the utilized cost functions can express the fidelity to the input images based on reference-based quality metrics that include one or more of:
- the utilized encoder is a standards-based image or video encoder such as an ISO JPEG or ISO MPEG standard encoder, or a proprietary or royalty-free encoder, such as, but not limited to, an AOMedia encoder.
- high resolution and low resolution image or video pairs can optionally be provided and the low resolution image upscaled and optimized to improve and/or match quality or rate of the high resolution image using the disclosed methods as the means to achieve this.
- this corresponds to a component on the decoder (client side) that applies such processing after the external decoder has provided the decoded image or video frames.
- the training of the preprocessing network weights, and any adjustment to the cost functions, are performed at frequent or infrequent intervals with new measurements of quality, bitrate, perceptual quality scores from humans, or encoded image data from external image or video encoders, and the updated weights and cost functions replace the previously-utilized ones.
- the external encoder comprises an image codec.
- the image data comprises video data and the one or more images comprise frames of video.
- the external encoder comprises a video codec.
- the methods of processing image data described herein may be performed on a batch of video data, e.g. a complete video file for a movie or the like, or on a stream of video data.
- a computing device comprising:
- a computer program product arranged, when executed on a computing device comprising a processor and memory, to perform any of the methods of preprocessing image data described above.
- Figure 1 is a schematic diagram showing a method of processing image data, according to embodiments.
- Image or video input data is pre-processed by a conditional 'precoder' prior to passing to an external image or video codec.
- the embodiments depicted are applicable to batch processing, i.e. processing a group of images or video frames together without delay constraints (e.g. an entire video sequence), as well as to stream processing, i.e. processing only a limited subset of a stream of images or video frames, or even a select subset of a single image, e.g. due to delay or buffering constraints.
- the method depicted in Figure 1 includes deep conditional precoding with quality-rate score optimization (and optional resizing) within the transmission pipeline.
- all components in the transmission pipeline take codec settings Q as input. In alternative embodiments, some of the components do not take the codec settings as an input.
- Embodiments comprise a deep conditional precoding model that processes input image or video frames.
- the deep conditional precoding (and optional post-processing) depicted in Figure 1 can comprise any combination of learnable weights locally or globally connected in a network with a non-linear activation function.
- An example of such weights is shown in Figure 2(a), and an associated example in Figure 2(b) depicts global connectivity between weights and inputs. That is, Figure 2(a) shows a combination of inputs x0, ..., x3 with weight coefficients θ and a non-linear activation function g(), and Figure 2(b) is a schematic diagram showing layers of interconnected activations and weights, forming an artificial neural network with global connectivity.
- An instantiation of local connectivity between weights and inputs is shown in Figure 2(c) for a 2D dilated convolution [1], 3×3 kernel, and dilation rate of 2.
- Figure 2(c) is a schematic diagram of 2D dilated convolutional layer with local connectivity.
- Figure 2(d) is a schematic diagram of back-propagation of errors δ from an intermediate layer (right-hand side of Figure 2(d)) to the previous intermediate layer using gradient descent.
- An example of the deep conditional precoding model is shown in Figure 3. It consists of a series of conditional convolutional layers and elementwise parametric ReLU (pReLU) layers of weights and activations. As such, Figure 3 shows a cascade of conditional convolutional and parametric ReLU (pReLU) layers mapping input pixel groups to transformed output pixel groups. All layers receive codec settings as input, along with the representation from the previous layer. There is also an optional skip connection between the input and output layer. Each conditional convolution takes the output of the preceding layer as input (with the first layer receiving the image as input), along with intended user settings for the external image or video codec, encoded as a numerical representation.
- these user settings can include but are not limited to quality factor or discrete cosine transform (DCT) block size.
- these user settings can include but are not limited to constant rate factor (CRF), quantization parameter (QP), maximum bitrate or preset setting.
- Figure 4 is a schematic diagram of a conditional convolutional layer for the case of JPEG encoding.
- the layer receives the quality factor as input which is quantized and one-hot encoded.
- the one hot encoded vector is then mapped to intermediate representations w and b, which respectively weight and bias the channels of the output of the dilated convolutional layer z.
- the user selects a JPEG quality factor, which is quantized and one-hot encoded.
- the one-hot encoding is then mapped via linear or non-linear functions, such as densely connected layers (following the connectivity illustrated in Figure 2(b) ), to vector representations. These vector representations are then used to weight and bias the output of a dilated convolution - thus conditioning the dilated convolution on the user settings.
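A toy Python sketch of this conditioning mechanism follows. The quality-factor bin edges and the matrices W and B are illustrative stand-ins for the learned dense-layer mappings described above, not values from the disclosure.

```python
def one_hot_quality(q, bins=(25, 50, 75, 100)):
    """Quantize a JPEG quality factor into coarse bins and one-hot encode it.
    The bin edges here are illustrative; a real system would learn or choose
    its own quantization of the setting."""
    idx = next(i for i, edge in enumerate(bins) if q <= edge)
    return [1.0 if i == idx else 0.0 for i in range(len(bins))]

def condition_channels(channels, onehot, W, B):
    """Weight and bias each channel of a convolution output z. The per-channel
    scale w and bias b are derived from the one-hot setting via the linear
    maps W and B (stand-ins for learned densely connected layers)."""
    w = [sum(W[c][i] * onehot[i] for i in range(len(onehot))) for c in range(len(channels))]
    b = [sum(B[c][i] * onehot[i] for i in range(len(onehot))) for c in range(len(channels))]
    return [[w[c] * v + b[c] for v in channels[c]] for c in range(len(channels))]

onehot = one_hot_quality(40)          # falls in the second bin
channels = [[1.0, 2.0], [3.0, 4.0]]   # two channels of a conv output z
W = [[0.5, 1.0, 1.5, 2.0], [2.0, 1.5, 1.0, 0.5]]  # illustrative mapping values
B = [[0.0, 0.1, 0.2, 0.3], [0.3, 0.2, 0.1, 0.0]]
print(condition_channels(channels, onehot, W, B))
```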
- An example of the connectivity per dilated convolution is illustrated in Figure 2(c).
- dilation rate: spacing between each learnable weight in the kernel
- kernel size: number of learnable weights in the kernel per dimension
- stride: step per dimension in the convolution operation
- the entirety of the series of dilated convolutional layers and activation functions can be trained end-to-end based on back-propagation of errors for the output layer backwards using gradient descent methods, as illustrated in Figure 2(d) .
- Figure 5 is a schematic diagram showing training of deep conditional precoding for intra-frame coding, where s represents the scale factor for resizing and Q represents the input codec settings.
- the discriminator and precoder are trained iteratively and the perceptual model can also be trained iteratively with the precoder, or pre-trained and frozen.
- the guidance image input x ⁇ to the discriminator refers to a linearly downscaled, compressed and upscaled representation of x.
- the post-processing refers to a simple linear (non-parametric) upscaling in this example.
- the precoding is trained in a manner that balances perceptual quality of the post-decoded output with fidelity to the original image or frame.
- the precoding is trained iteratively via backpropagation and any variation of gradient descent, e.g. as shown in Figure 2(d) .
- Parameters of the learning process, such as the learning rate, the use of dropout, and other regularization options, are applied to stabilize the training and convergence process.
- the presented training framework assumes that post-processing only constitutes a simple linear resizing.
- the framework comprises a linear or non-linear weighted combination of loss functions for training the deep conditional precoding.
- the loss functions used will now be described.
- the distortion loss is derived as a function of a perceptual model, and optimized over the precoder weights, in order to match or maximize the perceptual quality of the post-decoded output x ⁇ over the original input x .
- the perceptual model is a parametric model that estimates the perceptual quality of the post-decoded output x ⁇ .
- the perceptual model can be configured as an artificial neural network with weights and activation functions and connectivity (e.g. as described above with reference to Figures 2(a)-2(d) ). This perceptual model produces a reference or non-reference based score for quality; reference based scores compare the quality of x ⁇ to x , whereas non-reference based scores produce a blind image quality assessment of x ⁇ .
- the perceptual model can optionally approximate non-differentiable perceptual score functions, including VIF, ADM2 and VMAF, with continuous differentiable functions.
- the perceptual model can also be trained to output human rater scores, including MOS or distributions over ACR values.
- the example shown in Figure 5 represents a non-reference based instantiation trained to output the distribution over ACR values, however it will be understood that reference-based frameworks may be used in other examples.
- the perceptual model can either be pre-trained, or trained iteratively with the deep conditional precoding, alternately or sequentially, by minimizing the perceptual loss.
- the perceptual loss is a function of the difference between the reference (human-rater) quality scores and model-predicted quality scores over a range of inputs.
- the distortion loss can thus be defined between x ⁇ and x , as a linear or non-linear function of the intermediate activations of selected layers of the perceptual model, up to the output reference or non-reference based scores. Additionally, in order to ensure faithful reconstruction of the input x, the distortion loss is combined with a pixel-wise loss directly between the input x and x ⁇ , such as mean absolute error (MAE) or mean squared error (MSE), and optionally a structural similarity loss, based on SSIM or MSSIM.
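The pixel-wise loss terms mentioned above are straightforward to state; the following is a minimal Python sketch over flattened pixel lists (a real implementation would operate on image tensors):

```python
def mae(x, x_hat):
    """Mean absolute error between two equal-length pixel lists."""
    assert len(x) == len(x_hat)
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)

def mse(x, x_hat):
    """Mean squared error between two equal-length pixel lists."""
    assert len(x) == len(x_hat)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [10.0, 20.0, 30.0, 40.0]
x_hat = [12.0, 18.0, 30.0, 44.0]
print(mae(x, x_hat))  # (2 + 2 + 0 + 4) / 4 = 2.0
print(mse(x, x_hat))  # (4 + 4 + 0 + 16) / 4 = 6.0
```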
- the adversarial loss is optimized over the precoder weights, in order to ensure that the post-decoded output x ⁇ , which is generated via the precoder, lies on the natural image manifold.
- the adversarial loss is formulated by modelling the precoder as a generator and adding a discriminator into the framework, which in the example shown in Figure 5 corresponds to the generative adversarial network (GAN) setup [2].
- the discriminator receives the original input frames, represented by x and the post-decoded output x ⁇ as input, which can respectively be referred to as 'real' and 'fake' (or 'artificial') data.
- the discriminator is trained to distinguish between the 'real' and 'fake' data with a discriminator loss.
- the precoder is trained with an adversarial loss to fool the discriminator into classifying the 'fake' data as 'real'.
- the discriminator and precoder are trained alternately with their respective losses, with additional constraints such as gradient clipping depending on the GAN variant.
- the loss functions can be patch-based (i.e. evaluated between local patches of x and x ⁇ ) or can be image-based (i.e. evaluated between whole images).
- the discriminator is configured with conditional convolutional layers (e.g. as described above with reference to Figures 3 and 4 ).
- An additional guidance image or frame x ⁇ is passed to the discriminator, which can represent a linear downscaled, upscaled and compressed representation of x , following the same scaling and codec settings as x ⁇ .
- the discriminator can thus learn to distinguish between x , x ⁇ and x ⁇ , whilst the precoder can learn to generate representations that post-decoding and scaling will be perceptually closer to x than x ⁇ .
- the discriminator is depicted in the example of Figure 5 as receiving the encoder settings Q , in alternative embodiments the encoder settings are not input to the discriminator. In any case, the discriminator may still be configured to distinguish between 'real' and 'artificial' data, corresponding to the original image x and the post-decoded output image x ⁇ .
- the noise loss component is optimized over the precoder weights and acts as a form of regularization, in order to further ensure that the precoder is trained such that the post-decoded output is a denoised representation of the input.
- Examples of noise include aliasing artefacts (e.g. jagging or ringing) introduced by downscaling in the precoder, as well as additional codec artefacts (e.g. blocking) introduced by the virtual codec during training to emulate a standard video or image codec that performs lossy compression.
- An example of the noise loss component is total variation denoising, which is effective at removing noise while preserving edges.
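A minimal Python sketch of the (anisotropic) total variation of a 2D image, as one possible instance of such a noise loss term:

```python
def total_variation(img):
    """Anisotropic total variation of a 2D image (list of rows): the sum of
    absolute differences between horizontally and vertically adjacent pixels.
    Minimizing this term removes noise while largely preserving edges."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
    return tv

flat = [[5.0, 5.0], [5.0, 5.0]]
noisy = [[5.0, 9.0], [1.0, 5.0]]
print(total_variation(flat))   # 0.0 - no variation in a flat image
print(total_variation(noisy))  # 16.0 - noise raises the total variation
```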
- the rate loss is an optional loss component that is optimized over the precoder weights, in order to constrain the rate (number of bits or bitrate) of the precoder output, as estimated by a virtual codec module.
- the virtual codec module depicted in Figure 5 emulates a standard image or video codec that performs lossy compression and primarily consists of a frequency transform component, a quantization and entropy encoding component and a dequantization and inverse transform component.
- the codec module takes as input both the precoder output and any associated codec settings (e.g. CRF, preset) that the precoder itself is conditioned on (e.g. via the instantiated conditional convolutional layers depicted in Figure 4 ).
- the frequency transform component of the virtual codec can be any variant of discrete sine or cosine transform or wavelet transform, or an atom-based decomposition.
- the dequantization and inverse transform component can convert the transform coefficients back into approximated pixel values.
- the main source of loss for the virtual codec module comes from the quantization component, which emulates any multi-stage deadzone or non-deadzone quantizer. Any non-differentiable parts of the standard codec are approximated with continuous differentiable alternatives; one such example is the rounding operation in quantization, which can be approximated with additive uniform noise of support width equal to 1. In this way, the entire virtual codec module is end-to-end continuously differentiable.
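The rounding surrogate can be illustrated with a short Python sketch; the quantization step and input value below are arbitrary examples, and the uniform-noise form is only one way to build a differentiable stand-in.

```python
import random

def hard_quantize(x, step):
    """Non-differentiable quantizer: scale, round, rescale."""
    return round(x / step) * step

def soft_quantize(x, step, rng=random.Random(0)):
    """Training-time surrogate: replace rounding with additive uniform noise
    of support width 1 (in units of the quantization step), which keeps the
    operation continuously differentiable with respect to x."""
    return (x / step + rng.uniform(-0.5, 0.5)) * step

x, step = 3.7, 0.5
print(hard_quantize(x, step))  # 3.5
# The surrogate stays within half a quantization step of the input,
# mirroring the worst-case error of true rounding:
assert abs(soft_quantize(x, step) - x) <= step / 2
```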
- the entropy coding component represents a continuously differentiable approximation to a standard Huffman, arithmetic or runlength encoder, or any combination of those that is also made context adaptive, i.e. with symbol probabilities adapted to previously encoded symbols.
- the entropy coding and other virtual codec components can be made learnable, with an artificial neural network or similar, and jointly trained with the precoding or pre-trained to maximize the likelihood on the frequency transformed and quantized precoder representations.
- a given lossy JPEG, MPEG or AOMedia open encoder can be used to provide the actual rate and compressed representations as reference, which the virtual codec can be trained to replicate. In both cases, training of the artificial neural network parameters can be performed with backpropagation and gradient descent methods.
- the discriminator and deep conditional precoding may be trained alternately. The same can be true for the perceptual model and the deep conditional precoding (or otherwise the perceptual model can be pre-trained and its weights frozen throughout precoder training). After one component is trained for a number of iterations, its weights are updated and frozen, and the other component is trained. This interleaved training allows for end-to-end training and iterative improvement of both components during the training phase. The number of iterations for which a component is trained before being frozen is n ≥ 1; for the discriminator-precoder pair, this will depend on the GAN loss formulation and on whether one seeks to train the discriminator to optimality. Furthermore, the training can continue online and at any time during the system's operation.
- a video codec fully compliant with the H.264/AVC standard was used, with the source code being the JM19.0 reference software from the HHI/Fraunhofer repository [21].
- the source material comprised an infra-red sequence of images with 12-bit dynamic range, but similar results have been obtained with visual image sequences or videos in full HD or ultra-HD resolution and any dynamic range for the input pixel representations.
- for the bitrate-controlled test, the bitrates used were: {64, 128, 256, 512, 1024} kbps.
- for the QP-controlled test, the QP values used were within the range [20, 44].
- Figure 6 shows the bitrate-vs-PSNR results for a sequence of infra-red images, with AVC encoding under rate (left) and QP control (right). The average BD-rate gain is 62% (see [22] for a definition of BD-rate). For these encoding tests, 25 fps has been assumed.
- if a different frame rate is used, the bitrates of all solutions should be divided appropriately (e.g., by 2.5).
- the proposed method offers a significant improvement in PSNR at each bitrate, e.g., 0.8dB-3.4dB. This occurs for both types of encoding (bitrate and QP control).
- the methods described herein can be realized with the full range of options and adaptivity described in the previous examples, and all such options and their adaptations are covered by this disclosure.
- the methods described herein can shrink the input to 10%-40% of the frame size of the input frames, which means that the encoder processes a substantially smaller number of pixels and is therefore 2-6 times faster than when encoding the full-resolution infrared image sequence. This offers additional benefits in terms of increased energy autonomy for video monitoring under battery support, vehicle/mobile/airborne visual monitoring systems, etc.
- Figure 7 shows a method 700 for preprocessing image data using a preprocessing network comprising a set of inter-connected learnable weights.
- the method 700 may be performed by a computing device, according to embodiments.
- the method 700 may be performed at least in part by hardware and/or software.
- the preprocessing is performed prior to encoding the preprocessed image data with an external encoder.
- the preprocessing network is configured to take as an input encoder configuration data representing one or more configuration settings of the external encoder.
- the weights of the preprocessing network are dependent upon (i.e. conditioned on) the one or more configuration settings of the external encoder.
- image data from one or more images is received at the preprocessing network.
- the image data may be retrieved from storage (e.g.
- the image data is processed using the preprocessing network (e.g. by applying the weights of the preprocessing network to the image data) to generate an output pixel representation for encoding with the external encoder.
- the method 700 comprises encoding the output pixel representation, e.g. using the external encoder.
- the encoded output pixel representation may be transmitted, for example to a display device for decoding and subsequent display.
- Embodiments of the disclosure include the methods described above performed on a computing device, such as the computing device 800 shown in Figure 8 .
- the computing device 800 comprises a data interface 801, through which data can be sent or received, for example over a network.
- the computing device 800 further comprises a processor 802 in communication with the data interface 801, and memory 803 in communication with the processor 802.
- the computing device 800 can receive data, such as image data or video data, via the data interface 801, and the processor 802 can store the received data in the memory 803 and process it so as to perform the methods described herein, including preprocessing image data prior to encoding using an external encoder, and optionally encoding the preprocessed image data.
- Each device, module, component, machine or function as described in relation to any of the examples described herein may comprise a processor and/or processing system or may be comprised in apparatus comprising a processor and/or processing system.
- One or more aspects of the embodiments described herein comprise processes performed by apparatus.
- the apparatus comprises one or more processing systems or processors configured to carry out these processes.
- embodiments may be implemented at least in part by computer software stored in (non-transitory) memory and executable by the processor, or by hardware, or by a combination of tangibly stored software and hardware (and tangibly stored firmware).
- Embodiments also extend to computer programs, particularly computer programs on or in a carrier, adapted for putting the above described embodiments into practice.
- the program may be in the form of non-transitory source code, object code, or in any other non-transitory form suitable for use in the implementation of processes according to embodiments.
- the carrier may be any entity or device capable of carrying the program, such as a RAM, a ROM, or an optical memory device, etc.
- the preprocessing method comprises a set of weights, biases and offset terms inter-connected in a network (termed as "preprocessing network") that ingests: (i) the input pixels from the single or plurality of images; (ii) the encoder configuration settings corresponding to bitrate, quantization or target fidelity of the encoding.
- the utilised preprocessing network is configured to convert input pixels to an output pixel representation such that: weights and offset or bias terms of the network are conditioned on the aforementioned bitrate, quantization or fidelity settings and the weights are trained end-to-end with back-propagation of errors from outputs to inputs.
- the output errors are computed via a cost function that estimates the image or video frame error after encoding and decoding the output pixel representation of the preprocessing network with the aforementioned external encoder using bitrate, quantization or fidelity settings close to, or identical, to the ones used as inputs to the network.
- the utilized cost function comprises multiple terms that, for the output after decoding, express: image or video frame noise estimates, or functions or training data that estimate the rate to encode the image or video frame, or estimates, functions or training data expressing the perceived quality of the output from human viewers, or any combinations of these terms.
- the preprocessing network is trained from scratch with the utilized cost function after a random initialization, or refined from a previous training, for any number of iterations prior to deployment (offline) based on training data or, optionally, has its training fine-tuned for any number of iterations based on data obtained during the preprocessing network and encoder-decoder operation during deployment (online).
- the resolution of the pixel data is increased or decreased in accordance with a given upscaling or downscaling ratio that can be an integer or fractional number, including a ratio of 1 (unity) that corresponds to no resolution change.
- weights in the preprocessing network are used, in order to construct a function of the input over single or multiple layers of a convolutional architecture, with each layer receiving outputs of the previous layers.
- the outputs of each layer of the preprocessing network are passed through a non-linear parametric linear rectifier function (pReLU) or other non-linear activation function.
- the convolutional layers of the preprocessing architecture include dilation operators that expand the receptive field of the convolutional operation per layer.
- the training of the preprocessing network weights is done with the addition of regularization methods that control the network capacity, via hard or soft constraints or normalization techniques on the layer weights or activations that reduces the generalization error.
- cost functions are used that express the fidelity to the input images based on reference-based quality metrics that include one or more of: elementwise loss functions such as mean squared error (MSE); a structural similarity index metric (SSIM); a visual information fidelity metric (VIF), for example from the published work of H. Sheikh and A. Bovik entitled "Image Information and Visual Quality"; a detail loss metric (DLM), for example from the published work of S. Li, F. Zhang, L. Ma, and K. Ngan entitled "Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments"; or variants and combinations of these metrics.
- cost functions are used that express or estimate quality scores attributed to the output images from human viewers.
- cost functions are used that are formulated via an adversarial learning framework, in which the preprocessing network is encouraged to generate output pixel representations that reside on the natural image manifold (and optionally encouraged to reside away from another non-representative manifold).
- the provided image or video encoder parameters include quantization or fidelity values per input image, or constant rate factor (CRF) values from a video encoder, or bit allocation budgets per input image, or any combination of these.
- the utilized encoder is a standards-based image or video encoder such as an ISO JPEG or ISO MPEG standard encoder, or a proprietary or royalty-free encoder, such as, but not limited to, an AOMedia encoder.
- high resolution and low resolution image or video pairs are provided and the low resolution image is upscaled and optimized to improve and/or match quality or rate to the high resolution image.
- the training of the preprocessing network weights and any adjustment to the cost functions are performed at frequent or infrequent intervals with new measurements of quality, bitrate, perceptual quality scores from humans, or encoded image data from external image or video encoders, and the updated weights and cost functions replace the previously-utilized ones.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062957286P | 2020-01-05 | 2020-01-05 | |
US202062962971P | 2020-01-18 | 2020-01-18 | |
US202062962970P | 2020-01-18 | 2020-01-18 | |
US202062971994P | 2020-02-09 | 2020-02-09 | |
US202063012339P | 2020-04-20 | 2020-04-20 | |
US202063023883P | 2020-05-13 | 2020-05-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3846475A1 true EP3846475A1 (fr) | 2021-07-07 |
Family
ID=72709153
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20199349.0A Active EP3846478B1 (fr) | 2020-01-05 | 2020-09-30 | Traitement de données d'image |
EP20199347.4A Active EP3846477B1 (fr) | 2020-01-05 | 2020-09-30 | Prétraitement de données d'image |
EP20199344.1A Pending EP3846475A1 (fr) | 2020-01-05 | 2020-09-30 | Prétraitement de données d'image |
EP20199345.8A Active EP3846476B1 (fr) | 2020-01-05 | 2020-09-30 | Pré-traitement des données d'image |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20199349.0A Active EP3846478B1 (fr) | 2020-01-05 | 2020-09-30 | Traitement de données d'image |
EP20199347.4A Active EP3846477B1 (fr) | 2020-01-05 | 2020-09-30 | Prétraitement de données d'image |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20199345.8A Active EP3846476B1 (fr) | 2020-01-05 | 2020-09-30 | Pré-traitement des données d'image |
Country Status (2)
Country | Link |
---|---|
US (4) | US11252417B2 (fr) |
EP (4) | EP3846478B1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113920404A (zh) * | 2021-11-09 | 2022-01-11 | 北京百度网讯科技有限公司 | 训练方法、图像处理方法、装置、电子设备以及存储介质 |
US11540798B2 (en) | 2019-08-30 | 2023-01-03 | The Research Foundation For The State University Of New York | Dilated convolutional neural network system and method for positron emission tomography (PET) image denoising |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12026227B2 (en) * | 2019-11-08 | 2024-07-02 | Unknot Inc. | Systems and methods for editing audiovisual data using latent codes from generative networks and models |
FR3113219B1 (fr) * | 2020-07-29 | 2022-07-29 | Sagemcom Energy & Telecom Sas | Procédé de transmission de mesures permettant de réduire la charge du réseau |
US20230407239A1 (en) * | 2020-11-13 | 2023-12-21 | Teewinot Life Sciences Corporation | Tetrahydrocannabinolic acid (thca) synthase variants, and manufacture and use thereof |
US20230412808A1 (en) * | 2020-11-30 | 2023-12-21 | Intel Corporation | Determining adaptive quantization matrices using machine learning for video coding |
US12015785B2 (en) * | 2020-12-04 | 2024-06-18 | Ofinno, Llc | No reference image quality assessment based decoder side inter prediction |
US20220201283A1 (en) * | 2020-12-21 | 2022-06-23 | Ofinno, Llc | Chroma Prediction from Luma for Video Coding |
US11582453B2 (en) * | 2021-01-11 | 2023-02-14 | Tencent America LLC | Multi-model selection for neural network based tools in video coding |
JP2022145001A (ja) * | 2021-03-19 | 2022-10-03 | キヤノン株式会社 | 画像処理装置、画像処理方法 |
US11849118B2 (en) * | 2021-04-30 | 2023-12-19 | Tencent America LLC | Content-adaptive online training with image substitution in neural image compression |
US11924464B2 (en) | 2021-09-20 | 2024-03-05 | Qualcomm Incorporated | Processing video data picture size change request and notification messages |
KR20240070513A (ko) * | 2021-09-20 | 2024-05-21 | 퀄컴 인코포레이티드 | 비디오 데이터 픽처 사이즈 변경 요청 및 통지 메시지들의 프로세싱 |
US20230118010A1 (en) * | 2021-10-15 | 2023-04-20 | Bitmovin, Inc. | Scalable Per-Title Encoding |
WO2023069337A1 (fr) * | 2021-10-18 | 2023-04-27 | Op Solutions, Llc | Systèmes et procédés d'optimisation d'une fonction de perte pour un codage vidéo pour des machines |
GB2612306A (en) * | 2021-10-25 | 2023-05-03 | Sony Interactive Entertainment Inc | Streaming system and method |
CN114363631B (zh) * | 2021-12-09 | 2022-08-05 | 慧之安信息技术股份有限公司 | 一种基于深度学习的音视频处理方法和装置 |
US12026859B2 (en) * | 2021-12-09 | 2024-07-02 | Google Llc | Compression-aware pre-distortion of geometry and color in distributed graphics display systems |
US11599972B1 (en) * | 2021-12-22 | 2023-03-07 | Deep Render Ltd. | Method and system for lossy image or video encoding, transmission and decoding |
CN114677684A (zh) * | 2022-03-23 | 2022-06-28 | 平安普惠企业管理有限公司 | 扭曲图像校正方法、装置、设备及计算机可读存储介质 |
KR20230148579A (ko) * | 2022-04-18 | 2023-10-25 | 한국전자통신연구원 | 머신 비전을 위한 영상 압축 방법 및 장치 |
CN115082500B (zh) * | 2022-05-31 | 2023-07-11 | 苏州大学 | 基于多尺度与局部特征引导网络的角膜神经纤维分割方法 |
WO2024074231A1 (fr) * | 2022-10-04 | 2024-04-11 | Nokia Technologies Oy | Procédé, appareil et produit programme d'ordinateur pour le traitement d'image et de vidéo faisant appel à des branches de réseau de neurones artificiels présentant différents champs de réception |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6836512B2 (en) | 2000-10-11 | 2004-12-28 | Koninklijke Philips Electronics N.V. | Spatial scalability for fine granular video encoding |
US7336720B2 (en) | 2002-09-27 | 2008-02-26 | Vanguard Software Solutions, Inc. | Real-time video coding/decoding |
US20090304071A1 (en) * | 2008-06-06 | 2009-12-10 | Apple Inc. | Adaptive application of entropy coding methods |
US20120033040A1 (en) * | 2009-04-20 | 2012-02-09 | Dolby Laboratories Licensing Corporation | Filter Selection for Video Pre-Processing in Video Applications |
US8165197B2 (en) | 2007-12-05 | 2012-04-24 | Sony Corporation | Method and apparatus for video upscaling |
US9100660B2 (en) | 2011-08-09 | 2015-08-04 | Dolby Laboratories Licensing Corporation | Guided image up-sampling in video coding |
WO2016132152A1 (fr) * | 2015-02-19 | 2016-08-25 | Magic Pony Technology Limited | Interpolation de données visuelles |
GB2548749A (en) * | 2015-02-19 | 2017-09-27 | Magic Pony Tech Ltd | Online training of hierarchical algorithms |
WO2018229490A1 (fr) * | 2017-06-16 | 2018-12-20 | Ucl Business Plc | Système et procédé mis en œuvre par ordinateur permettant de segmenter une image |
WO2019009449A1 (fr) * | 2017-07-06 | 2019-01-10 | 삼성전자 주식회사 | Procédé et dispositif de codage/décodage d'image |
CN110248190A (zh) * | 2019-07-03 | 2019-09-17 | 西安交通大学 | 一种基于压缩感知的多层残差系数图像编码方法 |
WO2019197712A1 (fr) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | Appareil, procédé, et programme informatique destiné au codage et au décodage vidéo |
Family Cites Families (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5086479A (en) * | 1989-06-30 | 1992-02-04 | Hitachi, Ltd. | Information processing system using neural network learning function |
CA2040903C (fr) * | 1991-04-22 | 2003-10-07 | John G. Sutherland | Reseaux neuronaux |
US6477201B1 (en) | 1998-05-22 | 2002-11-05 | Sarnoff Corporation | Content-adaptive compression encoding |
JP4340084B2 (ja) * | 2003-03-11 | 2009-10-07 | パナソニック株式会社 | 送信装置および送信方法 |
JP2009049979A (ja) * | 2007-07-20 | 2009-03-05 | Fujifilm Corp | 画像処理装置、画像処理方法、画像処理システム、及びプログラム |
KR101307552B1 (ko) | 2008-08-12 | 2013-09-12 | 엘지디스플레이 주식회사 | 액정표시장치와 그 구동방법 |
DE102010045008B4 (de) * | 2010-09-10 | 2013-02-28 | Brose Fahrzeugteile Gmbh & Co. Kommanditgesellschaft, Hallstadt | Kapazitiver Abstandssensor |
DE112012006541B4 (de) | 2012-07-27 | 2020-03-19 | Hewlett-Packard Development Company, L.P. | Verfahren zur Videokompression |
JP6123342B2 (ja) | 2013-02-20 | 2017-05-10 | ソニー株式会社 | 表示装置 |
KR102085270B1 (ko) | 2013-08-12 | 2020-03-05 | 삼성전자 주식회사 | 가장 작은 왜곡 값을 갖는 해상도를 선택하는 이미지 처리 방법과 상기 방법을 수행하는 장치들 |
US20160105698A1 (en) * | 2014-10-09 | 2016-04-14 | FiveByFive, Inc. | Channel-based live tv conversion |
US10726560B2 (en) * | 2014-10-31 | 2020-07-28 | Fyusion, Inc. | Real-time mobile device capture and generation of art-styled AR/VR content |
EP3278559B1 (fr) | 2015-03-31 | 2021-05-05 | Magic Pony Technology Limited | Apprentissage de processus vidéo bout-à-bout |
US20170193335A1 (en) * | 2015-11-13 | 2017-07-06 | Wise Athena Inc. | Method for data encoding and accurate predictions through convolutional networks for actual enterprise challenges |
JP6636323B2 (ja) | 2015-12-28 | 2020-01-29 | 株式会社日立エルジーデータストレージ | 調光器及びこれを用いた映像表示装置 |
KR102170550B1 (ko) | 2016-05-24 | 2020-10-29 | 노키아 테크놀로지스 오와이 | 미디어 콘텐츠를 인코딩하는 방법, 장치 및 컴퓨터 프로그램 |
US20180121791A1 (en) * | 2016-11-03 | 2018-05-03 | Qualcomm Incorporated | Temporal difference estimation in an artificial neural network |
JP6929047B2 (ja) * | 2016-11-24 | 2021-09-01 | キヤノン株式会社 | 画像処理装置、情報処理方法及びプログラム |
US10748062B2 (en) * | 2016-12-15 | 2020-08-18 | WaveOne Inc. | Deep learning based adaptive arithmetic coding and codelength regularization |
CN106933326B (zh) * | 2017-03-10 | 2019-08-30 | Oppo广东移动通信有限公司 | 一种动态调节终端的节能等级的方法、装置及移动终端 |
US20180373986A1 (en) * | 2017-06-26 | 2018-12-27 | QbitLogic, Inc. | Machine learning using dynamic multilayer perceptrons |
US11222255B2 (en) * | 2017-08-17 | 2022-01-11 | Samsung Electronics Co., Ltd. | Neuromorphic processing apparatus |
WO2019079198A1 (fr) * | 2017-10-16 | 2019-04-25 | Illumina, Inc. | Classification de site de raccordement basée sur un apprentissage profond |
US11003992B2 (en) * | 2017-10-16 | 2021-05-11 | Facebook, Inc. | Distributed training and prediction using elastic resources |
CN111655116A (zh) * | 2017-10-30 | 2020-09-11 | 公益财团法人癌研究会 | 图像诊断辅助装置、资料收集方法、图像诊断辅助方法及图像诊断辅助程序 |
CN116312419A (zh) * | 2017-12-15 | 2023-06-23 | 谷歌有限责任公司 | 自适应显示器亮度调整 |
WO2019182701A1 (fr) * | 2018-03-22 | 2019-09-26 | Futurewei Technologies, Inc. | Métrique multimédia immersive pour champ de vision |
US20190045195A1 (en) * | 2018-03-30 | 2019-02-07 | Intel Corporation | Reduced Partitioning and Mode Decisions Based on Content Analysis and Learning |
EP3794828A1 (fr) | 2018-05-16 | 2021-03-24 | Isize Limited | Codage et décodage de données d'image |
US20190370638A1 (en) * | 2018-06-01 | 2019-12-05 | Thales Canada Inc | System for and method of data encoding and/or decoding using neural networks |
US11004183B2 (en) * | 2018-07-10 | 2021-05-11 | The Board Of Trustees Of The Leland Stanford Junior University | Un-supervised convolutional neural network for distortion map estimation and correction in MRI |
US11328203B2 (en) * | 2018-07-30 | 2022-05-10 | Salesforce.Com, Inc. | Capturing organization specificities with embeddings in a model for a multi-tenant database system |
US10869036B2 (en) * | 2018-09-18 | 2020-12-15 | Google Llc | Receptive-field-conforming convolutional models for video coding |
CN109389556B (zh) * | 2018-09-21 | 2023-03-21 | 五邑大学 | 一种多尺度空洞卷积神经网络超分辨率重构方法及装置 |
US10848765B2 (en) * | 2018-12-11 | 2020-11-24 | Google Llc | Rate/distortion/RDcost modeling with machine learning |
WO2020140047A1 (fr) * | 2018-12-28 | 2020-07-02 | Nvidia Corporation | Détection de distance à obstacle dans des applications de machine autonome |
US11240492B2 (en) * | 2019-01-22 | 2022-02-01 | Apple Inc. | Neural network based residual coding and prediction for predictive coding |
KR20200112378A (ko) * | 2019-03-22 | 2020-10-05 | 삼성전자주식회사 | 두 개의 디스플레이 면을 갖는 전자 장치 및 그의 디스플레이 운영 방법 |
US11562500B2 (en) * | 2019-07-24 | 2023-01-24 | Squadle, Inc. | Status monitoring using machine learning and machine vision |
KR20190100097A (ko) * | 2019-08-08 | 2019-08-28 | 엘지전자 주식회사 | 디스플레이 상의 화면의 화질 또는 화면 내용을 추론하여 화면을 조정하는 방법, 제어기 및 시스템 |
2020
- 2020-09-30 EP EP20199349.0A patent/EP3846478B1/fr active Active
- 2020-09-30 US US17/039,563 patent/US11252417B2/en active Active
- 2020-09-30 EP EP20199347.4A patent/EP3846477B1/fr active Active
- 2020-09-30 EP EP20199344.1A patent/EP3846475A1/fr active Pending
- 2020-09-30 US US17/039,453 patent/US11394980B2/en active Active
- 2020-09-30 US US17/039,605 patent/US11223833B2/en active Active
- 2020-09-30 US US17/039,526 patent/US11172210B2/en active Active
- 2020-09-30 EP EP20199345.8A patent/EP3846476B1/fr active Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6836512B2 (en) | 2000-10-11 | 2004-12-28 | Koninklijke Philips Electronics N.V. | Spatial scalability for fine granular video encoding |
US7336720B2 (en) | 2002-09-27 | 2008-02-26 | Vanguard Software Solutions, Inc. | Real-time video coding/decoding |
US8165197B2 (en) | 2007-12-05 | 2012-04-24 | Sony Corporation | Method and apparatus for video upscaling |
US20090304071A1 (en) * | 2008-06-06 | 2009-12-10 | Apple Inc. | Adaptive application of entropy coding methods |
US20120033040A1 (en) * | 2009-04-20 | 2012-02-09 | Dolby Laboratories Licensing Corporation | Filter Selection for Video Pre-Processing in Video Applications |
US9100660B2 (en) | 2011-08-09 | 2015-08-04 | Dolby Laboratories Licensing Corporation | Guided image up-sampling in video coding |
WO2016132152A1 (fr) * | 2015-02-19 | 2016-08-25 | Magic Pony Technology Limited | Interpolation de données visuelles |
GB2548749A (en) * | 2015-02-19 | 2017-09-27 | Magic Pony Tech Ltd | Online training of hierarchical algorithms |
WO2018229490A1 (fr) * | 2017-06-16 | 2018-12-20 | Ucl Business Plc | Système et procédé mis en œuvre par ordinateur permettant de segmenter une image |
US20200167930A1 (en) * | 2017-06-16 | 2020-05-28 | Ucl Business Ltd | A System and Computer-Implemented Method for Segmenting an Image |
WO2019009449A1 (fr) * | 2017-07-06 | 2019-01-10 | 삼성전자 주식회사 | Procédé et dispositif de codage/décodage d'image |
US20200145661A1 (en) * | 2017-07-06 | 2020-05-07 | Samsung Electronics Co., Ltd. | Method for encoding/decoding image, and device therefor |
WO2019197712A1 (fr) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | Appareil, procédé, et programme informatique destiné au codage et au décodage vidéo |
CN110248190A (zh) * | 2019-07-03 | 2019-09-17 | 西安交通大学 | 一种基于压缩感知的多层残差系数图像编码方法 |
Non-Patent Citations (25)
Title |
---|
A. JOLICOEUR-MARTINEAU, THE RELATIVISTIC DISCRIMINATOR: A KEY ELEMENT MISSING FROM STANDARD GAN, 2018 |
BOYCE, JILL ET AL., TECHNIQUES FOR LAYERED VIDEO ENCODING AND DECODING |
DAR, YEHUDA; ALFRED M. BRUCKSTEIN, IMPROVING LOW BIT-RATE VIDEO CODING USING SPATIO-TEMPORAL DOWN-SCALING, 2014 |
DONG, JIE; YAN YE: "Adaptive downsampling for high-definition video coding", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 24.3, 2014, pages 480 - 488 |
DOUMA, PETER; MOTOYUKI KOIKE, METHOD AND APPARATUS FOR VIDEO UPSCALING |
G. BJONTEGAARD: "Calculation of average PSNR differences between RD-curves", VCEG-M33, 2001 |
H. SHEIKH; A. BOVIK, IMAGE INFORMATION AND VISUAL QUALITY |
HINTON, GEOFFREY E.; RUSLAN R. SALAKHUTDINOV: "Reducing the dimensionality of data with neural networks", SCIENCE, vol. 313.5786, 2006, pages 504 - 507 |
I. GOODFELLOW; J. POUGET-ABADIE; M. MIRZA; B. XU; D. WARDE-FARLEY; S. OZAIR; A. COURVILLE; Y. BENGIO: "Generative adversarial nets", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2014 |
I. GULRAJANI; F. AHMED; M. ARJOVSKY; V. DUMOULIN; A. C. COURVILLE: "Improved training of wasserstein gans", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2017 |
K. SUEHRING, HHI AVC REFERENCE CODE REPOSITORY, ONLINE AT THE HHI WEBSITE |
M. ARJOVSKY; S. CHINTALA; L. BOTTOU, WASSERSTEIN GAN, 2017 |
MARTEMYANOV, ALEXEY ET AL., REAL-TIME VIDEO CODING/DECODING |
OZER: "Streaming Media Mag.", BUYERS' GUIDE TO VIDEO QUALITY METRICS, 29 March 2019 (2019-03-29) |
RIPPEL, OREN; LUBOMIR BOURDEV, REAL-TIME ADAPTIVE IMAGE COMPRESSION, 2017 |
S. LI; F. ZHANG; L. MA; K. NGAN, IMAGE QUALITY ASSESSMENT BY SEPARATELY EVALUATING DETAIL LOSSES AND ADDITIVE IMPAIRMENTS |
SONG HAN ET AL: "Learning both Weights and Connections for Efficient Neural Networks", 30 October 2015 (2015-10-30), XP055396330, Retrieved from the Internet <URL:https://arxiv.org/pdf/1506.02626.pdf> [retrieved on 20170804] * |
SU, GUAN-MING ET AL., GUIDED IMAGE UP-SAMPLING IN VIDEO CODING |
T. SALIMANS; I. GOODFELLOW; W. ZAREMBA; V. CHEUNG; A. RADFORD; X. CHEN: "Improved techniques for training gans", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2016 |
THEIS, LUCAS ET AL., LOSSY IMAGE COMPRESSION WITH COMPRESSIVE AUTOENCODERS, 2017 |
VAN DEN OORD, AARON ET AL.: "Conditional image generation with pixelcnn decoders", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2016 |
VAN DER SCHAAR, MIHAELA; MAHESH BALAKRISHNAN, SPATIAL SCALABILITY FOR FINE GRANULAR VIDEO ENCODING |
WU, CHAO-YUAN; NAYAN SINGHAL; PHILIPP KRAHENBUHL, VIDEO COMPRESSION THROUGH IMAGE INTERPOLATION, 2018 |
X. MAO; Q. LI; H. XIE; R. Y. K. LAU; Z. WANG; S. PAUL SMOLLEY: "Least squares generative adversarial networks", PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2017 |
Y. MROUEH; T. SERCU: "Fisher gan", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2017 |
Also Published As
Publication number | Publication date |
---|---|
US20210211682A1 (en) | 2021-07-08 |
EP3846478B1 (fr) | 2023-09-20 |
EP3846476B1 (fr) | 2023-09-20 |
US20210211741A1 (en) | 2021-07-08 |
US11252417B2 (en) | 2022-02-15 |
US20210211739A1 (en) | 2021-07-08 |
EP3846477B1 (fr) | 2023-05-03 |
EP3846477A1 (fr) | 2021-07-07 |
EP3846476A1 (fr) | 2021-07-07 |
US20210211684A1 (en) | 2021-07-08 |
US11394980B2 (en) | 2022-07-19 |
EP3846478A1 (fr) | 2021-07-07 |
US11223833B2 (en) | 2022-01-11 |
US11172210B2 (en) | 2021-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11223833B2 (en) | Preprocessing image data | |
US11445222B1 (en) | Preprocessing image data | |
US20210211683A1 (en) | Machine learning video processing systems and methods | |
US11164340B2 (en) | Apparatus and method for performing artificial intelligence (AI) encoding and AI decoding on image | |
KR102126886B1 (ko) | 단계적 계층에서의 신호 인코딩, 디코딩 및 재구성 동안의 잔차 데이터의 압축해제 | |
KR102312337B1 (ko) | Ai 부호화 장치 및 그 동작방법, 및 ai 복호화 장치 및 그 동작방법 | |
KR20210018668A (ko) | 딥러닝 신경 네트워크를 사용하여 다운샘플링을 수행하는 이미지 처리 시스템 및 방법, 영상 스트리밍 서버 시스템 | |
GB2512827A (en) | Method and device for classifying samples of an image | |
US20240040160A1 (en) | Video encoding using pre-processing | |
US8442338B2 (en) | Visually optimized quantization | |
KR20220063063A (ko) | 인공지능 부호화 및 인공지능 복호화를 수행하기 위한 방법 및 장치 | |
US20220321879A1 (en) | Processing image data | |
US11854165B2 (en) | Debanding using a novel banding metric | |
US20230281755A1 (en) | Method and device for performing artificial intelligence encoding and artificial intelligence decoding | |
Giuseppe et al. | Quality Assessment of Deep-Learning-Based Image Compression | |
WO2024141694A1 (fr) | Procédé, appareil et produit-programme informatique pour traitement d'image et de vidéo | |
WO2024068190A1 (fr) | Procédé, appareil et produit programme d'ordinateur pour un traitement d'image et de vidéo | |
Hadizadeh | Saliency-guided wireless transmission of still images using SoftCast | |
WO2024074231A1 (fr) | Procédé, appareil et produit programme d'ordinateur pour le traitement d'image et de vidéo faisant appel à des branches de réseau de neurones artificiels présentant différents champs de réception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20211217 |
|
RBV | Designated contracting states (corrected) |
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20220404 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Free format text: PREVIOUS MAIN CLASS: H04N0019850000 Ipc: H04N0021234300 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06N 3/045 20230101ALI20240107BHEP Ipc: H04N 19/177 20140101ALI20240107BHEP Ipc: H04N 19/154 20140101ALI20240107BHEP Ipc: H04N 19/124 20140101ALI20240107BHEP Ipc: H04N 19/103 20140101ALI20240107BHEP Ipc: G06N 3/088 20230101ALI20240107BHEP Ipc: G03G 15/00 20060101ALI20240107BHEP Ipc: H04N 21/2343 20110101AFI20240107BHEP |
|
INTG | Intention to grant announced |
Effective date: 20240206 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Free format text: CASE NUMBER: Effective date: 20240521 |