WO2020227179A1 - Video enhancement using a neural network - Google Patents

Video enhancement using a neural network

Info

Publication number
WO2020227179A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
resolution
network
enhanced
Prior art date
Application number
PCT/US2020/031233
Other languages
English (en)
Inventor
Silviu Stefan Andrei
Nataliya Shapovalova
Walterio Wolfgang Mayol Cuevas
Original Assignee
Amazon Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/403,367 (US11210769B2)
Priority claimed from US16/403,386 (US11017506B2)
Priority claimed from US16/403,355 (US11216917B2)
Application filed by Amazon Technologies, Inc.
Publication of WO2020227179A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • Streaming video is usually compressed to reduce bandwidth.
  • the quality of the compression and the channel characteristics are determined by various factors, including environmental conditions and network congestion. These issues degrade the received image quality both spatially and temporally, inducing artifacts.
  • FIG. 1 is a diagram illustrating embodiments of an environment for enhancing images.
  • FIG. 2 is a diagram illustrating embodiments of an environment for enhancing images.
  • FIG. 4 illustrates embodiments of an image enhancement service or module to be used in inference that uses a more recurrent approach.
  • FIG. 5 illustrates embodiments of an image enhancement service or module during training.
  • FIG. 6 illustrates embodiments of a GAN of which part may be used to generate enhanced images.
  • FIG. 7 illustrates embodiments of a generator with filters of a GAN of which part may be used to generate enhanced images during training of the GAN.
  • FIG. 8 illustrates embodiments of a generator with filters of progressively trained GAN.
  • FIG. 9 illustrates embodiments of a generator with filters of progressively trained GAN.
  • FIG. 10 illustrates embodiments of a discriminator of a GAN.
  • FIG. 11 illustrates embodiments of an image enhancement service or module to be used in inference where the CNN to produce a higher resolution image is the generator of a GAN.
  • FIG. 12 is a flow diagram illustrating embodiments of a method for enhancing an image.
  • FIG. 13 is a flow diagram illustrating embodiments of a method for enhancing an image.
  • FIG. 14 is a flow diagram illustrating embodiments of a method for enhancing an image.
  • FIG. 16 illustrates an example of a Pareto front.
  • FIG. 17 is a flow diagram illustrating a method for training a neural network using a Pareto front.
  • FIG. 20 illustrates an example data center that implements an overlay network on a network substrate using IP tunneling technology according to some embodiments.
  • FIG. 21 is a block diagram of an example provider network that provides a storage service and a hardware virtualization service to customers according to some embodiments.
  • FIG. 22 is a block diagram illustrating an example computer system that may be used in some embodiments.
  • FIG. 24 illustrates an example of an environment for implementing aspects in accordance with various embodiments.
  • FIG. 1 Various embodiments of methods, apparatus, systems, and non-transitory computer- readable storage media for enhancing one or more images are described.
  • embodiments for removing compression artifacts and increasing video image resolution based on training convolutional neural networks are detailed. These embodiments may operate on streaming video, stored video, or on static images, and may help with a variety of low-quality video problems such as: a) low quality video that has occurred due to environmental circumstances; b) low quality video that was purposefully created to reduce the bandwidth to save data transfer; c) older videos that have been encoded with less quality or lower resolution; d) etc.
  • FIG. 1 is a diagram illustrating embodiments of an environment for enhancing images from still images or from a video.
  • a provider network 100 provides users with the ability to utilize one or more of a variety of types of computing-related resources such as compute resources (e.g., executing virtual machine (VM) instances and/or containers, executing batch jobs, executing code without provisioning servers), data/storage resources (e.g., object storage, block-level storage, data archival storage, databases and database tables, etc.), network-related resources (e.g., configuring virtual networks including groups of compute resources, content delivery networks (CDNs), Domain Name Service (DNS)), application resources (e.g., databases, application build/deployment services), access policies or roles, identity policies or roles, machine images, routers and other data processing resources, etc.
  • These and other computing resources may be provided as services, such as a hardware virtualization service that can execute compute instances, a storage service that can store data objects, etc.
  • the users (or “customers”) of provider networks 100 may utilize one or more user accounts that are associated with a customer account, though these terms may be used somewhat interchangeably depending upon the context of use.
  • Users may interact with a provider network 100 across one or more intermediate networks 106 (e.g., the internet) via one or more interface(s), such as through use of application programming interface (API) calls, via a console implemented as a website or application, etc.
  • the interface(s) may be part of, or serve as a front-end to, a control plane of the provider network 100 that includes “backend” services supporting and enabling the services that may be more directly offered to customers.
  • virtualization technologies may be used to provide users the ability to control or utilize compute instances (e.g., a VM using a guest operating system (O/S) that operates using a hypervisor that may or may not further operate on top of an underlying host O/S, a container that may or may not operate in a VM, an instance that can execute on “bare metal” hardware without an underlying hypervisor), where one or multiple compute instances can be implemented using a single electronic device.
  • a user may directly utilize a compute instance hosted by the provider network to perform a variety of computing tasks or may indirectly utilize a compute instance by submitting code to be executed by the provider network, which in turn utilizes a compute instance to execute the code (typically without the user having any control of or knowledge of the underlying compute instance(s) involved).
  • a “serverless” function may include code provided by a user or other entity that can be executed on demand.
  • Serverless functions may be maintained within provider network 100 and may be associated with a particular user or account, or may be generally accessible to multiple users and/or multiple accounts.
  • Each serverless function may be associated with a URL, URI, or other reference, which may be used to call the serverless function.
  • Each serverless function may be executed by a compute instance, such as a virtual machine, container, etc., when triggered or invoked.
  • a serverless function can be invoked through an application programming interface (“API”) call or a specially formatted HyperText Transport Protocol (“HTTP”) request message.
  • users can define serverless functions that can be executed on demand, without requiring the user to maintain dedicated infrastructure to execute the serverless function.
  • the serverless functions can be executed on demand using resources maintained by the provider network 100.
  • these resources may be maintained in a “ready” state (e.g., having a pre-initialized runtime environment configured to execute the serverless functions), allowing the serverless functions to be executed in near real-time.
  • a customer can access an image enhancement service or module 108 in provider network 100 using a client device 102.
  • the client device 102 can access the image enhancement service or module 108 over one or more intermediate networks 106 through an interface provided by image enhancement service or module 108, such as an API, console, application, etc.
  • a user can upload one or more image files 111 to an input store 110 in a storage service 104.
  • the storage service may provide a virtual data store (e.g., a folder or “bucket”, a virtualized volume, a database, etc.) provided by the provider network 100.
  • the user may access the functionality of storage service 104, for example via one or more APIs, to store data to the input store 110.
  • the image enhancement service or module 108 includes at least one trained convolutional neural network (CNN) 112 and, in some embodiments, includes an object recognition module 114.
  • the at least one trained CNN 112 performs image enhancement on at least a proper subset of an image as will be detailed below.
  • the object recognition module 114 finds one or more particular objects (such as visage recognition) in an image.
  • a CNN selector 113 uses one or more of an input image (or a collection of them), a recognized object, the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power, an acceptable latency, locality information for the image and/or destination viewer, memory available, lighting information for the image, screen resolution, etc. to select a trained CNN 112 to perform the image enhancement.
  • the circled numerals illustrate an exemplary flow of actions.
  • the user sends a request to image enhancement service or module 108 to enhance one or more images.
  • the request may include at least one of a location of the neural network (e.g., CNN 112) to use, a location of an image-based file (such as solo image file, a video file, etc.), a desired resolution, if object detection is to be used (and for what), etc.
  • a neural network 112 may have different profiles to utilize depending upon environmental factors such as processing power, power, etc.
  • the image enhancement service or module 108 calls for images 111 (such as for a video) according to the request.
  • the input storage 110 may simply be a buffer and not a longer-term storage.
  • the trained CNN 112 may access additional image content (such as preceding and succeeding frames) at circle 6.
  • the trained CNN 112 performs image enhancement at circle 7 and provides the result(s) to the client device(s) 102 at circle 8.
  • an output of the CNN 112 per image (or proper subset thereof) is multiple images (or proper subsets thereof). For example, an image for time T0 and an image for time T0.5 may be output, such that a subsequent image in time potentially does not need to be generated “from scratch.”
  • FIG. 2 is a diagram illustrating embodiments of an environment for enhancing images.
  • the environment is a client device 102.
  • an application 201 can access an image enhancement service or module 108.
  • the image enhancement service or module 108 includes at least one trained convolutional neural network (CNN) 112 and, in some embodiments, includes an object recognition module 114.
  • the at least one trained CNN 112 performs image enhancement on at least a proper subset of an image of images 211 of storage 210 as will be detailed below.
  • the object recognition module 114 finds one or more particular objects (such as a visage) in an image.
  • a CNN selector 113 uses one or more of an input image (or a collection of them), a recognized object, the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power, an acceptable latency, locality information for the image and/or destination viewer, lighting information for the image, screen resolution, etc. to select a trained CNN 112 to perform the image enhancement.
  • the circled numerals illustrate an exemplary flow of actions.
  • the application 201 sends a request to image enhancement service or module 108 to enhance one or more images.
  • the request may include at least one of a location of the neural network (e.g., CNN 112) to use, a location of an image-based file (such as solo image file, a video file, etc.), a desired resolution, if object detection is to be used (and for what), etc. Additionally, a neural network 112 may have different profiles to utilize depending upon environmental factors such as processing power, power, etc.
  • the image enhancement service or module 108 calls for images 211 (such as for a video) according to the request.
  • the storage 210 may simply be a buffer and not a longer-term storage.
  • the trained CNN 112 may access additional image content (such as preceding and succeeding frames) at circle 6.
  • the trained CNN 112 performs image enhancement at circle 7 and provides the result(s) to the client device(s) 102 at circle 8.
  • FIG. 3 illustrates embodiments of an image enhancement service or module to be used in inference.
  • the image enhancement service or module 108 is to be utilized to enhance one or more images, or proper subsets of images thereof, of a video in response to an inference request to do so. That image may be a part of a video stream or a stored video. In some embodiments, an image is a frame of video. Typically, although not necessarily the case, the video is of a lower quality (lower resolution, etc.).
  • the image enhancement service or module 108 may be used in many different environments.
  • the image enhancement service or module 108 is a part of a provider network (e.g., as a service) that can be called (such as shown in FIG. 1).
  • the image enhancement service or module 108 receives a video (streamed or stored), enhances some part of it as directed, and then makes that enhancement available (such as by storing it or transmitting it to a client device).
  • the image enhancement service or module 108 is a part of a client device (such as shown in FIG. 2).
  • the image enhancement service or module 108 receives a video (streamed or stored), enhances some part of it as directed, and then makes that enhancement available (such as by making available for display or storing for later playback).
  • an object recognition module 114 finds one or more particular objects (such as a visage) in an input image 305 to be enhanced.
  • a CNN selector 113 uses one or more of an input image (or a collection of them), a recognized object, the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power, an acceptable latency, locality information for the image and/or destination viewer, lighting information for the image, screen resolution, etc. to select a trained CNN 112 to perform the image enhancement.
  • the image enhancement service or module 108 includes a convolutional neural network (CNN) 310 comprised of a plurality of layers 311 which may include one or more of: 1) one or more convolutional layers, 2) one or more subsampling layers, 3) one or more pooling layers, and 4) other layers.
  • the CNN layers 311 operate on an input image 305 (or a proper subset thereof) to be enhanced and one or more preceding images 301-303 and succeeding images 307-309 (or proper subsets thereof) in the video to be enhanced.
  • Connected to these layers 311 is a CNN residual layer 313 that generates a residual value from the output of the CNN layers 311.
  • the residual value removes or minimizes spatial and temporal compression artifacts produced by the video encoder that produced the image such as blocking, ringing, and flickering.
  • an artifact removal layer (or layers) 317 removes artifacts from the input image 305.
  • the image enhancement service or module 108 also includes an image upsampler 315 (such as a bilinear upsampler) that upsamples the image to be enhanced.
  • the image is upsampled by a factor of four; however, other upsampling scales may be used. The upsampled image and the residual are summed to generate an enhanced image.
  • the image enhancement service or module 108 includes an upsampling filter layer 316 in the CNN 310 that is coupled to the CNN layers 311 to upsample the image (or subset thereof). A tensor product of the upsampled image and the input image 305 is performed and then summed with the residual to generate an enhanced image.
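  • To make the data flow above concrete, the following is a minimal PyTorch sketch of the residual-plus-upsample path of FIG. 3, assuming a 4X scale, a three-frame temporal window, and illustrative layer widths; the class name, channel counts, and layer choices are not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualEnhancer(nn.Module):
    def __init__(self, frames=3, scale=4):
        super().__init__()
        self.scale = scale
        # CNN layers (like layers 311): operate on the frame to enhance plus its
        # neighboring frames, stacked along the channel dimension (3 channels each).
        self.features = nn.Sequential(
            nn.Conv2d(3 * frames, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Residual branch (like layer 313): predicts a high-resolution residual that
        # corrects compression artifacts such as blocking, ringing, and flickering.
        self.residual = nn.Sequential(
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, frame, neighbors):
        # frame: (N, 3, H, W); neighbors: (N, 3*(frames-1), H, W)
        x = torch.cat([frame, neighbors], dim=1)
        residual = self.residual(self.features(x))
        # Image upsampler (like upsampler 315): bilinear upsampling of the input frame.
        upsampled = F.interpolate(frame, scale_factor=self.scale,
                                  mode="bilinear", align_corners=False)
        return upsampled + residual   # enhanced image = upsampled input + residual
```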
  • FIG. 4 illustrates embodiments of an image enhancement service or module to be used in inference that uses a more recurrent approach.
  • a previous higher resolution image will be an input into the CNN 310 instead of a plurality of preceding 301-303 and succeeding 307-309 lower resolution images.
  • initially, the image enhancement service or module 108 utilizes preceding and succeeding lower resolution images to generate a higher resolution image as detailed above (such as using either the image upsampler 315 or the upsampling filter layer 316). This higher resolution image is then passed back as an input to the CNN 310 along with the image to enhance.
  • the CNN 310 only utilizes the single lower resolution image and a previously generated higher resolution image in the generation of an enhanced image.
  • the initial image to enhance is enhanced without any other input to the CNN 310 (e.g., using only the output of the CNN layers 311 and any other layers needed to generate an enhanced image).
  • the result of that initial enhancement is then passed back in as an input to the CNN 310 along with the image to enhance.
  • the CNN 310 only utilizes the single lower resolution image and the previously generated higher resolution image in the generation of an enhanced image. This approach reduces a number of frames to process and may be faster than the previously detailed approach.
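  • Below is a hedged sketch of this recurrent variant, in which the previously generated high-resolution output is fed back as the only temporal context for the next frame; resizing the previous output back to the low-resolution grid is just one illustrative way to align shapes, and the function and model names are assumptions.

```python
import torch
import torch.nn.functional as F

def enhance_recurrently(model, low_res_frames, scale=4):
    # model: e.g., the ResidualEnhancer sketched earlier, built with frames=2 so it
    # accepts one context image alongside the frame to enhance.
    previous_hr = None
    outputs = []
    for frame in low_res_frames:                      # each frame: (N, 3, H, W)
        if previous_hr is None:
            context = torch.zeros_like(frame)         # first frame: no prior output
        else:
            context = F.interpolate(previous_hr, size=frame.shape[-2:],
                                    mode="bilinear", align_corners=False)
        previous_hr = model(frame, context)           # enhanced (higher-resolution) frame
        outputs.append(previous_hr)
    return outputs
```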
  • an artifact removal layer (or layers) 317 removes artifacts from the input image 305.
  • FIG. 5 illustrates embodiments of an image enhancement service or module during training.
  • the CNN layers 311 are shared by the CNN residual layer 313 detailed above and CNN filters 316.
  • the CNN filters 316 perform the function of upsampling.
  • the CNN 310 utilizes those filters 316 instead of a separate upsampling function.
  • the entire CNN 310 is trained including the CNN filters 316 which are not used during inference (in some embodiments).
  • exaggerated target training images are used to encourage the network to enhance sharpness and color in the output.
  • the CNN 310 (or at least the CNN layers) is a component of a generative adversarial network (GAN).
  • the CNN 310 is a generator component of a GAN.
  • an artifact removal layer (or layers) 317 removes artifacts from the input image 305.
  • FIG. 6 illustrates embodiments of a GAN of which part may be used to generate enhanced images.
  • the GAN 600 includes a generator with filters 611 and a discriminator 613.
  • the generator with filters 611 are to produce a “fake” image from an image of a lower resolution image dataset 601, and the discriminator 613 is to compare the “fake” image to a corresponding higher resolution image from a higher resolution dataset 603 to determine if the image is “real” or “fake” (that is, it evaluates whether the generated image belongs in the training data set of higher resolution images or not).
  • the output of the discriminator 613 is a probability that the generated image is fake.
  • the discriminator 613 is a fully connected neural network. If the generator 611 is performing well (generating good “fakes”) then the discriminator 613 will return a value indicating a higher probability of the generated images being real.
  • the constructed image of the generator with filters 611 is also subtracted from a corresponding image of the higher resolution image dataset 603, which indicates a perceptual loss.
  • This perceptual loss is added to the output of the discriminator 613 to produce a generator loss which is fed back to the generator with filters 611 to update the weights of the generator with filters 611 to help train it.
  • a discriminator loss is the inverse of the generator loss and is fed back to the discriminator 613 to update its weights. Note that in some embodiments the discriminator 613 and generator with filters 611 are trained in turns by first training one and then the other.
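  • The sketch below shows one common way to realize this turn-based training in PyTorch, assuming a generator G, a discriminator D that outputs a probability, paired low/high-resolution batches, and a perceptual_loss() helper (e.g., a feature-space distance); the exact loss composition in the embodiments above (perceptual loss added to the discriminator output, discriminator loss as its inverse) may differ from this standard formulation.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, low_res, high_res, perceptual_loss):
    # --- Discriminator turn: real images should score 1, generated ones 0. ---
    fake = G(low_res).detach()
    real_pred, fake_pred = D(high_res), D(fake)
    d_loss = F.binary_cross_entropy(real_pred, torch.ones_like(real_pred)) + \
             F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator turn: fool the discriminator while matching the target image. ---
    fake = G(low_res)
    fake_pred = D(fake)
    adversarial = F.binary_cross_entropy(fake_pred, torch.ones_like(fake_pred))
    g_loss = F.mse_loss(fake, high_res) + perceptual_loss(fake, high_res) + adversarial
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```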
  • an exaggerated target training image is used as the higher resolution image to promote the network to enhance sharpness and color in the output.
  • the output of the generator with filters 611 is added to an upsampled input image 305.
  • one or more of an object recognition module 114 and a CNN selector 113 are utilized and operate as previously described.
  • FIG. 7 illustrates embodiments of a generator with filters of a GAN of which part may be used to generate enhanced images during training of the GAN.
  • the discriminator 613 and generator with filters 611 are trained progressively. In progressive training, both the generator with filters 611 and the discriminator 613 start off at a lower resolution and new layers that produce increasingly higher resolutions are added during training. When a new set of layers is added, it is slowly blended in over several epochs and then stabilized for several more epochs, as sketched below.
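  • The following is an illustrative fade-in for that blending step: when a new, higher-resolution stage is added, its output is mixed with an upsampled copy of the previous stage's output using a factor alpha that ramps from 0 to 1 over several epochs and is then held at 1 while the stage stabilizes; the ramp schedule and function name are assumptions.

```python
import torch.nn.functional as F

def blended_output(prev_stage_rgb, new_stage_rgb, epoch, fade_epochs=10):
    # alpha grows linearly from 0 to 1 over fade_epochs, then stays at 1 (stabilization).
    alpha = min(1.0, epoch / float(fade_epochs))
    upsampled_prev = F.interpolate(prev_stage_rgb, size=new_stage_rgb.shape[-2:],
                                   mode="bilinear", align_corners=False)
    return (1.0 - alpha) * upsampled_prev + alpha * new_stage_rgb
```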
  • the generator with filters 611 and discriminator 613 each comprise multiple stages, and this illustration is for embodiments of the generator with filters 611.
  • the generator with filters 611 performs some temporal pre-processing (such as concatenating lower resolution images along the temporal dimension) using a temporal pre-processing layer(s) 703.
  • the temporal dimension is then reduced from 2n+1 to 1 using a temporal reduction layer(s) 705 which, in some embodiments, consists of a set of 3x3x3 3D convolutions with no padding in the temporal dimension.
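  • A small sketch of that reduction, assuming n=2 (a window of 2n+1 = 5 frames) and two 3x3x3 3D convolutions that pad height and width but not time, so the temporal extent shrinks 5 -> 3 -> 1; the channel widths are illustrative.

```python
import torch
import torch.nn as nn

n = 2                               # frames on each side of the frame to enhance
window = 2 * n + 1                  # temporal window of 5 low-resolution frames
temporal_reduction = nn.Sequential(
    # padding=(0, 1, 1): pad H/W, leave time unpadded so each conv trims it by 2
    nn.Conv3d(3, 32, kernel_size=3, padding=(0, 1, 1)), nn.ReLU(),
    nn.Conv3d(32, 64, kernel_size=3, padding=(0, 1, 1)), nn.ReLU(),
)

frames = torch.randn(1, 3, window, 64, 64)   # (N, C, T, H, W), concatenated along time
features = temporal_reduction(frames)        # (1, 64, 1, 64, 64): temporal dim is now 1
features = features.squeeze(2)               # drop the singleton time dimension
```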
  • the generator with filters 611 are trained in multiple stages.
  • the generator with filters 611 is trained to only remove compression artifacts and output an image that has the same resolution (1X) as the input images and features using artifact removal layer(s) 707.
  • the generator with filters 611 are initialized using only mean squared error (MSE) and perceptual losses. Note that because the generator with filters 611 are trained to output an image at the 1X stage by enforcing perceptual and MSE losses throughout all training stages, the 1X value can be used to verify the model’s ability to remove encoding artifacts at this stage.
  • a second stage (comprising 2X layer(s) 709, upsampling layer(s) 711, and 2X filters 717) processes the features of the 1X image using the 2X layer(s) 709 to generate a higher resolution image and upsamples features of that higher resolution image using the upsampling layer(s) 711.
  • the discriminator 613 may also be enabled.
  • the 2X layer(s) 709 is/are one or more CNN layers.
  • the adversarial loss is used in addition to MSE and perceptual losses for training the generator with filters 611.
  • the 2X filter(s) 717 are generated from the output features of the artifact removal layer(s) 707.
  • a third stage is used for blending and stabilizing the 4X output in an analogous way to the second stage using 4X filters 719, 4X layer(s) 713 on the output of the 2X upsampling layer(s) 711, and 4X upsampling layer(s) 715.
  • 4X RGB images are produced by taking a product of the output of the 4X filter(s) 719 with the 1X RGB image(s) produced by the artifact removal layer(s) 707 and summing that product with the output of the 4X upsampling layer(s) 715.
  • FIG. 8 illustrates embodiments of a generator with filters 611 of progressively trained GAN. As shown, the filtering procedure using filter(s) 819 is simplified from training in that the filter(s) 819 only act on the output of the artifact removal layer(s) 707.
  • FIG. 9 illustrates embodiments of a generator with filters 611 of progressively trained GAN. This is a more simplified version of the generator with filters 611 of FIG. 8 where the 2X layer(s) 709, 2X upsampling layer(s) 711, and 4X layer(s) 713 are not utilized.
  • FIG. 10 illustrates embodiments of a discriminator of a GAN.
  • This discriminator takes in the RGB values from the generator with filters 611.
  • the RGB 4X values are subjected to a 2D convolution 1001 using 64 channels followed by a leaky ReLU (LReLU) 1003.
  • the output of the LReLU is added to the RGB 2X values and then subjected to a 2D convolution using 128 channels followed by batch normalization 1005 and another LReLU.
  • the output of the second LReLU is added to the RGB 1X values and then subjected to a plurality of convolution-BN-LReLU combinations using 2D convolutions from 256 to 2048 to 256 channels.
  • One or more dense layers are used to flatten the result and then a sigmoid activation function 1011 is applied.
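  • A rough PyTorch sketch of such a multi-scale discriminator is given below. The description says intermediate activations are “added to” the 2X and 1X RGB inputs; because the channel counts differ, this sketch concatenates them instead and uses stride-2 convolutions to move between scales, so it is an approximation. The 64/128/256...2048...256 channel progression, batch normalization, leaky ReLUs, dense layer, and sigmoid follow the description; everything else is an assumption.

```python
import torch
import torch.nn as nn

class MultiScaleDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage4x = nn.Sequential(                  # consumes the 4X RGB image
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.stage2x = nn.Sequential(                  # consumes 4X features + 2X RGB
            nn.Conv2d(64 + 3, 128, 3, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2))
        blocks, in_ch = [], 128 + 3                    # 2X features + 1X RGB
        for out_ch in (256, 512, 1024, 2048, 256):     # widen then narrow, per the text
            blocks += [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                       nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2)]
            in_ch = out_ch
        self.stage1x = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, rgb_4x, rgb_2x, rgb_1x):
        x = self.stage4x(rgb_4x)                       # now on the 2X grid
        x = self.stage2x(torch.cat([x, rgb_2x], 1))    # now on the 1X grid
        x = self.stage1x(torch.cat([x, rgb_1x], 1))
        return self.head(x)                            # scalar score used to judge real vs. fake
```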
  • FIG. 11 illustrates embodiments of an image enhancement service or module to be used in inference where the CNN to produce a higher resolution image is the generator of a GAN.
  • the image enhancement service or module 108 is to be utilized to enhance one or more images, or proper subsets of images thereof, of a video in response to an inference request to do so as noted previously.
  • the image enhancement service or module 108 includes a convolutional neural network (CNN) that is the generator with filters 611 of a GAN.
  • a lower resolution image 305 and/or one or more surrounding images 300 is/are passed to the generator with filters 611 which produces the higher resolution image.
  • FIG. 12 is a flow diagram illustrating embodiments of a method for enhancing an image. Depending upon the implementation, this method may be performed as a part of a service of a provider network, on a client device (sender and/or receiver), and/or a combination thereof.
  • a neural network to enhance an image, or a proper subset thereof is trained.
  • a generator or generator with filters of a GAN or one or more CNN layers are trained as discussed herein.
  • a neural network may be trained for different environments. For example, a different neural network may be used depending upon the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power available (e.g., it may not be best to render very high-resolution images on a mobile device), an acceptable latency, etc.
  • a request to enhance one or more images is received.
  • This request may include at least one of a location of the neural network to use, a location of an image-based file (such as solo image file, a video file, etc.), a desired resolution, which images to enhance (for example, not enhance higher quality frames), etc.
  • a stored neural network may have different profiles to utilize depending upon environmental factors such as processing power, power, etc. The request may come from a client device user, an application, etc.
  • An image to be at least partially enhanced using a trained neural network is received at 1203.
  • a lower resolution image that is a part of a video stream or file is received by an image enhancement service or module which has been detailed above.
  • a determination of if the received image should be enhanced is made at 1204.
  • the higher quality frames may be found by an image quality assessment or by appending a known pattern to every frame from which to judge the distortion.
  • the different types of images may be handled differently. For example, higher quality images may not be enhanced by either a sender or a receiver. Assessing image quality per image may also be used purposefully to reduce bandwidth by not enhancing some frames that already have higher quality and using those higher quality frames to help produce a better overall output at a reduced average bitrate. If the image is not to be enhanced, then the next image to potentially enhance is received at 1203.
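  • A hedged sketch of this “should this frame be enhanced?” gate follows; it estimates quality as PSNR against whatever reference the caller supplies (for example, a re-decoded copy or an appended known pattern) and skips frames that already score well. The threshold value and the reference strategy are illustrative assumptions.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in decibels between two images of equal shape.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def should_enhance(frame: np.ndarray, reference: np.ndarray,
                   threshold_db: float = 35.0) -> bool:
    # Frames already above the quality threshold are left alone, which saves compute
    # and lets the higher-quality frames anchor the rest of the output video.
    return psnr(frame, reference) < threshold_db
```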
  • a determination of a proper subset of the received image to enhance is made at 1205. For example, recognition of an object such as a visage is performed. Streamed and compressed images contain some regions that are more informative or more relevant than others, such as people, edges of objects, the center of the image, etc. Operating on the entire image may be too expensive computationally or unnecessary. Note that in some embodiments, a sender using this method may reduce bandwidth by sending the most relevant parts of the image with less compression than the remainder of the image.
  • the sender encodes information with the image itself that notes the image, or a proper subset thereof, is important, so that the receiving end only applies the removal of artifacts on pre-defined areas, making the process faster and also requiring less bandwidth overall.
  • the receiver can then determine what to enhance in an image.
  • a determination of one or more CNNs to use for enhancement is made at 1206. Examples of what goes into that determination have been detailed.
  • the (proper subset of the) received image is enhanced according to the request at 1207.
  • the trained neural network generates a residual value based on the (proper subset of the) received image and at least one corresponding image portion of a preceding lower resolution image and at least one corresponding image portion of a subsequent lower resolution image at 1209.
  • the (proper subset of the) received image is upscaled at 1211.
  • the upscaled (proper subset of the) received image and residual value are added to generate an enhanced image of the (proper subset of the) received image at 1213.
  • multiple CNNs may be used on a single image. For example, a proper subset of an image may use one CNN and the rest of the image may use a different CNN, etc.
  • the enhanced image is output at 1215.
  • the enhanced image is stored, displayed, etc.
  • the enhanced image is merged with other (potentially unenhanced) images of a video file to generate a higher quality video file at 1217.
  • a device with very low memory could store very low resolution video locally and mix this local data with enhanced streamed low bitrate video to create a better quality output.
  • FIG. 13 is a flow diagram illustrating embodiments of a method for enhancing an image. Depending upon the implementation, this method may be performed as a part of a service of a provider network, on a client device (sender and/or receiver), and/or a combination thereof.
  • a neural network to enhance an image, or a proper subset thereof is trained.
  • a generator or generator with filters of a GAN or one or more CNN layers are trained as discussed herein.
  • a neural network may be trained for different environments. For example, a different neural network may be used depending upon the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power available (e.g., it may not be best to render very high-resolution images on a mobile device), an acceptable latency, etc.
  • a request to enhance one or more images is received.
  • This request may include at least one of a location of the neural network to use, a location of an image-based file (such as solo image file, a video file, etc.), a desired resolution, which images to enhance (for example, not enhance higher quality frames), etc.
  • a stored neural network may have different profiles to utilize depending upon environmental factors such as processing power, power, etc. The request may come from a client device user, an application, etc.
  • An image to be at least partially enhanced using a trained neural network is received at 1303.
  • a lower resolution image that is a part of a video stream or file is received by an image enhancement service or module which has been detailed above.
  • a determination of if the received image should be enhanced is made at 1304.
  • the higher quality frames may be found by an image quality assessment or by appending a known pattern to every frame from which to judge the distortion.
  • the different types of images may be handled differently. For example, higher quality images may not be enhanced by either a sender or a receiver. Assessing image quality per image may also be used purposefully to reduce bandwidth by not enhancing some frames that already have higher quality and using those higher quality frames to help produce a better overall output at a reduced average bitrate. If the image is not to be enhanced, then the next image to potentially enhance is received at 1303.
  • a determination of a proper subset of the received image to enhance is made at 1305. For example, recognition of an object such as a visage is performed. Streamed and compressed images contain some regions that are more informative or more relevant than others, such as people, edges of objects, the center of the image, etc. Operating on the entire image may be too expensive computationally or unnecessary. Note that in some embodiments, a sender using this method may reduce bandwidth by sending the most relevant parts of the image with less compression than the remainder of the image.
  • the sender encodes information with the image itself that notes the image, or a proper subset thereof, is important, so that the receiving end only applies the removal of artifacts on pre-defined areas, making the process faster and also requiring less bandwidth overall.
  • the receiver can then determine what to enhance in an image.
  • a determination of a CNN to use for enhancement is made at 1306. Examples of what goes into that determination have been detailed.
  • the (proper subset of the) received image is enhanced at 1307 using the trained neural network to generate an enhanced image of the (proper subset of the) received image based on the (proper subset of the) received image and a previously generated higher resolution image.
  • the previously generated higher resolution image was generated according to 807.
  • the previously generated higher resolution image was simply made by using the trained neural network without having a second input. Note that multiple CNNs may be used on a single image. For example, a proper subset of an image may use one CNN and the rest of the image may use a different CNN, etc.
  • the enhanced image is output at 1309.
  • the enhanced image is stored, displayed, etc.
  • the enhanced image is merged with other (potentially unenhanced) images of a video file to generate a higher quality video file at 1311.
  • a device with very low memory could store very low resolution video locally and mix this local data with enhanced streamed low bitrate video to create a better quality output.
  • FIG. 14 is a flow diagram illustrating embodiments of a method for enhancing an image. Depending upon the implementation, this method may be performed as a part of a service of a provider network, on a client device (sender and/or receiver), and/or a combination thereof.
  • a neural network to enhance an image, or a proper subset thereof is trained.
  • a generator or generator with filters of a GAN or one or more CNN layers are trained as discussed herein.
  • a neural network may be trained for different environments. For example, a different neural network may be used depending upon the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power available (e.g., it may not be best to render very high-resolution images on a mobile device), an acceptable latency, locality (for example, for a particular geography/look a network can be trained that works well with images from a particular location, lighting (day, night, inside, etc.), screen resolution, etc.
  • a request to enhance one or more images is received.
  • This request may include at least one of a location of the neural network to use, a location of an image-based file (such as solo image file, a video file, etc.), a desired resolution, etc.
  • a stored neural network may have different profiles to utilize depending upon environmental factors such as processing power, power, etc. The request may come from a client device user, an application, etc.
  • An image to be at least partially enhanced using a trained neural network is received at 1403.
  • a lower resolution image that is a part of a video stream or file is received by an image enhancement service or module which has been detailed above.
  • a determination of if the received image should be enhanced is made at 1404.
  • the higher quality frames may be found by an image quality assessment or by appending a known pattern to every frame from which to judge the distortion.
  • the different types of images may be handled differently. For example, higher quality images may not be enhanced by either a sender or a receiver. Assessing image quality per image may also be used purposefully to reduce bandwidth by not enhancing some frames that already have higher quality and using those higher quality frames to help produce a better overall output at a reduced average bitrate. If the image is not to be enhanced, then the next image to potentially enhance is received at 1403.
  • a determination of a proper subset of the received image to enhance is made at 1405. For example, recognition of an object such as a visage is performed. Streamed and compressed images contain some regions that are more informative or more relevant than others, such as people, edges of objects, the center of the image, etc. Operating on the entire image may be too expensive computationally or unnecessary. Note that in some embodiments, a sender using this method may reduce bandwidth by sending the most relevant parts of the image with less compression than the remainder of the image.
  • the sender encodes information with the image itself that notes the image, or a proper subset thereof, is important, so that the receiving end only applies the removal of artifacts on pre-defined areas, making the process faster and also requiring less bandwidth overall.
  • the receiver can then determine what to enhance in an image.
  • the (proper subset of the) received image is enhanced according to the request at 1407.
  • the trained neural network generates a residual value based on the (proper subset of the) received image and a previously generated higher resolution image.
  • the (proper subset of the) received image is upscaled at 1411.
  • the upscaled (proper subset of the) received image and residual value are added to generate an enhanced image of the (proper subset of the) received image at 1413.
  • multiple CNNs may be used on a single image. For example, a proper subset of an image may use one CNN and the rest of the image may use a different CNN, etc.
  • the enhanced image is output at 1415.
  • the enhanced image is stored, displayed, etc.
  • the enhanced image is merged with other (potentially unenhanced) images of a video file to generate a higher quality video file at 1417.
  • a device with very low memory could store very low resolution video locally and mix this local data with enhanced streamed low bitrate video to create a better quality output.
  • FIG. 15 is a flow diagram illustrating embodiments of a method for enhancing an image. Depending upon the implementation, this method may be performed as a part of a service of a provider network, on a client device (sender and/or receiver), and/or a combination thereof.
  • a neural network to enhance an image, or a proper subset thereof is trained.
  • a generator or generator with filters of a GAN or one or more CNN layers are trained as discussed herein.
  • a neural network may be trained for different environments. For example, a different neural network may be used depending upon the bandwidth available to the neural network (which may impact the initial resolution), processing power available, power available (e.g., it may not be best to render very high-resolution images on a mobile device), an acceptable latency, locality (for example, for a particular geography/look a network can be trained that works well with images from a particular location, lighting (day, night, inside, etc.), screen resolution, etc.
  • a request to enhance one or more images is received.
  • This request may include at least one of a location of the neural network to use, a location of an image-based file (such as solo image file, a video file, etc.), a desired resolution, etc.
  • a stored neural network may have different profiles to utilize depending upon environmental factors such as processing power, power, etc.
  • the request may come from a client device user, an application, etc.
  • An image to be at least partially enhanced using a trained neural network is received at 1503. For example, a lower resolution image that is a part of a video stream or file is received by an image enhancement service or module which has been detailed above.
  • a determination of if the received image should be enhanced is made at 1504.
  • the higher quality frames may be found by an image quality assessment or by appending a known pattern to every frame from which to judge the distortion.
  • the different types of images may be handled differently. For example, higher quality images may not be enhanced by either a sender or a receiver. Assessing image quality per image may also be used purposefully to reduce bandwidth by not enhancing some frames that already have higher quality and using those higher quality frames to help produce a better overall output at a reduced average bitrate. If the image is not to be enhanced, then the next image to potentially enhance is received at 1503.
  • a determination of a proper subset of the received image to enhance is made at 1505. For example, recognition of an object such as a visage is performed. Streamed and compressed images contain some regions that are more informative or more relevant than others, such as people, edges of objects, the center of the image, etc. Operating on the entire image may be too expensive computationally or unnecessary. Note that in some embodiments, a sender using this method may reduce bandwidth by sending the most relevant parts of the image with less compression than the remainder of the image.
  • the sender encodes information with the image itself that notes the image, or a proper subset thereof, is important, so that the receiving end only applies the removal of artifacts on pre-defined areas, making the process faster and also requiring less bandwidth overall.
  • the receiver can then determine what to enhance in an image.
  • the (proper subset of the) received image is enhanced using a generator with filters according to the request at 1507 as follows.
  • the lower-resolution image and one or more neighboring images are pre-processed by concatenating the lower-resolution images along a temporal dimension. This is followed, in some embodiments, with a temporal reduction of the concatenated images.
  • Artifacts in the (temporally reduced) concatenated images are removed at a first resolution using an artifact removal layer to generate a first red, green, blue (RGB) image and features of the first RGB image.
  • features of the first RGB image are processed at a second, higher resolution to generate a second RGB image and features of that second, higher resolution image are upsampled.
  • the features of the second RGB image are then processed at a third, higher resolution to generate a third RGB image and the features of the third RGB image are upsampled to generate a residual of the third RGB image.
  • a filter from the features of the first RGB image is generated and a product of the generated filter and the RGB image generated by the artifact removal layer is performed. A sum of the product with the residual of the third RGB image generates an enhanced image.
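  • A hedged sketch of that final combination is shown below: a filter is predicted from the artifact-removal features, multiplied element-wise with the (upsampled) first RGB image, and the upsampled residual is added. Real dynamic-filter implementations are often richer (e.g., per-pixel convolution kernels); this keeps the simplest multiplicative form just to show where the product and the sum happen, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutputCombiner(nn.Module):
    def __init__(self, feat_ch=64, scale=4):
        super().__init__()
        self.scale = scale
        self.filter_head = nn.Conv2d(feat_ch, 3, 3, padding=1)   # per-pixel RGB filter
        self.residual_head = nn.Sequential(                      # 4X residual branch
            nn.Conv2d(feat_ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, artifact_free_rgb_1x, features_1x):
        filt = F.interpolate(self.filter_head(features_1x), scale_factor=self.scale,
                             mode="bilinear", align_corners=False)
        rgb_up = F.interpolate(artifact_free_rgb_1x, scale_factor=self.scale,
                               mode="bilinear", align_corners=False)
        residual = self.residual_head(features_1x)
        return filt * rgb_up + residual    # product with the 1X image, plus the residual
```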
  • features output from the artifact removal layer are upsampled to generate a residual.
  • a sum of the residual with an input RGB image generates an enhanced image.
  • the enhanced image is output at 1515.
  • the enhanced image is stored, displayed, etc.
  • the enhanced image is merged with other (potentially unenhanced) images of a video file to generate a higher quality video file at 1517.
  • a device with very low memory could store very low resolution video locally and mix this local data with enhanced streamed low bitrate video to create a better quality output.
  • Typically, the design of CNNs (such as those detailed above) is mostly hand-crafted. This can be inefficient.
  • Detailed here are embodiments of a method to help automatically generate and quickly assess potential networks via an optimization algorithm.
  • elements such as number of layers, filter sizes, time windows, and others are (hyper)parameterized.
  • parameters are encoded with a gradient that led to them. That is, for those CNNs that are more desirable, it is possible to know which direction (parameter change) helped them.
  • image quality is difficult to measure quantitatively and is most commonly measured via metrics such as peak signal to noise ratio (PSNR). This and other similar metrics are only roughly indicative of true perceptual quality but are enough to guide the optimization algorithm.
  • FIG. 16 illustrates an example of created samples for a Pareto front.
  • the Pareto front is a graph of solutions that are better in one or more aspects than other solutions; it allows hundreds of networks to be assessed, optimization to be performed on a smaller subset (the front) for further iterations, and the current best performers to be selected from that subset.
  • a first network 1601 has a plurality of parameters and a gradient with respect to a parent network.
  • the hashed boxes represent a Pareto (optimal) value after evaluating the first network 1601.
  • the F parameter is optimal with a value of 8.
  • the parameters of the first network 1601 are added to the gradient and a (random) mutation 1603 is introduced to form a child network 1605 to analyze. After the analysis, another Pareto value (this time for parameter C) has been found. More mutations can be applied to either the first network 1601 or the child network 1605 to figure out Pareto values for each of the parameters.
  • FIG. 17 is a flow diagram illustrating a method for training a neural network using a Pareto front.
  • a model is evaluated to measure performance (accuracy and time) for the model having a set of hyperparameters and determine gradient values for those hyperparameters with respect to a parent model.
  • a plot of the model is made such as plotting peak signal-to-noise ratio (PSNR) between images and speed.
  • Pareto hyperparameters are tracked that provide desired performance for the model with respect to a particular hyperparameter at 1703.
  • the gradient values are added to the non-Pareto hyperparameters and a random mutation is introduced to at least some of those non-Pareto hyperparameters at 1705.
  • the model with the changes is evaluated to measure performance (accuracy and time) for the model using the Pareto hyperparameters, the mutated non-Pareto hyperparameters, and the remaining hyperparameters.
  • the cycle of tracking Pareto hyperparameters, etc. may continue until all of the Pareto hyperparameters have been found or until the process is stopped for another reason.
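  • A loose sketch of this search loop is given below: candidate hyperparameter sets are scored on two objectives (here PSNR and speed, both larger-is-better), the non-dominated candidates form the Pareto front, and a child is spawned by nudging a parent with the gradient (the direction that produced it) plus a small random mutation. The objective choices, mutation scale, and the evaluate() callable are assumptions.

```python
import random

def dominates(a, b):
    # a, b: (psnr, speed) tuples where larger is better for both objectives.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    # scored: list of (hyperparams, objectives); keep only the non-dominated entries.
    return [(h, s) for h, s in scored
            if not any(dominates(other, s) for _, other in scored if other is not s)]

def mutate(parent, gradient, scale=0.1):
    child = {}
    for key, value in parent.items():
        step = gradient.get(key, 0.0)            # direction that helped the parent
        child[key] = value + step + random.uniform(-scale, scale) * abs(value or 1)
    return child

def search(evaluate, initial, generations=10):
    gradient = {key: 0.0 for key in initial}
    scored = [(initial, evaluate(initial))]
    for _ in range(generations):
        parent, _ = random.choice(pareto_front(scored))
        child = mutate(parent, gradient)
        gradient = {key: child[key] - parent[key] for key in child}  # record the step taken
        scored.append((child, evaluate(child)))
    return pareto_front(scored)
```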
  • FIG. 18 illustrates some exemplary comparisons of various image renderings.
  • Non- enhanced images 1801 show a noticeable amount of blur.
  • CNN enhanced images 1803 show a marked improvement from that.
  • a further improvement of using a generator from a GAN for the CNN is shown in images 1805.
  • FIG. 19 illustrates an example provider network (or“service provider system”) environment according to some embodiments.
  • a provider network 1900 may provide resource virtualization to customers via one or more virtualization services 1910 that allow customers to purchase, rent, or otherwise obtain instances 1912 of virtualized resources, including but not limited to computation and storage resources, implemented on devices within the provider network or networks in one or more data centers.
  • Local Internet Protocol (IP) addresses 1916 may be associated with the resource instances 1912; the local IP addresses are the internal network addresses of the resource instances 1912 on the provider network 1900.
  • the provider network 1900 may also provide public IP addresses 1914 and/or public IP address ranges (e.g., Internet Protocol version 4 (IPv4) or Internet Protocol version 6 (IPv6) addresses) that customers may obtain from the provider 1900.
  • the provider network 1900 may allow a customer of the service provider (e.g., a customer that operates one or more client networks 1950A-1950C including one or more customer device(s) 1952) to dynamically associate at least some public IP addresses 1914 assigned or allocated to the customer with particular resource instances 1912 assigned to the customer.
  • the provider network 1900 may also allow the customer to remap a public IP address 1914, previously mapped to one virtualized computing resource instance 1912 allocated to the customer, to another virtualized computing resource instance 1912 that is also allocated to the customer.
  • a customer of the service provider such as the operator of customer network(s) 1950A-1950C may, for example, implement customer-specific applications and present the customer’s applications on an intermediate network 1940, such as the Internet.
  • Other network entities 1920 on the intermediate network 1940 may then generate traffic to a destination public IP address 1914 published by the customer network(s) 1950A-1950C; the traffic is routed to the service provider data center, and at the data center is routed, via a network substrate, to the local IP address 1916 of the virtualized computing resource instance 1912 currently mapped to the destination public IP address 1914.
  • response traffic from the virtualized computing resource instance 1912 may be routed via the network substrate back onto the intermediate network 1940 to the source entity 1920.
  • Local IP addresses refer to the internal or“private” network addresses, for example, of resource instances in a provider network.
  • Local IP addresses can be within address blocks reserved by Internet Engineering Task Force (IETF) Request for Comments (RFC) 1918 and/or of an address format specified by IETF RFC 4193, and may be mutable within the provider network.
  • Network traffic originating outside the provider network is not directly routed to local IP addresses; instead, the traffic uses public IP addresses that are mapped to the local IP addresses of the resource instances.
  • the provider network may include networking devices or appliances that provide network address translation (NAT) or similar functionality to perform the mapping from public IP addresses to local IP addresses and vice versa.
  • Public IP addresses are Internet mutable network addresses that are assigned to resource instances, either by the service provider or by the customer. Traffic routed to a public IP address is translated, for example via 1:1 NAT, and forwarded to the respective local IP address of a resource instance.
  • Some public IP addresses may be assigned by the provider network infrastructure to particular resource instances; these public IP addresses may be referred to as standard public IP addresses, or simply standard IP addresses.
  • the mapping of a standard IP address to a local IP address of a resource instance is the default launch configuration for all resource instance types.
  • At least some public IP addresses may be allocated to or obtained by customers of the provider network 1900; a customer may then assign their allocated public IP addresses to particular resource instances allocated to the customer. These public IP addresses may be referred to as customer public IP addresses, or simply customer IP addresses. Instead of being assigned by the provider network 1900 to resource instances as in the case of standard IP addresses, customer IP addresses may be assigned to resource instances by the customers, for example via an API provided by the service provider. Unlike standard IP addresses, customer IP addresses are allocated to customer accounts and can be remapped to other resource instances by the respective customers as necessary or desired. A customer IP address is associated with a customer’s account, not a particular resource instance, and the customer controls that IP address until the customer chooses to release it.
  • customer IP addresses allow the customer to mask resource instance or availability zone failures by remapping the customer’s public IP addresses to any resource instance associated with the customer’s account.
  • the customer IP addresses for example, enable a customer to engineer around problems with the customer’s resource instances or software by remapping customer IP addresses to replacement resource instances.
  • FIG. 20 illustrates an example data center that implements an overlay network on a network substrate using IP tunneling technology, according to some embodiments.
  • a provider data center 2000 may include a network substrate that includes networking nodes 2012 such as routers, switches, network address translators (NATs), and so on, which may be implemented as software, hardware, or as a combination thereof.
  • Some embodiments may employ an Internet Protocol (IP) tunneling technology to provide an overlay network via which encapsulated packets may be passed through network substrate 2010 using tunnels.
  • IP tunneling technology may provide a mapping and encapsulating system for creating an overlay network on a network (e.g., a local network in data center 2000 of FIG. 20).
  • the IP tunneling technology provides a virtual network topology (the overlay network); the interfaces (e.g., service APIs) that are presented to customers are attached to the overlay network so that when a customer provides an IP address to which the customer wants to send packets, the IP address is run in virtual space by communicating with a mapping service (e.g., mapping service 2030) that knows where the IP overlay addresses are.
  • the IP tunneling technology may map IP overlay addresses (public IP addresses) to substrate IP addresses (local IP addresses), encapsulate the packets in a tunnel between the two namespaces, and deliver the packet to the correct endpoint via the tunnel, where the encapsulation is stripped from the packet.
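  • The following is a toy, hedged illustration of that mapping and encapsulation flow (not how a provider network actually implements it): a mapping resolves a public overlay IP to a substrate (local) IP, the packet is wrapped with a substrate header for the tunnel, and the wrapper is stripped at the far end. All names and addresses are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst_overlay_ip: str        # public IP address the customer sends to
    payload: bytes

@dataclass
class Encapsulated:
    dst_substrate_ip: str      # local IP address on the network substrate
    inner: Packet

MAPPING = {"203.0.113.10": "10.0.42.7"}   # public (overlay) IP -> local (substrate) IP

def encapsulate(packet: Packet) -> Encapsulated:
    # Look up the substrate endpoint and wrap the packet for the tunnel.
    return Encapsulated(MAPPING[packet.dst_overlay_ip], packet)

def decapsulate(tunneled: Encapsulated) -> Packet:
    # At the endpoint, the substrate wrapper is stripped and the original packet remains.
    return tunneled.inner
```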
  • a packet may be encapsulated in an overlay network packet format before sending, and the overlay network packet may be stripped after receiving.
  • At least some networks in which embodiments may be implemented may include hardware virtualization technology that enables multiple operating systems to run concurrently on a host computer (e.g., hosts 2020A and 2020B of FIG. 20), i.e. as virtual machines (VMs) 2024 on the hosts 2020.
  • the VMs 2024 may, for example, be executed in slots on the hosts 2020 that are rented or leased to customers of a network provider.
  • a hypervisor, or virtual machine monitor (VMM) 2022, on a host 2020 presents the VMs 2024 on the host with a virtual platform and monitors the execution of the VMs 2024.
  • Each VM 2024 may be provided with one or more local IP addresses; the VMM 2022 on a host 2020 may be aware of the local IP addresses of the VMs 2024 on the host.
  • a mapping service 2030 may be aware of (e.g., via stored mapping information 2032) network IP prefixes and IP addresses of routers or other devices serving IP addresses on the local network. This includes the IP addresses of the VMMs 2022 serving multiple VMs 2024.
  • the mapping service 2030 may be centralized, for example on a server system, or alternatively may be distributed among two or more server systems or other devices on the network.
  • a network may, for example, use the mapping service technology and IP tunneling technology to, for example, route data packets between VMs 2024 on different hosts 2020 within the data center 2000 network; note that an interior gateway protocol (IGP) may be used to exchange routing information within such a local network.
  • the data center 2000 network may implement IP tunneling technology, mapping service technology, and a routing service technology to route traffic to and from virtualized resources, for example to route packets from the VMs 2024 on hosts 2020 in data center 2000 to Internet destinations, and from Internet sources to the VMs 2024.
  • Internet sources and destinations may, for example, include computing systems 2070 connected to the intermediate network 2040 and computing systems 2052 connected to local networks 2050 that connect to the intermediate network 2040 (e.g., via edge router(s) 2014 that connect the network 2050 to Internet transit providers).
  • the provider data center 2000 network may also route packets between resources in data center 2000, for example from a VM 2024 on a host 2020 in data center 2000 to other VMs 2024 on the same host or on other hosts 2020 in data center 2000.
  • a service provider that provides data center 2000 may also provide additional data center(s) 2060 that include hardware virtualization technology similar to data center 2000 and that may also be connected to intermediate network 2040. Packets may be forwarded from data center 2000 to other data centers 2060, for example from a VM 2024 on a host 2020 in data center 2000 to another VM on another host in another, similar data center 2060, and vice versa.
  • the hardware virtualization technology may also be used to provide other computing resources, for example storage resources 2018A-2018N, as virtualized resources to customers of a network provider in a similar manner.
  • FIG. 21 is a block diagram of an example provider network that provides a storage service and a hardware virtualization service to customers, according to some embodiments.
  • Hardware virtualization service 2120 provides multiple computation resources 2124 (e.g., VMs) to customers.
  • the computation resources 2124 may, for example, be rented or leased to customers of the provider network 2100 (e.g., to a customer that implements customer network 2150).
  • Each computation resource 2124 may be provided with one or more local IP addresses.
  • Provider network 2100 may be configured to route packets from the local IP addresses of the computation resources 2124 to public Internet destinations, and from public Internet sources to the local IP addresses of computation resources 2124.
  • a virtual computing system 2192 and/or another customer device 2190 may access the functionality of storage service 2110, for example via one or more APIs 2102, to access data from and store data to storage resources 2118A-2118N of a virtual data store 2116 (e.g., a folder or “bucket”, a virtualized volume, a database, etc.) provided by the provider network 2100.
  • a virtualized data store gateway may be provided at the customer network 2150 that may locally cache at least some data, for example frequently-accessed or critical data, and that may communicate with storage service 2110 via one or more communications channels to upload new or modified data from a local cache so that the primary store of data (virtualized data store 2116) is maintained.
  • a user via a virtual computing system 2192 and/or on another customer device 2190, may mount and access virtual data store 2116 volumes via storage service 2110 acting as a storage virtualization service, and these volumes may appear to the user as local (virtualized) storage 2198.
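  • As one concrete illustration of a client storing and retrieving data in a virtual data store through a service API, the following sketch uses the AWS SDK for Python (boto3) and an S3-style bucket as a stand-in for virtual data store 2116; the bucket and key names are hypothetical.

```python
# Illustrative sketch: storing and retrieving an object in an S3-style bucket
# as a stand-in for a virtual data store. Bucket and key names are hypothetical,
# and valid credentials plus an existing bucket are assumed.
import boto3

s3 = boto3.client("s3")

# Upload a locally produced file to the virtual data store.
with open("enhanced_frame_0001.png", "rb") as f:
    s3.put_object(Bucket="example-virtual-data-store",
                  Key="frames/enhanced_frame_0001.png",
                  Body=f)

# Later, read it back; to the caller this looks like ordinary remote storage.
obj = s3.get_object(Bucket="example-virtual-data-store",
                    Key="frames/enhanced_frame_0001.png")
data = obj["Body"].read()
```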
  • the virtualization service(s) may also be accessed from resource instances within the provider network 2100 via API(s) 2102.
  • a customer, appliance service provider, or other entity may access a virtualization service from within a respective virtual network on the provider network 2100 via an API 2102 to request allocation of one or more resource instances within the virtual network or within another virtual network.
  • a system that implements a portion or all of the techniques for image enhancement as described herein may include a general-purpose computer system that includes or is configured to access one or more computer-accessible media, such as computer system 2200 illustrated in FIG. 22.
  • computer system 2200 includes one or more processors 2210 coupled to a system memory 2220 via an input/output (I/O) interface 2230.
  • Computer system 2200 further includes a network interface 2240 coupled to I/O interface 2230. While FIG. 22 shows computer system 2200 as a single computing device, in various embodiments a computer system 2200 may include one computing device or any number of computing devices configured to work together as a single computer system 2200.
  • computer system 2200 may be a uniprocessor system including one processor 2210, or a multiprocessor system including several processors 2210 (e.g., two, four, eight, or another suitable number).
  • Processors 2210 may be any suitable processors capable of executing instructions.
  • processors 2210 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, ARM, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA.
  • each of processors 2210 may commonly, but not necessarily, implement the same ISA.
  • System memory 2220 may store instructions and data accessible by processor(s) 2210.
  • system memory 2220 may be implemented using any suitable memory technology, such as random-access memory (RAM), static RAM (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory.
  • program instructions and data implementing one or more desired functions, such as those methods, techniques, and data described above, are shown stored within system memory 2220 as code 2225 and data 2226.
  • I/O interface 2230 may be configured to coordinate I/O traffic between processor 2210, system memory 2220, and any peripheral devices in the device, including network interface 2240 or other peripheral interfaces.
  • I/O interface 2230 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 2220) into a format suitable for use by another component (e.g., processor 2210).
  • I/O interface 2230 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example.
  • I/O interface 2230 may be split into two or more separate components, such as a north bridge and a south bridge, for example. Also, in some embodiments some or all of the functionality of I/O interface 2230, such as an interface to system memory 2220, may be incorporated directly into processor 2210.
  • Network interface 2240 may be configured to allow data to be exchanged between computer system 2200 and other devices 2260 attached to a network or networks 2250, such as other computer systems or devices as illustrated in FIG. 1, for example.
  • network interface 2240 may support communication via any suitable wired or wireless general data networks, such as types of Ethernet network, for example.
  • network interface 2240 may support communication via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks, via storage area networks (SANs) such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
  • a computer system 2200 includes one or more offload cards 2270 (including one or more processors 2275, and possibly including the one or more network interfaces 2240) that are connected using an I/O interface 2230 (e.g., a bus implementing a version of the Peripheral Component Interconnect - Express (PCI-E) standard, or another interconnect such as a QuickPath interconnect (QPI) or UltraPath interconnect (UPI)).
  • the computer system 2200 may act as a host electronic device (e.g., operating as part of a hardware virtualization service) that hosts compute instances, and the one or more offload cards 2270 execute a virtualization manager that can manage compute instances that execute on the host electronic device.
  • the offload card(s) 2270 can perform compute instance management operations such as pausing and/or un-pausing compute instances, launching and/or terminating compute instances, performing memory transfer/copying operations, etc.
  • management operations may, in some embodiments, be performed by the offload card(s) 2270 in coordination with a hypervisor (e.g., upon a request from a hypervisor) that is executed by the other processors 2210A-2210N of the computer system 2200.
  • the virtualization manager implemented by the offload card(s) 2270 can accommodate requests from other entities (e.g., from compute instances themselves), and may not coordinate with (or service) any separate hypervisor.
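  • As a loose illustration of the kind of instance-management dispatch such a virtualization manager might perform, here is a hypothetical Python sketch; the operation names and handler structure are assumptions, not the actual management interface of the offload card(s).

```python
# Hypothetical sketch of a virtualization manager dispatching compute instance
# management operations. Names, structure, and the instance ID are illustrative.
from typing import Callable, Dict

class VirtualizationManager:
    def __init__(self):
        self._handlers: Dict[str, Callable[[str], None]] = {
            "launch": self._launch,
            "terminate": self._terminate,
            "pause": self._pause,
            "unpause": self._unpause,
        }

    def handle(self, operation: str, instance_id: str) -> None:
        # Requests may come from a hypervisor or directly from other entities.
        self._handlers[operation](instance_id)

    def _launch(self, instance_id: str) -> None:
        print(f"launching compute instance {instance_id}")

    def _terminate(self, instance_id: str) -> None:
        print(f"terminating compute instance {instance_id}")

    def _pause(self, instance_id: str) -> None:
        print(f"pausing compute instance {instance_id}")

    def _unpause(self, instance_id: str) -> None:
        print(f"un-pausing compute instance {instance_id}")

# Example: the manager receives a pause request for a hypothetical instance.
VirtualizationManager().handle("pause", "i-0123456789abcdef0")
```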
  • system memory 2220 may be one embodiment of a computer-accessible medium configured to store program instructions and data as described above. However, in other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media.
  • a computer-accessible medium may include non-transitory storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD coupled to computer system 2200 via I/O interface 2230.
  • a non-transitory computer-accessible storage medium may also include any volatile or non-volatile media such as RAM (e.g., SDRAM, double data rate (DDR) SDRAM, SRAM, etc.), read only memory (ROM), etc., that may be included in some embodiments of computer system 2200 as system memory 2220 or another type of memory.
  • a computer-accessible medium may include transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 2240.
  • FIG. 23 illustrates a logical arrangement of a set of general components of an example computing device 2300 such as provider network 100, client device(s) 102, etc.
  • a computing device 2300 can also be referred to as an electronic device.
  • the techniques shown in the figures and described herein can be implemented using code and data stored and executed on one or more electronic devices (e.g., a client end station and/or server end station).
  • Such electronic devices store and communicate (internally and/or with other electronic devices over a network) code and data using computer-readable media, such as non-transitory computer-readable storage media (e.g., magnetic disks, optical disks, Random Access Memory (RAM), Read Only Memory (ROM), flash memory devices, phase-change memory) and transitory computer-readable communication media (e.g., electrical, optical, acoustical or other form of propagated signals, such as carrier waves, infrared signals, digital signals).
  • such electronic devices include hardware, such as a set of one or more processors 2302 (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding) coupled to one or more other components, e.g., one or more non-transitory machine-readable storage media (e.g., memory 2304) to store code (e.g., instructions 2314) and/or data, and a set of one or more wired or wireless network interfaces 2308 allowing the electronic device to transmit data to and receive data from other computing devices, typically across one or more networks (e.g., Local Area Networks (LANs), the Internet).
  • the coupling of the set of processors and other components is typically through one or more interconnects within the electronic device (e.g., busses and possibly bridges).
  • the non-transitory machine-readable storage media (e.g., memory 2304) of a given electronic device typically stores code (e.g., instructions 2314) for execution on the set of one or more processors 2302 of that electronic device.
  • One or more parts of various embodiments may be implemented using different combinations of software, firmware, and/or hardware.
  • a computing device 2300 can include some type of display element 2306, such as a touch screen or liquid crystal display (LCD), although many devices such as portable media players might convey information via other means, such as through audio speakers, and other types of devices such as server end stations may not have a display element 2306 at all.
  • some computing devices used in some embodiments include at least one input and/or output component(s) 2312 able to receive input from a user.
  • This input component can include, for example, a push button, touch pad, touch screen, wheel, joystick, keyboard, mouse, keypad, or any other such device or element whereby a user is able to input a command to the device.
  • a device might be controlled through a combination of visual and/or audio commands and utilize a microphone, camera, sensor, etc., such that a user can control the device without having to be in physical contact with the device.
  • the system includes an electronic client device 2402, which may also be referred to as a client device and can be any appropriate device operable to send and receive requests, messages or information over an appropriate network 2404 and convey information back to a user of the device 2402.
  • client devices include personal computers (PCs), cell phones, handheld messaging devices, laptop computers, set-top boxes, personal data assistants, electronic book readers, wearable electronic devices (e.g., glasses, wristbands, monitors), and the like.
  • the one or more networks 2404 can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, or any other such network or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected.
  • a user might submit a search request for a certain type of item.
  • the data store 2410 might access the user information 2416 to verify the identity of the user and can access the production data 2412 to obtain information about items of that type. The information can then be returned to the user, such as in a listing of results on a web page that the user is able to view via a browser on the user device 2402. Information for a particular item of interest can be viewed in a dedicated page or window of the browser.
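  • To make this request flow concrete, a minimal Python sketch follows; the data structures and field names are hypothetical stand-ins for the user information 2416 and production data 2412.

```python
# Hypothetical sketch of the search flow described above: verify the user
# against stored user information, then query production data for matching items.
USER_INFORMATION = {"alice": {"verified": True}}
PRODUCTION_DATA = [
    {"type": "camera", "name": "Example Cam A"},
    {"type": "camera", "name": "Example Cam B"},
    {"type": "tripod", "name": "Example Tripod"},
]

def handle_search_request(user: str, item_type: str) -> list:
    if not USER_INFORMATION.get(user, {}).get("verified"):
        raise PermissionError("unknown or unverified user")
    # Return a listing of matching items for rendering on a results page.
    return [item for item in PRODUCTION_DATA if item["type"] == item_type]

print(handle_search_request("alice", "camera"))
```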
  • the environment in one embodiment is a distributed computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections.
  • the depiction of the environment 2400 in FIG. 24 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.
  • Various embodiments discussed or suggested herein can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices, or processing devices which can be used to operate any of a number of applications.
  • the environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers, or other network devices may be stored locally and/or remotely, as appropriate.
  • Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules, or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc-Read Only Memory (CD-ROM), Digital Versatile Disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a system device.
  • Reference numerals with suffix letters may be used to indicate that there can be one or multiple instances of the referenced entity in various embodiments, and when there are multiple instances, each does not need to be identical but may instead share some general traits or act in common ways. Further, the particular suffixes used are not meant to imply that a particular amount of the entity exists unless specifically indicated to the contrary. Thus, two entities using the same or different suffix letters may or may not have the same number of instances in various embodiments.
  • references to “one embodiment,” “an embodiment,” “an example embodiment,” etc. indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • disjunctive language such as the phrase “at least one of A, B, or C” is intended to be understood to mean either A, B, or C, or any combination thereof (e.g., A, B, and/or C). As such, disjunctive language is not intended to, nor should it be understood to, imply that a given embodiment requires at least one of A, at least one of B, or at least one of C to each be present.
  • a computer-implemented method comprising:
  • removing artifacts by generating, using a layer of the trained neural network, a residual value based on the proper subset of the received frame and at least one corresponding image portion of a preceding lower resolution frame in the video file and at least one corresponding frame portion of a subsequent lower resolution image in the video file, upscaling at least the proper subset of the received frame using bilinear upsampling, and combining the upscaled proper subset of the received image and residual value to generate an enhanced frame;
  • a computer-implemented method comprising:
  • removing artifacts by generating, using a layer of the trained neural network, a residual value based on the proper subset of the received image and at least one corresponding image portion of a preceding lower resolution image in the video file and at least one corresponding image portion of a subsequent lower resolution image in the video file, upscaling the received image using bilinear upsampling, and combining the upscaled received image and the residual value to generate an enhanced image;
  • a system comprising:
  • an image enhancement service implemented by one or more electronic devices, the image enhancement service including instructions that upon execution cause the image enhancement service to:
  • a computer-implemented method comprising:
  • a computer-implemented method comprising:
  • a system comprising:
  • an image enhancement service implemented by one or more electronic devices, the image enhancement service including instructions that upon execution cause the image enhancement service to:
  • the image enhancement service is to determine the neural network to perform at least a portion of enhancing the lower-resolution image by using one or more of: object recognition, bandwidth available, processing power available, power, an acceptable latency, locality information for the image and/or destination viewer, lighting information for the image, and screen resolution.
  • a computer-implemented method comprising:
  • a computer-implemented method comprising:
  • a system comprising:
  • an image enhancement service implemented by one or more electronic devices, the image enhancement service including instructions that upon execution cause the image enhancement service to:
  • the image enhancement service is to determine the neural network to perform at least a portion of enhancing the lower-resolution image by using one or more of: object recognition, bandwidth available, processing power available, power, an acceptable latency, locality information for the image and/or destination viewer, lighting information for the image, and screen resolution.
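  • The selection criteria listed in the preceding fragments (bandwidth, processing power, latency, screen resolution, and so on) could be realized in many ways; the following Python sketch shows one hypothetical heuristic, with made-up model names and thresholds that are not taken from the disclosure.

```python
# Hypothetical sketch of choosing which trained network to apply, based on the
# kinds of signals listed above. Model names and thresholds are illustrative only.
def select_enhancement_network(bandwidth_mbps: float,
                               gpu_available: bool,
                               max_latency_ms: float,
                               screen_height: int) -> str:
    # Plenty of bandwidth and a small screen: little or no enhancement needed.
    if bandwidth_mbps > 10 and screen_height <= 720:
        return "passthrough"
    # Tight latency budget or no GPU: prefer a small artifact-removal network.
    if max_latency_ms < 20 or not gpu_available:
        return "small_residual_net"
    # Otherwise use the full super-resolution model (e.g., a GAN generator).
    return "full_superresolution_gan"

# Example: a congested link feeding a 1080p screen with a GPU available.
print(select_enhancement_network(bandwidth_mbps=2.5, gpu_available=True,
                                 max_latency_ms=50, screen_height=1080))
```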

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Techniques for enhancing an image are described. For example, a lower-resolution image of a video file may be enhanced using a trained neural network by applying the trained neural network to the lower-resolution image to: remove artifacts by generating, using a layer of the trained neural network, a residual value based on the proper subset of the received image and at least one corresponding image portion of a preceding lower-resolution image in the video file and at least one corresponding image portion of a subsequent lower-resolution image in the video file; upscale the lower-resolution image using bilinear upsampling; and combine the upscaled received image and the residual value to generate an enhanced image.
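A minimal PyTorch-style sketch of the enhancement step summarized in the abstract is shown below; the layer sizes, scale factor, and module names are assumptions for illustration and do not reproduce the networks described in the embodiments.

```python
# Illustrative sketch only: residual generation from the previous, current, and
# next lower-resolution frames, bilinear upsampling of the current frame, and
# combination into an enhanced frame. Layer sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualEnhancer(nn.Module):
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Takes the three stacked RGB frames (9 channels) and predicts an
        # RGB residual at the target resolution via a small convolutional branch.
        self.body = nn.Sequential(
            nn.Conv2d(9, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearrange channels into a higher-resolution residual
        )

    def forward(self, prev_lr, cur_lr, next_lr):
        residual = self.body(torch.cat([prev_lr, cur_lr, next_lr], dim=1))
        upscaled = F.interpolate(cur_lr, scale_factor=self.scale,
                                 mode="bilinear", align_corners=False)
        return upscaled + residual  # enhanced (higher-resolution) frame

# Example with dummy 64x64 lower-resolution frames.
model = ResidualEnhancer(scale=2)
lr_frames = [torch.randn(1, 3, 64, 64) for _ in range(3)]
enhanced = model(*lr_frames)  # -> shape (1, 3, 128, 128)
```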
PCT/US2020/031233 2019-05-03 2020-05-03 Amélioration vidéo à l'aide d'un réseau neuronal WO2020227179A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US16/403,386 2019-05-03
US16/403,367 US11210769B2 (en) 2019-05-03 2019-05-03 Video enhancement using a recurrent image date of a neural network
US16/403,355 2019-05-03
US16/403,367 2019-05-03
US16/403,386 US11017506B2 (en) 2019-05-03 2019-05-03 Video enhancement using a generator with filters of generative adversarial network
US16/403,355 US11216917B2 (en) 2019-05-03 2019-05-03 Video enhancement using a neural network

Publications (1)

Publication Number Publication Date
WO2020227179A1 true WO2020227179A1 (fr) 2020-11-12

Family

ID=70802962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/031233 WO2020227179A1 (fr) 2019-05-03 2020-05-03 Amélioration vidéo à l'aide d'un réseau neuronal

Country Status (1)

Country Link
WO (1) WO2020227179A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022094522A1 (fr) * 2020-10-28 2022-05-05 Qualcomm Incorporated Dispositifs et procédé faisant appel à un réseau de neurones pour éliminer des artefacts de codage vidéo dans un flux vidéo
WO2022206244A1 (fr) * 2021-03-29 2022-10-06 International Business Machines Corporation Réduction de la consommation de largeur de bande grâce à des réseaux antagonistes génératifs
WO2023125550A1 (fr) * 2021-12-30 2023-07-06 北京字跳网络技术有限公司 Procédé et appareil de réparation de trame vidéo, et dispositif, support de stockage et produit-programme

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016132152A1 (fr) * 2015-02-19 2016-08-25 Magic Pony Technology Limited Interpolation de données visuelles

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016132152A1 (fr) * 2015-02-19 2016-08-25 Magic Pony Technology Limited Interpolation de données visuelles

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FIELDSEND J E ET AL: "Pareto Evolutionary Neural Networks", IEEE TRANSACTIONS ON NEURAL NETWORKS, IEEE, PISCATAWAY, NJ, US, vol. 16, no. 2, 1 March 2005 (2005-03-01), pages 338 - 354, XP011127551 *
JO YOUNGHYUN ET AL: "Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 3224 - 3232, XP033476291, DOI: 10.1109/CVPR.2018.00340 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022094522A1 (fr) * 2020-10-28 2022-05-05 Qualcomm Incorporated Dispositifs et procédé faisant appel à un réseau de neurones pour éliminer des artefacts de codage vidéo dans un flux vidéo
US11538136B2 (en) 2020-10-28 2022-12-27 Qualcomm Incorporated System and method to process images of a video stream
WO2022206244A1 (fr) * 2021-03-29 2022-10-06 International Business Machines Corporation Réduction de la consommation de largeur de bande grâce à des réseaux antagonistes génératifs
WO2023125550A1 (fr) * 2021-12-30 2023-07-06 北京字跳网络技术有限公司 Procédé et appareil de réparation de trame vidéo, et dispositif, support de stockage et produit-programme

Similar Documents

Publication Publication Date Title
US11017506B2 (en) Video enhancement using a generator with filters of generative adversarial network
US11210769B2 (en) Video enhancement using a recurrent image date of a neural network
US11741582B1 (en) Video enhancement using a neural network
WO2020227179A1 (fr) Amélioration vidéo à l'aide d'un réseau neuronal
US10534965B2 (en) Analysis of video content
US11115697B1 (en) Resolution-based manifest generator for adaptive bitrate video streaming
US11055819B1 (en) DualPath Deep BackProjection Network for super-resolution
US11194995B1 (en) Video composition management system
US9628810B1 (en) Run-length encoded image decompressor for a remote desktop protocol client in a standards-based web browser
US10809983B1 (en) Using an abstract syntax tree for generating names in source code
US11736753B1 (en) Video enhancement service
US10476927B2 (en) System and method for display stream compression for remote desktop protocols
EP3740869B1 (fr) Distribution automatisée de modèles destinés à être exécutés sur un dispositif non périphérique et sur un dispositif périphérique
US11647056B1 (en) Hybrid videoconferencing architecture for telemedicine
US10482887B1 (en) Machine learning model assisted enhancement of audio and/or visual communications
US10949982B1 (en) Moving object recognition, speed estimation, and tagging
US11445168B1 (en) Content-adaptive video sampling for cost-effective quality monitoring
US10412002B1 (en) Processing packet data using an offload engine in a service provider environment
US11317172B1 (en) Video fragment aware audio packaging service
US11019127B1 (en) Adaptive media fragment backfilling
US11689598B1 (en) Synchronized out-of-order live video encoding for reduced latency
US11606339B1 (en) Privacy protecting transaction engine for a cloud provider network
US11218743B1 (en) Linear light scaling service for non-linear light pixel values
US11503341B1 (en) Perceptually motivated video pre-filter
US11971956B1 (en) Secondary distillation for self-supervised video representation learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20727793

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20727793

Country of ref document: EP

Kind code of ref document: A1