CN113393374A - Video processing circuit and method for performing SR operation - Google Patents

Video processing circuit and method for performing SR operation

Info

Publication number
CN113393374A
Authority
CN
China
Prior art keywords
frame
frames
input
circuit
resolution
Prior art date
Legal status
Pending
Application number
CN202110247153.6A
Other languages
Chinese (zh)
Inventor
任正隆
丛培贵
王耀笙
陈志玮
古志文
曾宇晟
石铭恩
罗国强
Current Assignee
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date
Filing date
Publication date
Priority claimed from US17/167,356 external-priority patent/US11836893B2/en
Application filed by MediaTek Inc filed Critical MediaTek Inc
Publication of CN113393374A publication Critical patent/CN113393374A/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

A video processing circuit includes an input buffer, an online adaptation circuit, and an artificial intelligence (AI) super-resolution (SR) circuit. The input buffer receives input low-resolution (LR) frames and high-resolution (HR) frames from a video source over a network. The online adaptation circuit forms training pairs, each consisting of one of the input LR frames and one of the HR frames, and uses them to compute updates to representative features that characterize the input LR frames. The AI SR circuit receives the input LR frames from the input buffer and the representative features from the online adaptation circuit. While the updates to the representative features are being computed, the AI SR circuit generates SR frames for display from the input LR frames based on the representative features. Each SR frame has a higher resolution than a corresponding one of the input LR frames.

Description

Video processing circuit and method for performing SR operation
Technical Field
Embodiments of the present invention relate generally to video playback technology and, more particularly, to a video processing circuit and method for performing super-resolution (SR) operations.
Background
Current image display devices can stream video over a network and enhance the streaming content before displaying it. Some devices can perform super-resolution (SR) operations on streaming content using image enhancement techniques. An SR operation upsamples (upscales) a low-resolution (LR) image to a higher-resolution image, for example from an input image of 720x480 pixels to an output image of 3840x2160 pixels. Conventional image resizing techniques based on up/down-sampling can degrade image quality with respect to blur, noise, distortion, color condition, sharpness, contrast, and the like.
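As a point of reference for the conventional resizing described above, the sketch below (an illustration, not taken from the patent) upscales a tiny frame by pixel replication. Such naive upsampling preserves no detail, which is exactly the quality loss that SR methods aim to avoid. The frame layout and function name are illustrative assumptions.

```python
def upscale_nearest(frame, sx, sy):
    """Naively upscale a 2D frame (a list of rows) by integer factors.

    Illustrates conventional resizing: each source pixel is simply
    replicated sx times horizontally and sy times vertically, which adds
    no detail and can introduce blockiness, in contrast to SR methods
    that synthesize plausible high-frequency content.
    """
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(sx)]  # replicate horizontally
        out.extend([list(wide) for _ in range(sy)])  # replicate vertically
    return out

# A 2x2 "LR" frame upscaled by 2x in each dimension.
lr = [[1, 2],
      [3, 4]]
hr = upscale_nearest(lr, 2, 2)
```

Each LR pixel becomes a 2x2 block in the output, so `hr` is a 4x4 frame with no new information added.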
Typical edge devices, such as televisions or smartphones, have limited computing power due to stringent power and thermal requirements. Image enhancement operations on edge devices are therefore typically based on algorithms and parameters pre-configured by the device manufacturer, leaving little flexibility to adjust the configuration after a consumer starts using the device. When input images vary in content and quality, this limited flexibility can adversely affect output image quality. There is therefore a need for improved image enhancement operations that minimize the adverse effect of upsampling on output image quality.
Disclosure of Invention
It is therefore an object of the present invention to provide a video processing circuit and a method for performing a Super Resolution (SR) operation with image enhancement capability.
In one embodiment, a video processing circuit is provided that includes an input buffer, an online adaptation circuit, and an artificial intelligence (AI) super-resolution (SR) circuit. The input buffer receives input low-resolution (LR) frames and high-resolution (HR) frames from a video source over a network. The online adaptation circuit forms training pairs, each consisting of one of the input LR frames and one of the HR frames, and uses them to compute updates to representative features that characterize the input LR frames. The AI SR circuit receives the input LR frames from the input buffer and the representative features from the online adaptation circuit. While the updates to the representative features are being computed, the AI SR circuit generates SR frames for display from the input LR frames based on the representative features. Each SR frame has a higher resolution than a corresponding one of the input LR frames.
In one embodiment, the representative characteristics include one or more of: scene type, degradation level, and color status.
In one embodiment, the representative features include information for the AI SR circuit to update characteristics of an AI model used to generate the SR frame.
In one embodiment, the input HR frames are received over the network less frequently than the input LR frames.
In an embodiment, one or more of the training pairs includes HR and LR frames having different content and different resolutions.
In one embodiment, the online adaptation circuit is operable to identify the representative features using a convolutional neural network (CNN).
In one embodiment, the AI SR circuit is operable to generate the SR frames using a convolutional neural network (CNN).
In one embodiment, the online adaptation circuit is operable to: periodically receive the input HR frames; and pair each input HR frame with a plurality of the input LR frames to form a plurality of training pairs.
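The periodic pairing in this embodiment can be sketched as follows. This is a minimal illustration that assumes one HR frame is received per fixed window of LR frames; the function name and the `hr_period` window parameter are hypothetical, not from the patent.

```python
def form_training_pairs(lr_frames, hr_frames, hr_period):
    """Pair each periodically received HR frame with the LR frames that
    arrive during its period, forming (HR, LR) training pairs.

    lr_frames: LR frame ids in arrival order
    hr_frames: HR frame ids; hr_frames[k] covers LR indices
               [k * hr_period, (k + 1) * hr_period)
    """
    pairs = []
    for i, lr in enumerate(lr_frames):
        k = min(i // hr_period, len(hr_frames) - 1)  # clamp to last HR frame
        pairs.append((hr_frames[k], lr))
    return pairs

# Example mirroring Fig. 3A: two HR frames, four LR frames.
pairs = form_training_pairs(["LR1", "LR2", "LR3", "LR4"],
                            ["HR1", "HR2"], hr_period=3)
```

With these inputs, HR1 is paired with LR1 through LR3 and HR2 with LR4, matching the pairing the description later gives for Fig. 3A.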
In one embodiment, the online adaptation circuit is operable to receive one or more of the input HR frames when a predetermined event is detected, wherein the predetermined event comprises a scene change or the available network bandwidth exceeding a threshold.
In one embodiment, upon detecting a predetermined event, the AI SR circuit is operable to update an AI model used to generate the SR frames, wherein the predetermined event comprises one of: unstable network bandwidth, a scene change, the elapse of a fixed time period, and the arrival of each frame.
In another embodiment, a method for performing an SR operation is provided. Input low-resolution (LR) frames and high-resolution (HR) frames are received from a video source over a network. Training pairs are formed, each consisting of one of the input LR frames and one of the HR frames. Updates to representative features that characterize the input LR frames are computed using the training pairs. While the updates to the representative features are being computed, SR frames for display are generated from the input LR frames based on the representative features. Each SR frame has a higher resolution than a corresponding one of the input LR frames.
In one embodiment, the representative characteristics include one or more of: scene type, degradation level, and color status.
In one embodiment, the representative features include information for updating characteristics of an AI model used to generate the SR frame.
In one embodiment, the input HR frames are received over the network less frequently than the input LR frames.
In an embodiment, one or more of the training pairs includes HR and LR frames having different content and different resolutions.
In one embodiment, the representative features are identified using a Convolutional Neural Network (CNN).
In one embodiment, the SR frame is generated using a Convolutional Neural Network (CNN).
In one embodiment, forming the training pairs further comprises: periodically receiving the input HR frames; and pairing each input HR frame with a plurality of the input LR frames to form a plurality of training pairs.
In an embodiment, forming the training pairs further comprises receiving one or more of the input HR frames when a predetermined event is detected, wherein the predetermined event comprises a scene change or the available network bandwidth exceeding a threshold.
In an embodiment, the method further comprises updating an AI model used to generate the SR frames upon detection of a predetermined event, wherein the predetermined event comprises one of: unstable network bandwidth, a scene change, the elapse of a fixed time period, and the arrival of each frame.
Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures. This summary is not intended to be limiting of the invention. The invention is defined by the claims.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this disclosure. The drawings illustrate the implementation of the embodiments of the present disclosure and together with the description serve to explain the principles of the embodiments of the disclosure. It is to be understood that the figures are not necessarily to scale, since some features may be shown out of proportion to actual implementation dimensions, in order to clearly illustrate the concepts of the embodiments of the disclosure.
Fig. 1 is a block diagram illustrating a video processing circuit performing an SR operation according to an embodiment of the present invention.
Fig. 2 is a block diagram illustrating a video processing circuit performing an SR operation according to another embodiment of the present invention.
Fig. 3A and 3B illustrate an apparatus (device) for forming and using an online training pair according to an embodiment of the present invention.
FIG. 4 is a flow diagram illustrating a method for performing SR operations in accordance with an embodiment of the present invention.
FIG. 5 illustrates an example of an apparatus operable to perform SR operations according to an embodiment of the present invention.
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details, and that different embodiments may be combined as desired, and should not be limited to the embodiments set forth in the accompanying drawings.
Detailed Description
The following description is of the preferred embodiments of the present invention, which are provided to illustrate its technical features and are not intended to limit its scope. Certain terms are used throughout the description and claims to refer to particular elements; those skilled in the art will understand that manufacturers may refer to the same element by different names. This specification and the claims therefore do not distinguish between components that differ in name but not in function. The terms "component," "system," and "apparatus" used herein may refer to a computer-related entity, where the computer may be hardware, software, or a combination of the two. In the following description and claims, the terms "include" and "comprise" are used in an open-ended fashion and should be interpreted as "including, but not limited to." Furthermore, the term "coupled" means either an indirect or direct electrical connection: if one device is coupled to another device, the connection may be a direct electrical connection, or an indirect electrical connection through other devices and connections.
Corresponding numerals and symbols in the different figures generally refer to corresponding parts unless otherwise indicated. The figures are drawn to clearly illustrate the relevant portions of the embodiments and are not necessarily drawn to scale.
As used herein, the term "substantially" or "approximately" means within a range acceptable to a person skilled in the art, such that the technical problem is solved and the intended technical effect is substantially achieved. For example, "substantially equal" allows a degree of error, acceptable to the skilled artisan, that does not affect the correctness of the result.
Embodiments of the present invention provide a video processing circuit that performs a super-resolution (SR) operation on input frames based on online training. The video processing circuit includes an SR engine that operates on low-resolution (LR) input frames using an artificial intelligence (AI) model and generates output frames (referred to as SR frames) having a desired or target resolution. The SR engine (also referred to as an AI SR circuit) may reduce or remove artifacts in the input LR frames. Each AI model is defined by characteristics such as parameters, structures, and operators. These characteristics may be updated at run time based on online training pairs, where each training pair includes an LR frame and a high-resolution (HR) frame. Some embodiments provide an online AI super-resolution engine and its associated operations; other embodiments relate to an edge device with image enhancement capability for performing SR operations.
Conventional SR circuits typically rely only on pre-trained (offline-trained) parameters that are pre-stored in the device. Because storage capacity is limited, the number of pre-trained parameters is limited, so conventional SR circuits cannot properly handle input images of varying types and qualities.
The video processing circuit described herein also includes an online adaptation circuit that identifies representative (or critical) features from the online training pairs and provides them to the AI SR circuit. The video processing circuit may receive the online training pairs and the input LR frames in parallel from the same video source over the same communication network. In one embodiment, the online training pairs are received less frequently than the input LR frames to reduce network bandwidth usage.
Furthermore, the LR frames in the online training pairs may be a subset of the input LR frames; the online training pairs therefore typically carry content and quality information more relevant to the input LR frames than pre-trained parameters do. When the content and/or quality of the input frames changes, the content and/or quality of the online training pairs changes accordingly. The representative features extracted from the online training pairs can thus provide hints on how to enhance the input LR frames. For edge devices with limited processing resources and storage capacity (e.g., smart TVs, smartphones, IoT devices), the approach described herein provides great flexibility through real-time training with minimal computational overhead.
As used herein, the terms "LR" and "HR" are relative: for the same display size (e.g., N square inches), an LR frame has fewer pixels than an HR frame. For example, at the same display size, an LR frame may have 720x480 pixels and an HR frame 3840x2160 pixels. An LR frame and an HR frame may have any numbers of pixels at the same display size, as long as the LR frame has fewer pixels than the HR frame. The resolution of an SR frame is higher than that of the LR frame, and may be equal to or lower than that of the HR frame.
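The relative LR/HR/SR resolution constraints above reduce to a pixel-count comparison at equal display size, which can be expressed as a simple check. This is a sketch; the function names are illustrative.

```python
def pixel_count(res):
    """Total pixels of a (width, height) resolution."""
    w, h = res
    return w * h

def valid_sr_resolution(lr, sr, hr):
    """An SR frame must have more pixels than its LR source and at most
    as many as the HR frame, per the relative definition above."""
    return pixel_count(lr) < pixel_count(sr) <= pixel_count(hr)

# Example from the text: LR 720x480 and HR 3840x2160; a 1920x1080 SR
# output sits between the two, so it satisfies the constraint.
ok = valid_sr_resolution((720, 480), (1920, 1080), (3840, 2160))
```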
Fig. 1 is a block diagram illustrating a video processing circuit 100 according to an embodiment of the present invention. The video processing circuit 100 may be part of a device (also referred to as an edge device). Examples of the apparatus may include: a television, a smartphone, a computing device, a network connection device, a gaming device, an entertainment device, an Internet of things (IoT) device, or any device capable of processing and displaying images and/or video.
In one embodiment, the video processing circuit 100 includes, among other components, an AI super-resolution (SR) circuit 120 coupled to an online adaptation circuit 140. The AI SR circuit 120 is also coupled to an input port 110 to receive input low-resolution (LR) frames. The AI SR circuit 120 may perform an SR operation on the input LR frames according to one or more AI models. One example of an AI model is an artificial neural network, such as a convolutional neural network (CNN) or another machine learning or deep learning network. Examples of the SR operations performed by the AI SR circuit 120 include, but are not limited to: CNN operations, machine learning operations, and deep learning operations. For each input LR frame, the AI SR circuit 120 generates a higher-resolution output frame, referred to as an SR frame, and outputs it to an output port 130. The SR frames are sent to a display for viewing by a user.
In one embodiment, the online adaptation circuit 140 is coupled to another input port 115 to receive online training pairs. Each online training pair includes an LR frame and a corresponding high-resolution (HR) frame. The LR frame in an online training pair may be one of the input LR frames that the AI SR circuit 120 processes to generate an SR frame. The corresponding HR frame may be a single HR frame or one of several close HR frames; for example, if HR frames HR1 and HR2 depict the same scene type, they may be considered close. In an embodiment, the LR frames in the online training pairs may be a subset of the input LR frames. Likewise, the HR frames in the online training pairs may be a subset of the input HR frames; in the example of Fig. 3A, if HR1 and HR2 are close, HR1 may also be used to assist the online training of the LR frame paired with HR2. In an embodiment, the AI SR circuit 120 and the online adaptation circuit 140 may receive their respective LR frames in parallel.
The online adaptation circuit 140 performs online training using the online training pairs. In an embodiment, the online adaptation circuit 140 may identify (e.g., detect or extract) representative features from the online training pairs and provide them to the AI SR circuit 120 to improve the performance of the SR operations. The online adaptation circuit 140 performs online training while, in parallel, the AI SR circuit 120 performs SR operations. The online adaptation circuit 140 may detect some representative features using non-AI computations and others using AI operations (e.g., CNN, machine learning, or deep learning operations).
In an embodiment, the representative features may indicate characteristics of the input LR frames, including but not limited to: scene type, degradation type, degradation level, color condition, and other indications of image content and/or quality. For example, scene types may include natural scenes, computer-generated (CG) scenes, and the like; degradation types and levels may include image noise type and level, video compression parameters, blurriness, texture distortion, edge/contour jaggies, and so on; color conditions may include color saturation, contrast, sharpness, and the like. The representative features may include global or local features of a frame or frame sequence, and/or high-level features (e.g., scene type) or low-level features (e.g., noise level) of a frame or frame sequence.
Additionally or alternatively, the representative features may be updated to indicate updated parameters and/or structures of an AI model used by the AI SR circuit 120 to generate the SR frames. The AI SR circuit 120 may update internal layers and/or output characteristics of the neural network based on the representative features; for example, the update may be applied to the structures and/or parameters of feature maps, activation layers, filter kernels, and so on. The update may be performed periodically, e.g., per frame or at a fixed period. Alternatively, the update may be performed when a predetermined condition is detected. Examples of the predetermined condition include, but are not limited to, scene changes and unstable Internet bandwidth.
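The update triggers just described (periodic, with per-frame as the degenerate period, or event-driven) can be sketched as a small policy object. This is a behavioral illustration under assumed names; the patent does not specify such an interface.

```python
from dataclasses import dataclass

@dataclass
class UpdatePolicy:
    """Decides when the SR circuit should apply a model update.

    Periodic updates fire every `period` frames (period=1 means per
    frame); event-driven updates fire on a scene change or on unstable
    network bandwidth, mirroring the conditions listed in the text.
    """
    period: int = 1  # update every N frames; 1 = per frame

    def should_update(self, frame_index, scene_changed=False,
                      bandwidth_unstable=False):
        if scene_changed or bandwidth_unstable:
            return True  # event-driven update
        return frame_index % self.period == 0  # periodic update

policy = UpdatePolicy(period=4)
```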
Accordingly, the AI SR circuit 120 may reduce or remove artifacts in the input LR frames using the identified representative features, which characterize the input LR frames. Furthermore, the AI SR circuit 120 may do so using one or more AI models that are updated online based on representative features obtained from the online training pairs.
Fig. 2 is a block diagram illustrating a video processing circuit 200 according to another embodiment of the present invention. The video processing circuit 200 includes the AI SR circuit 120 and the online adaptation circuit 140 of Fig. 1. In this embodiment, the output of the AI SR circuit 120 is coupled to a picture quality (PQ) engine 250 to further enhance the quality of the SR frames produced by the AI SR circuit 120. In an embodiment, the PQ engine 250 performs image enhancement operations including, but not limited to: peaking, sharpness enhancement, saturation and tone mapping, and the like. The output of the PQ engine 250 is coupled to the output port 130, which provides the enhanced SR frames to the display for viewing by the user.
Fig. 3A is a block diagram illustrating an apparatus 310 that includes a video processing circuit 300 according to an embodiment of the invention. The video processing circuit 300 may be an example of the video processing circuit 100 in Fig. 1 or the video processing circuit 200 in Fig. 2. The video processing circuit 300 is coupled to an input buffer 320 and a display 330. The apparatus 310 receives input video from a video server 350 over a communication network 340. The video server 350 provides video streaming services at multiple resolutions, which are available for selection by the apparatus 310 (e.g., by its user). In the example of Fig. 3A, the apparatus 310 selects a video stream at two resolutions: low-resolution (LR) frames and high-resolution (HR) frames. The apparatus 310 may configure the streaming operation such that HR frames are received less frequently than LR frames. In an embodiment, the apparatus 310 may receive the HR frames periodically, e.g., at a predetermined fixed period. Alternatively, the apparatus 310 may receive an HR frame when a trigger event is detected, such as a scene change or the available network bandwidth exceeding a threshold. The apparatus 310 may receive one or more HR frames, or a predetermined number of consecutive HR frames, at a fixed period or upon detection of such an event.
In the example of Fig. 3A, two HR frames (HR1 and HR2) and four LR frames (LR1, LR2, LR3, and LR4) are received within the same time period. HR1 and LR1 may have the same content but different resolutions; "content" here refers to scenes, image objects, backgrounds, and the like. HR1 and LR2 may have similar or different content (e.g., HR1 contains an image of a cat and LR2 an image of a house) and have different resolutions, and likewise for HR1 and LR3. In this example, HR1 may be paired with three LR frames to form three training pairs, e.g., (HR1, LR1), (HR1, LR2), and (HR1, LR3), regardless of how similar or different the content of the HR and LR frames in each pair is. The same applies to HR2, which may be paired with one or more LR frames to form one or more training pairs, e.g., (HR2, LR4). In one embodiment, if HR2 is close to HR1 (meaning, e.g., that they depict the same scene type, such as the same living-room background, possibly from different shooting angles), then HR1 may also be used to assist the online training of the LR frames associated with HR2, as shown in Fig. 3B, e.g., by forming the online training pair (HR1, LR4). That is, if a previous HR frame is close to the current HR frame, the previous HR frame may further be used to assist the online training of the current LR frames. Since HR frames contain higher-quality and more detailed information than LR frames, the training pairs can be trained efficiently regardless of the correspondence between the content of the HR and LR frames that make up each pair.
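The Fig. 3B behavior above (reusing a previous HR frame for training when it is close to the current one) can be sketched as follows. The `close` predicate stands in for an unspecified scene-similarity test, and all names are illustrative assumptions.

```python
def augment_with_close_hr(pairs, close):
    """Given (HR, LR) training pairs in arrival order, additionally pair
    each LR frame with the *previous* HR frame when that previous HR
    frame is "close" to the LR frame's own HR frame (e.g., same scene
    type). `close(hr_a, hr_b) -> bool` is a user-supplied predicate.
    """
    hr_order = []                       # distinct HR frames, in order
    for hr, _ in pairs:
        if hr not in hr_order:
            hr_order.append(hr)
    extra = []
    for hr, lr in pairs:
        k = hr_order.index(hr)
        if k > 0 and close(hr_order[k - 1], hr):
            extra.append((hr_order[k - 1], lr))  # e.g., (HR1, LR4)
    return pairs + extra

# Fig. 3A pairing, with HR1 and HR2 assumed close (Fig. 3B case).
pairs = [("HR1", "LR1"), ("HR1", "LR2"), ("HR1", "LR3"), ("HR2", "LR4")]
augmented = augment_with_close_hr(pairs, close=lambda a, b: True)
```

With a closeness test that accepts HR1/HR2, the extra pair (HR1, LR4) appears, matching Fig. 3B; with a test that rejects them, the pair list is unchanged.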
A training pair comprises a corresponding HR frame and LR frame. The training pairs disclosed in this specification are "online training pairs" in that the training is performed concurrently with SR generation: the training uses an input LR frame and a corresponding HR frame (e.g., one of a plurality of HR frames), while SR generation produces an SR frame identical in content to the input LR frame but of higher resolution. More specifically, while the AI SR circuit 120 processes input LR frames (e.g., LR1-LR4) for display, the online adaptation circuit 140 simultaneously performs training (e.g., identifying or updating representative features) using the same input LR frames and the HR frames paired with them. During video stream processing, the online adaptation circuit 140 may continuously compute updates to the representative features and output the updated features to the AI SR circuit 120. In some embodiments, the representative features computed by the online adaptation circuit 140 from the input frames (e.g., LR1-LR4 and HR1-HR2) are received by the AI SR circuit 120 only after LR1-LR4 have been processed into SR frames; that is, the AI SR circuit 120 may generate SR frames from LR1-LR4 based on representative features computed from input frames that precede LR1-LR4 and HR1-HR2.
In one embodiment, the apparatus 310 uses the input buffer 320 to buffer frames received from the network 340, including both LR frames and HR frames. Corresponding LR and HR frames form online training pairs, which are sent from the input buffer 320 to the online adaptation circuit 140. The input buffer 320 also sends the LR frames (including those in the online training pairs) to the AI SR circuit 120 as input LR frames. In this example, the LR frames in the online training pairs are a subset of the input LR frames.
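The input buffer's routing role described above (all LR frames to the SR path; only LR frames with a corresponding HR frame to the adaptation path) can be sketched as a small class. Method and attribute names are illustrative, not from the patent.

```python
class InputBuffer:
    """Sketch of the input buffer's routing: every received LR frame
    feeds the AI SR circuit, while (LR, HR) correspondences feed the
    online adaptation circuit as training pairs."""

    def __init__(self):
        self.lr_frames = []   # all LR frames, in arrival order
        self.hr_frames = {}   # lr_id -> corresponding HR frame id

    def push_lr(self, lr_id):
        self.lr_frames.append(lr_id)

    def push_hr(self, lr_id, hr_id):
        self.hr_frames[lr_id] = hr_id

    def sr_input(self):
        """LR frames sent to the AI SR circuit (all of them)."""
        return list(self.lr_frames)

    def training_pairs(self):
        """Online training pairs for the online adaptation circuit;
        their LR frames are a subset of the SR input."""
        return [(lr, self.hr_frames[lr]) for lr in self.lr_frames
                if lr in self.hr_frames]

buf = InputBuffer()
for lr in ["LR1", "LR2", "LR3", "LR4"]:
    buf.push_lr(lr)
buf.push_hr("LR1", "HR1")  # only LR1 has a matching HR frame
```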
Since the apparatus 310 obtains the online training pairs and the input Low Resolution (LR) frames from the same video source via the same communication network path, the representative features identified from the online training pairs provide a strong indication of the features in the input Low Resolution (LR) frames, as well as of the AI model structure/parameters best suited for the SR operations.
In some embodiments, the video processing circuit 300 may be an AI processor, a Graphics Processing Unit (GPU), an Application-Specific Integrated Circuit (ASIC), or another general-purpose or special-purpose processing circuit. In one embodiment, the AI processor may be configured to perform CNN computations to detect representative features and process the current frame. In an embodiment, the video processing circuit 300 may be implemented as a system-on-a-chip (SoC). In some embodiments, the video processing circuit 300 may be implemented in more than one chip in the same electronic device.
In one embodiment, the AI Super Resolution (SR) circuit 120 includes a CNN accelerator to perform CNN operations on the input Low Resolution (LR) frames. The CNN accelerator comprises dedicated hardware components for accelerating neural network operations such as convolution, fully-connected computation, activation, pooling, normalization, and element-wise mathematical calculation. In some embodiments, the CNN accelerator includes a plurality of computational units and memory (e.g., Static Random Access Memory (SRAM)), where each computational unit further includes multiplier and adder circuitry to perform mathematical operations such as multiply-and-accumulate (MAC) operations, thereby accelerating convolution, activation, pooling, normalization, and other neural network operations.
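A minimal software model of the MAC operation that such a computational unit accelerates is shown below: a 1-D convolution expressed as repeated multiply-and-accumulate steps. This is purely illustrative; a real accelerator performs many such MACs in parallel in fixed-function hardware rather than sequentially in software.

```python
def mac_conv1d(signal, kernel):
    """1-D 'valid' convolution built from multiply-and-accumulate steps."""
    out = []
    k = len(kernel)
    for i in range(len(signal) - k + 1):
        acc = 0
        for j in range(k):
            acc += signal[i + j] * kernel[j]  # one MAC per inner step
        out.append(acc)
    return out
```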
In an embodiment, the AI Super Resolution (SR) circuit 120 performs CNN operations according to a CNN model (which is an example of an AI model). The CNN operations include convolving an input feature map with a kernel filter. For example, an input feature map from a previous layer of the CNN may be convolved with a kernel filter to generate an output feature map for the next layer. The characteristics of the AI model (e.g., the neural network's layer structure and the parameters of the kernel filters) may be updated by the online training results generated by the online adaptation circuit 140.
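The layer-to-layer flow described above can be sketched as a plain 2-D convolution: an input feature map convolved with a kernel filter yields the output feature map for the next layer. The "valid" (no-padding), single-channel, stride-1 form is an assumption for brevity; real CNN models stack many channels and layers.

```python
def conv2d(fmap, kernel):
    """2-D 'valid' convolution of one feature map with one kernel filter."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(fmap) - kh + 1
    cols = len(fmap[0]) - kw + 1
    return [[sum(fmap[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]
```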
FIG. 4 is a flow diagram illustrating a method 400 for performing SR operations in accordance with an embodiment of the present invention. For example, method 400 may be performed by any of the embodiments described in conjunction with fig. 1, fig. 2, fig. 3A, 3B, and fig. 5. It should be understood that this embodiment is merely illustrative and that other devices or circuits having video processing capabilities may perform method 400.
The method 400 begins at step 410 when the video processing circuit receives input Low Resolution (LR) frames and HR frames from a video source over a network. At step 420, the video processing circuit forms training pairs, each training pair formed from one of the input Low Resolution (LR) frames and one of the HR frames. At step 430, the video processing circuitry uses the training pairs to compute updates of representative features that characterize the input Low Resolution (LR) frames. While calculating the updates to the representative features, the video processing circuit generates an SR frame for display from the input Low Resolution (LR) frame based on the representative features at step 440. Each SR frame has a higher resolution than a respective one of the input Low Resolution (LR) frames.
In one embodiment, the representative characteristics include one or more of: scene type, degradation level, and color status. In an embodiment, the representative features include information for an AI super-resolution (SR) circuit to update characteristics of an AI model used to generate the SR frame.
In an embodiment, HR frames are received less frequently than input Low Resolution (LR) frames. In an embodiment, one or more training pairs include HR frames and Low Resolution (LR) frames, which may have different content in addition to having different resolutions.
In an embodiment, a video processing circuit includes an AI Super Resolution (SR) circuit and an online adaptation circuit. The online adaptation circuit is used to identify the representative features using a CNN. The AI Super Resolution (SR) circuit is used to generate the SR frames using a CNN. The online adaptation circuit may periodically receive HR frames and may pair each HR frame with a plurality of LR frames of the input LR frames to form a plurality of training pairs. An HR frame may be received by the online adaptation circuit when a predetermined triggering event is detected, such as a scene change or the available network bandwidth exceeding a threshold. When a predetermined trigger event is detected, the AI Super Resolution (SR) circuit may update the AI model used to generate the SR frames, where the event comprises one of: unstable network bandwidth, a scene change, a fixed time period, and the duration of each frame.
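The one-HR-to-many-LR pairing policy described above can be sketched as follows. The fixed cadence of one HR frame per `lr_per_hr` LR frames is an assumption for illustration; the specification only requires that HR frames arrive less frequently than LR frames and that each HR frame is paired with a plurality of LR frames.

```python
def form_pairs(lr_frames, hr_frames, lr_per_hr):
    """Pair each periodically received HR frame with a run of LR frames."""
    pairs = []
    for h, hr in enumerate(hr_frames):
        start = h * lr_per_hr
        for lr in lr_frames[start:start + lr_per_hr]:
            # Each (lr, hr) tuple is one online training pair.
            pairs.append((lr, hr))
    return pairs
```

For example, six LR frames and two HR frames at a 3:1 cadence yield six training pairs, three per HR frame.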
Fig. 5 illustrates an example of an apparatus 500 according to an embodiment of the invention. One example of an apparatus 500 is a television that receives a low resolution (e.g., 720x480 pixels) video and performs an SR operation to scale the video to a higher resolution (e.g., 3840x2160 pixels) for display on a television screen. For example, the apparatus 500 receives an input Low Resolution (LR) frame and an input High Resolution (HR) frame from the network entity 570 via the network 580. Alternatively, apparatus 500 may be a smartphone, computing device, network connected device, gaming device, entertainment device, internet of things (IoT) device, or any device capable of processing and displaying images and/or video.
The apparatus 500 includes processing hardware 510, which may comprise any of the video processing circuits 100, 200, and 300 of FIGS. 1, 2, and 3A-3B, respectively. In an embodiment, the processing hardware 510 may include one or more processors, such as one or more of the following: a Central Processing Unit (CPU), a GPU, a Digital Signal Processor (DSP), an AI processor, a multimedia processor, and other general-purpose and/or special-purpose processing circuits. In an embodiment, the processing hardware 510 may include a hardware accelerator, such as a CNN accelerator. In an embodiment, the processing hardware 510 includes the AI Super Resolution (SR) circuit 120 and the online adaptation circuit 140.
The apparatus 500 also includes memory and buffer 520 coupled to the processing hardware 510. In one embodiment, the memory and buffer 520 may comprise the input buffer 320 of FIG. 3A. The memory and buffer 520 may include memory devices such as Dynamic Random Access Memory (DRAM), SRAM, flash memory, and other non-transitory machine-readable media, whether volatile or non-volatile. The memory and buffer 520 may further include storage devices, such as any type of solid-state or magnetic storage device. In some embodiments, the memory and buffer 520 may store instructions that, when executed by the processing hardware 510, cause the processing hardware 510 to perform the operations described above for generating SR frames, such as the method 400 of FIG. 4.
The apparatus 500 may also include a display panel 530 to display information such as images, video, messages, web pages, games, and other types of text, image, and video data. The apparatus 500 may also include audio hardware 540, such as a microphone and speakers, for receiving and producing sound.
In some embodiments, the apparatus 500 may also include a network interface 550 to connect to a wired and/or wireless network for sending and/or receiving voice, digital data, and/or media signals. It will be appreciated that the embodiment of fig. 5 is simplified for illustrative purposes. In another embodiment, other additional hardware components may be included.
The operation of the flowchart of fig. 4 has been described with reference to the exemplary embodiments of fig. 1, 2, 3A, 3B, and 5. However, it should be understood that the operations of the flowchart of fig. 4 may be performed by embodiments of the present invention other than the embodiments of fig. 1, 2, 3A, 3B, and 5, and that the embodiments of fig. 1, 2, 3A, 3B, and 5 may perform operations different from those discussed with reference to the flowcharts. While the flow diagram of fig. 4 illustrates a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
While the invention has been described by way of example and in terms of preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art), e.g., combinations or substitutions of different features in different embodiments. The scope of the appended claims should, therefore, be accorded the broadest interpretation so as to encompass all such modifications and similar structures.

Claims (20)

1. A video processing circuit, characterized by comprising an input buffer, an online adaptation circuit, and an artificial intelligence (AI) super-resolution (SR) circuit;
wherein the input buffer is configured to receive an input low resolution LR frame and an input high resolution HR frame from a video source over a network;
the online adaptation circuit is used for: forming training pairs, each training pair consisting of one of the input LR frames and one of the input HR frames; and, calculating an update of a representative feature characterizing the input LR frames using the training pairs; and,
the AI SR circuit is used for: receiving the input LR frames from the input buffer and the representative feature from the online adaptation circuit; and, while calculating the update of the representative feature, generating SR frames for display from the input LR frames based on the representative feature, wherein each SR frame has a higher resolution than a respective one of the input LR frames.
2. The video processing circuit of claim 1, wherein the representative features include one or more of: scene type, degradation level, and color status.
3. The video processing circuit of claim 1, wherein the representative feature includes information for the AI SR circuit to update a characteristic of an AI model used to generate the SR frame.
4. The video processing circuit of claim 1, wherein the input HR frames are received over the network at a lower frequency than the input LR frames.
5. The video processing circuit of claim 1 wherein one or more of the training pairs includes HR frames and LR frames having different content and different resolutions.
6. The video processing circuit of claim 1, wherein the online adaptation circuit is operable to:
the representative features are identified using a convolutional neural network CNN.
7. The video processing circuit of claim 1, wherein the AI SR circuit is operable to:
the SR frame is generated using a convolutional neural network CNN.
8. The video processing circuit of claim 1, wherein the online adaptation circuit is operable to:
periodically receiving the input HR frames; and,
pairing each input HR frame with a plurality of LR frames of the input LR frames to form a plurality of training pairs.
9. The video processing circuit of claim 1, wherein the online adaptation circuit is operable to:
one or more of the input HR frames are received when a predetermined event is detected, wherein the predetermined event comprises a scene change or the available network bandwidth exceeding a threshold.
10. The video processing circuit of claim 1, wherein upon detection of a predetermined event, the AI SR circuit is operable to: updating an AI model used to generate the SR frame, wherein the predetermined event comprises one of: unstable network bandwidth, scene changes, fixed time periods and duration of each frame.
11. A method for performing a super resolution SR operation, comprising:
receiving an input low resolution LR frame and an input high resolution HR frame from a video source over a network;
forming training pairs, each training pair consisting of one of the input LR frames and one of the input HR frames;
calculating an update of a representative feature characterizing the input LR frames using the training pairs; and,
while calculating the update of the representative feature, SR frames for display are generated from the input LR frames based on the representative feature, wherein each SR frame has a higher resolution than a respective one of the input LR frames.
12. The method of claim 11, wherein the representative characteristics include one or more of: scene type, degradation level, and color status.
13. The method of claim 11, wherein the representative characteristics include information for updating characteristics of AI models used to generate the SR frame.
14. The method of claim 11, wherein the input HR frames are received over the network less frequently than the input LR frames.
15. The method of claim 11, wherein one or more of the training pairs includes HR frames and LR frames having different content and different resolutions.
16. The method of claim 11, wherein the representative features are identified using Convolutional Neural Network (CNN).
17. The method of claim 11, wherein the SR frame is generated using a Convolutional Neural Network (CNN).
18. The method of claim 11, wherein forming the training pair further comprises:
periodically receiving the input HR frames; and,
pairing each input HR frame with a plurality of LR frames of the input LR frames to form a plurality of training pairs.
19. The method of claim 11, wherein forming the training pair further comprises:
one or more of the input HR frames are received when a predetermined event is detected, wherein the predetermined event comprises a scene change or the available network bandwidth exceeding a threshold.
20. The method of claim 11, further comprising:
updating an AI model used to generate the SR frame upon detection of a predetermined event, wherein the predetermined event comprises one of: unstable network bandwidth, scene changes, fixed time periods and duration of each frame.
CN202110247153.6A 2020-03-11 2021-03-05 Video processing circuit and method for performing SR operation Pending CN113393374A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202062987960P 2020-03-11 2020-03-11
US62/987,960 2020-03-11
US17/167,356 US11836893B2 (en) 2020-03-11 2021-02-04 Online AI super resolution engine and operations
US17/167,356 2021-02-04

Publications (1)

Publication Number Publication Date
CN113393374A true CN113393374A (en) 2021-09-14

Family

ID=77617342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110247153.6A Pending CN113393374A (en) 2020-03-11 2021-03-05 Video processing circuit and method for performing SR operation

Country Status (1)

Country Link
CN (1) CN113393374A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767383A (en) * 2017-11-09 2019-05-17 三星电子株式会社 Method and apparatus for using the video super-resolution of convolutional neural networks
CN109785235A (en) * 2018-12-29 2019-05-21 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Compressed sensing based infrared pedestrian image super resolution ratio reconstruction method and system


Similar Documents

Publication Publication Date Title
US10885384B2 (en) Local tone mapping to reduce bit depth of input images to high-level computer vision tasks
WO2019153671A1 (en) Image super-resolution method and apparatus, and computer readable storage medium
US11928753B2 (en) High fidelity interactive segmentation for video data with deep convolutional tessellations and context aware skip connections
US8723978B2 (en) Image fusion apparatus and method
WO2022141819A1 (en) Video frame insertion method and apparatus, and computer device and storage medium
CN112889069B (en) Methods, systems, and computer readable media for improving low light image quality
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
WO2021179826A1 (en) Image processing method and related product
US8385677B2 (en) Method and electronic device for reducing digital image noises
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
US11010879B2 (en) Video image processing method and apparatus thereof, display device, computer readable storage medium and computer program product
CN109858449B (en) Image processing method, apparatus, medium, and system
WO2023273868A1 (en) Image denoising method and apparatus, terminal, and storage medium
US7916970B2 (en) Image processing apparatus, method of same, and program for same
CN112837240A (en) Model training method, score improving method, device, equipment, medium and product
JP2023537446A (en) Input image size-switchable network for adaptive run-time efficient image classification
US20210287338A1 (en) Image-guided adjustment to super-resolution operations
US11836893B2 (en) Online AI super resolution engine and operations
CN113393374A (en) Video processing circuit and method for performing SR operation
CN110689496A (en) Method and device for determining noise reduction model, electronic equipment and computer storage medium
EP3905135A1 (en) Edge learning display device and method
CN115760658A (en) Image processing method, image processing device, storage medium and electronic equipment
CN113393375A (en) Image-guided adjustment of super-resolution operations
CN112055131A (en) Video processing system and method
CN117478964A (en) Method and device for playing video at double speed

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination