GB2554663A - Method of video generation

Info

Publication number: GB2554663A
Application number: GB1616687.8A
Other versions: GB201616687D0, GB2554663B
Authority: GB (United Kingdom)
Prior art keywords: video frame, pixels, group, frame data, encoded
Legal status: granted; active
Inventor: Viacheslav Chesnokov
Current and original assignee: Apical Ltd
Application filed by Apical Ltd; priority to GB1616687.8A (published as GB201616687D0, GB2554663A and, on grant, GB2554663B) and to US 15/691,360 (US10462478B2)


Classifications

All classifications fall within H04N19/00 (H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION): methods or arrangements for coding, decoding, compressing or decompressing digital video signals.

    • H04N19/172: adaptive coding characterised by the coding unit, the unit being an image region, specifically a picture, frame or field
    • H04N19/513: predictive coding involving temporal prediction; motion estimation or motion compensation; processing of motion vectors
    • H04N19/132: adaptive coding; sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/136: adaptive coding controlled by incoming video signal characteristics or properties
    • H04N19/162: adaptive coding controlled by user input
    • H04N19/44: decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/46: embedding additional information in the video signal during the compression process
    • H04N19/59: predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/85: pre-processing or post-processing specially adapted for video compression

Abstract

Method for generating an output video frame comprises receiving 105 and decoding 110 an encoded video to produce decoded video frame 200. First video frame data is retrieved 115 from first group of pixels 205 of the decoded frame and second video frame data is retrieved 120 from second group of pixels 210, different to the first group of pixels, of the decoded frame. An output video frame is generated 125 by combining information derived from said first video frame data and information derived from said second video frame data, wherein said combining comprises, for a given pixel of the output video frame, combining information describing said given pixel derived from said first video frame data and information describing said given pixel derived from said second video frame data. Also claimed is a corresponding encoding method. The first video frame may comprise a representation of pixel values of an input video frame. The second video frame may define processing, such as a gain or an attenuation, which is applied to the first video frame in the combining step and encoding and decoding apparatuses in which this is the case are claimed. The methods may be used in encoding HDR video.

Description

(71) Applicant(s): Apical Ltd (Incorporated in the United Kingdom), 110 Fulbourn Road, Cambridge, Cambridgeshire, CB1 9NJ, United Kingdom
(72) Inventor(s): Viacheslav Chesnokov
(56) Documents Cited: EP 2579591 A1; WO 2014/025294 A1; US 2010/0283861 A1; WO 2015/193114 A1; US 2014/0112394 A1
(58) Field of Search: INT CL H04N; Other: EPODOC and WPI
(74) Agent and/or Address for Service: EIP, Fairfax House, 15 Fulwood Place, London, WC1V 6HU, United Kingdom
(54) Title of the Invention: Method of video generation
(57) Abstract Title: Video encoding in which an output frame is generated using two groups of pixels packed into a frame of the encoded video; the abstract is reproduced in full above
[Drawings: Figures 1 to 7, on sheets 1/7 to 7/7, showing method flow diagrams (100, steps 105 to 130; 400, steps 405 to 420), schematic frames (200, 300), the downscaling of Figure 5 (500, 505, 510) and the apparatus of Figures 6 and 7 (600, 640; 700); see the Brief Description of the Drawings below.]
METHOD OF VIDEO GENERATION
Technical Field
The present invention relates to methods, apparatus and computer programs for encoding, decoding and displaying a video.
Background
It is frequently desirable to encode a video, for example to compress the video. Encoded videos may be transmitted for decoding at a receiver, for example as streaming video.
Summary
According to a first aspect of the present invention, there is provided a method for generating an output video frame. The method comprises:
receiving an encoded video;
decoding the encoded video whereby to produce a decoded video frame; retrieving first video frame data from a first group of pixels of the decoded video frame;
retrieving second video frame data from a second group of pixels, different to the first group of pixels, of the decoded video frame;
generating an output video frame by combining information derived from said first video frame data and information derived from said second video frame data, wherein said combining comprises:
for a given pixel of the output video frame, combining information describing said given pixel derived from said first video frame data and information describing said given pixel derived from said second video frame data.
In an example, the first video frame data comprises a representation of pixel values of an input video frame. Generating the output video frame may then comprise upscaling the representation to a display resolution.
In some examples, the second video frame data defines processing that may be applied to the first video frame data, whereby to generate the output video frame; and the combining information for said given pixel comprises applying the processing to the information describing said given pixel derived from said first video frame data.
In some such examples, the second video frame data comprises gain information, and the processing comprises application of at least one gain, based on the gain information, to the first video frame data.
The second group of pixels may have an area less than one fifth of a corresponding area of the first group of pixels.
According to a further aspect of the present disclosure, there is presented a method for encoding a video, the method comprising:
receiving information describing an input video frame;
processing the input video frame to generate first video frame data and second video frame data, wherein some information relating to a given pixel of the input video frame is placed in said first video frame data and other information relating to the given pixel of the input video frame is placed in said second video frame data;
storing said first video frame data in a first group of pixels of a video frame to be encoded;
storing said second video frame data in a second group of pixels, different to the first group of pixels, of the video frame to be encoded; and generating an encoded video, the generating comprising encoding the video frame to be encoded.
The first video frame data may comprise a representation of pixel values of the input video frame. In such an example, storing the representation may comprise downscaling the pixel values of the input video frame to a resolution corresponding to an area of the first group of pixels. The downscaling may be a horizontal downscaling, such that the first group of pixels has a width less than the width of the input video frame. In this example, the second group of pixels may be at one side of the first group of pixels.
In some examples, the second video frame data defines image processing applied to the input video frame.
According to a further aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium comprising a set of computer-readable instructions stored thereon which, when executed by at least one processor, cause the at least one processor to perform a method as described above.
According to some aspects of the present disclosure, there is provided an apparatus for displaying streamed video, the apparatus comprising:
a receiver configured to receive a streamed encoded video; a decoder configured to, in real time, decode the encoded video whereby to produce a decoded video frame;
a processor configured to, in real time:
retrieve from a first group of pixels of the decoded video frame a representation of pixel values of an input video frame;
retrieve from a second group of pixels of the decoded video frame, different to the first group of pixels, a representation of processing that may be applied to the representation of pixel values; and generate an output video frame, wherein the generating comprises applying the processing to the representation of pixel values, and an output configured to, in real time, output the output video frame to a display. There is further provided an apparatus comprising:
an input configured to receive an input video frame; and a processor configured to:
store, in a first group of pixels of a video frame to be encoded, first video frame data comprising a representation of pixel values of the input video frame;
store, in a second group of pixels of the video frame to be encoded, different to the first group of pixels, second video frame data defining processing that may be applied to the first video frame data whereby to produce a display video frame; and generate an encoded video, the generating comprising encoding the video frame to be encoded.
The apparatus may comprise a transmitter configured to transmit the encoded video to a receiver.
Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.
Brief Description of the Drawings
Figure 1 shows a flow diagram of a method for generating a video frame according to an example;
Figures 2 and 3 show schematic representations of decoded video frames;
Figure 4 shows a flow diagram of a method for encoding a video according to an example;
Figure 5 shows a schematic representation of downscaling video frame data;
Figure 6 is a schematic representation of an apparatus for displaying streamed video according to an example; and
Figure 7 shows a schematic representation of an apparatus according to an example.
Detailed Description
Figure 1 shows a flow diagram of a method 100 for generating an output video frame. The method comprises a receiving step 105 in which an encoded video is received. The video may for example be encoded using a known encoding method such as H.264 or H.265 encoding. The method then comprises a step 110 of decoding the encoded video, whereby to produce a decoded video frame. For example, where the video is encoded using H.264 encoding, the decoding step comprises using an H.264 decoder. Such a decoder may for example produce a series of decoded frames.
The method 100 then comprises a first retrieving step 115 and a second retrieving step 120. The first retrieving step 115 comprises retrieving first video frame data from a first group of pixels of the decoded video frame. The second retrieving step 120 comprises retrieving second video frame data from a second group of pixels, different to the first group of pixels, of the decoded video frame.
The method 100 then comprises a generating step 125, comprising generating an output video frame by combining information derived from said first video frame data and information derived from said second video frame data. The combining comprises, for a given pixel of the output video frame, combining information describing said given pixel derived from said first video frame data and information describing said given pixel derived from said second video frame data.
Finally, the method 100 comprises a step 130 of outputting the output video frame.
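By way of illustration only, the following Python sketch shows one possible shape of steps 115 to 125. The side-by-side layout of the two groups, the treatment of the second group as a per-pixel multiplicative factor and the function name generate_output_frame are assumptions made for the sketch; the method itself leaves both the layout and the form of the combination open.

    import numpy as np

    def generate_output_frame(decoded: np.ndarray, split_col: int) -> np.ndarray:
        """Sketch of steps 115 to 125. Assumes the first group occupies
        columns [0, split_col) of the decoded frame and the second group
        the remaining columns; both are illustrative choices."""
        first = decoded[:, :split_col].astype(np.float32)   # step 115
        second = decoded[:, split_col:].astype(np.float32)  # step 120

        # Bring the (typically smaller) second group up to the size of
        # the first, so each output pixel draws on both groups.
        h, w = first.shape[:2]
        ys = np.linspace(0, second.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, second.shape[1] - 1, w).astype(int)
        second_full = second[ys][:, xs]

        # Step 125: per-pixel combination. Here the second group acts as
        # a multiplicative factor in [0, 1] (one possible interpretation,
        # matching the gain/attenuation examples below).
        out = first * (second_full / 255.0)
        return np.clip(out, 0, 255).astype(np.uint8)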
Figure 2 shows a schematic representation of a decoded video frame 200 according to an example. In this example, the first group of pixels is a first region 205 and the second group of pixels is a second region 210. In other examples, the first and second groups of pixels are interspersed with each other, for example in sub-groups of pixels. The first and second groups of pixels may be positioned alongside each other as shown in Figure 2 or may have other relative orientations as will be described in more detail below.
In some examples, the first video frame data comprises a representation of pixel values of an input video frame, wherein the input video frame is a frame of a video prior to encoding, for example a raw video captured from a camera. For example, the first video frame data may comprise each pixel value of the input video frame, or data allowing each pixel value of the input video frame to be reconstructed such as a lossless compression of the input video frame. Alternatively, the first video frame data may comprise a lossy compression of the input video frame. For example, the first video frame data may comprise a downscaled version of the input video frame. Examples of such downscaling include vertical downscaling, horizontal downscaling, and combined horizontal and vertical downscaling. Such downscaling may for example be performed by cubic interpolation.
In examples wherein the first video frame data comprises a representation of pixel values of the input video frame, generating the output video frame may comprise upscaling the representation to a display resolution, for example the standard high definition resolution of 1920 x 1080. In some such examples wherein the representation is a downscaled representation of the input video frame, the display resolution is equal to the resolution of the input video frame.
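As a sketch of such an upscale, by way of example only (OpenCV's cv2.resize is an assumed implementation choice here; cubic interpolation mirrors the encoder-side downscaling example given above):

    import cv2  # assumed implementation choice

    def upscale_to_display(first_frame_data, display_width=1920, display_height=1080):
        # Upscale the retrieved representation to the display resolution;
        # cubic interpolation mirrors the encoder-side downscaling example.
        return cv2.resize(first_frame_data, (display_width, display_height),
                          interpolation=cv2.INTER_CUBIC)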
In further such examples, metadata is associated with the encoded video. The metadata may identify the first group of pixels as a region of interest. The output video frame may then be generated based on the first video frame data, as identified in the metadata as a region of interest, and not on the second video frame data. This may for example allow an output video frame to be produced by a device that is not configured to process the second video frame data. For example, where the second video frame data defines processing that may be applied to the first video frame data as described below, a device that is not configured to perform such processing may generate the output video frame based on the first video frame data without performing the processing.
In some examples, the second video frame data defines processing that may be applied to the first video frame data, whereby to generate the output video frame. Generating the output video frame may then comprise applying the processing to the first video frame data. In particular, the combining information for a given pixel may comprise applying the processing to the information describing the given pixel derived from the first video frame data. For example, the second video frame data may comprise gain information and the processing may comprise application of at least one gain, based on the gain information, to the first video frame data. In some examples, a compression artefact removal algorithm is applied to the first video frame data prior to applying the processing defined by the second video frame data.
Figure 3 shows a schematic representation of a decoded frame 300 according to one such example. The frame 300 comprises a first group of pixels 310 in a first region and a second group of pixels 315 in a second region. The first group of pixels 310 comprises sub-regions including sub-regions 310a, 310b and 310c. Each of these sub-regions comprises one or more pixels of the frame 300. The second group of pixels 315 comprises sub-regions including sub-regions 315a, 315b and 315c. Each of these sub-regions comprises one or more pixels of the frame 300.
The sub-regions of the second group of pixels 315 form a gain map indicating gains to be applied to corresponding sub-regions of the first group of pixels 310. For example, sub-region 315a of the second group 315 indicates a gain to apply to sub-region 310a of the first group 310. Likewise, sub-regions 315b and 315c indicate gains to apply to sub-regions 310b and 310c, respectively.
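A minimal sketch of applying such a gain map, assuming for illustration that each element of a small gain array covers one fixed-size rectangular block of the first group (the sub-region geometry is not mandated by the method):

    import numpy as np

    def apply_gain_map(image: np.ndarray, gain_map: np.ndarray) -> np.ndarray:
        """Apply one gain per sub-region, as in Figure 3: gain_map[i, j]
        scales the corresponding block of `image`. Assumes the image
        dimensions are multiples of the gain map dimensions."""
        block_h = image.shape[0] // gain_map.shape[0]
        block_w = image.shape[1] // gain_map.shape[1]
        # Expand each gain value over the block of pixels it governs.
        gains = np.repeat(np.repeat(gain_map, block_h, axis=0), block_w, axis=1)
        if image.ndim == 3:               # broadcast over colour channels
            gains = gains[..., None]
        return np.clip(image.astype(np.float32) * gains, 0, 255).astype(np.uint8)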
In some examples, the input video frame is a high dynamic range frame. Generation of the encoded video may comprise locally tone mapping the input frame. The locally tone mapped input frame is thus a representation of the high dynamic range input frame in a format with a more limited dynamic range. For example, the high dynamic range input frame may be a 10-bit frame, and the locally tone mapped frame derived from this may be an 8-bit frame. The gains may be determined during the tone mapping as gains to apply during generation of the output video frame to approximate the appearance of the original high dynamic range frame. In some examples, the gains may comprise attenuation values. These may be represented as gain values less than 1. In such examples, a gain map as described above may alternatively be termed an attenuation map. For example, the tone mapped input frame may be a full brightness tone mapped image and the gain information may describe brightness attenuation values to apply to the full brightness tone mapped image. In such examples, the receiver typically applies the gain derived from a given sub-region 315a, 315b, 315c of the second group of pixels 315 to each pixel of the corresponding sub-region 310a, 310b, 310c of the first group of pixels 310.
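By way of example only, gains of this kind could be derived during tone mapping as the per-pixel ratio of the high dynamic range luminance to the tone mapped luminance. This is an illustrative reconstruction, not the specific tone mapping algorithm used:

    import numpy as np

    def derive_gains(hdr_luma: np.ndarray, tonemapped_luma: np.ndarray) -> np.ndarray:
        # Per-pixel ratio of the original high dynamic range luminance to
        # the locally tone mapped luminance; applying these gains at the
        # decoder approximates the appearance of the original frame.
        # Values below 1 act as attenuations of a full brightness tone
        # mapped image.
        eps = 1e-6                         # guard against division by zero
        return hdr_luma / (tonemapped_luma + eps)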
In some such examples, it is desirable to adjust the strength of the tone mapping when generating the output video frame. This has the effect of changing the perceived brightness of the output video frame. For example, a user may select a desired strength. Alternatively or additionally, the strength may be selected automatically, for example in response to a detection of ambient light conditions or to reduce power usage. As another example, the strength may be selected to increase perceived brightness to compensate for a reduction in actual brightness, for example when operating in a power saving mode. The desired strength may be implemented by application of an effective gain based on the gain value determined from the second group of pixels 315 and the desired strength. For example, the effective gain may be determined by alpha blending of the determined gain and the strength, i.e.:
A* = strength * A + (1 - strength), where A* is the effective gain, A is the gain determined from the second group of pixels 315, and strength is variable from 0 to 1. Values from 0 to 1 may for example be expressed as an 8-bit integer.
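Expressed as code, with the strength carried as an 8-bit integer as suggested above (the function name is illustrative):

    def effective_gain(gain: float, strength_u8: int) -> float:
        # A* = strength * A + (1 - strength), with strength carried as an
        # 8-bit integer and mapped to [0, 1].
        strength = strength_u8 / 255.0
        return strength * gain + (1.0 - strength)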
Gains as described above may be determined, during local tone mapping, for each pixel of a locally tone mapped frame. In many video frames, the local gains determined during tone mapping vary slowly across the frame. This may be exploited by assigning the second group of pixels 315 a smaller area than the first group of pixels 310, for example as shown in Figure 3. The second group 315 may thus be stored as a downscaled representation of the determined gains. As a consequence of the aforementioned slow variation of the gains across the frame, a relatively significant degree of downscaling can be used without significant detriment to perceived quality of the output video frame. For example, the second group of pixels may have an area less than one fifth of a corresponding area of the first group of pixels. In one advantageous example, the second group of pixels has an area around one eightieth of the area of the first group of pixels. Where the first and second groups are contiguous regions, as shown in Figures 2 and 3, this may be achieved by implementing a 9:1 ratio between the dimensions of the first region and the corresponding dimensions of the second region, leading to an 81:1 ratio of their respective areas.
The method 100 comprising decoding of the encoded video and generation of the output video frame may be performed in real time. For example, a video may be stored in an encoded format in a memory, and output video frames generated in real time and displayed as a user watches a video. In some examples, the video comprises streaming video, for example received from an online streaming service. Output video frames may then be generated from the received streamed video as described above.
Figure 4 shows a flow diagram of a method 400 for encoding a video according to an example. The method 400 comprises a receiving step 405 wherein information describing an input video frame is received. For example, the information may be received from a camera or from a memory or other storage. In some examples, for example as described above, the input video frame is a high dynamic range frame and the method comprises locally tone mapping the input video frame. In such examples, the method may comprise determining gain information associated with the locally tone mapped frame, for example for generating a display frame as described above. In other examples, the input video frame may be received as a locally tone mapped version of an original high dynamic range video frame. Associated gain information may be received with the locally tone mapped frame.
The method further comprises a processing step 407. The processing step comprises processing the input video frame to generate first video frame data and second video frame data, wherein some information relating to a given pixel of the input video frame is placed in said first video frame data and other information relating to the given pixel of the input video frame is placed in said second video frame data.
The method 400 then comprises storage steps 410 and 415. Storage step 410 comprises storing the first video frame data in a first group of pixels of a video frame to be encoded. For example, the first video frame data may comprise a representation of pixel values of the input frame as described above. Storage step 415 comprises storing the second video frame data in a second group of pixels, different to the first group of pixels, of the video frame to be encoded. The second video frame data may for example comprise gain information as described above. The first and second groups of pixels may comprise contiguous regions, for example as described above in relation to Figures 2 and 3. Alternatively, the first and second groups of pixels may be non-contiguous. For example, they may comprise interlaced regions of the video frame to be encoded.
The method 400 may comprise applying gamma correction, which may also be termed global tone mapping. In one such example, the method 400 comprises applying gamma correction to the input video frame. In other such examples, the method 400 comprises applying gamma correction to the first group of pixels of the video frame to be encoded. This has the effect of boosting detail in bright regions while losing some detail in dark regions, which may result in an improvement in perceived video quality.
The method 400 then comprises a step 420 of generating an encoded video, the generating comprising encoding the video frame to be encoded. For example, the encoding may be H.264 encoding as noted above.
In some examples wherein the second video frame data comprises local gain information, the gain information comprises an inverse of each gain value. As an illustrative example, a bright region could have a low associated gain of 1, and a dark region could have a high associated gain of 16. Storing the inverse of each of these values typically reduces the severity of floating point errors for low gain values during the encoding step. A consequence of this is a decrease in compression artefacts in bright regions and a corresponding increase in compression artefacts in dark regions. As dark regions typically have a greater incidence of noise than bright regions, this may increase perceived image quality.
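A sketch of this inversion at encode time and its reversal at decode time; the gain range of 1 to 16 follows the illustrative example above, while the mapping of inverse gains onto 8-bit values is an added assumption:

    import numpy as np

    def encode_gains(gains: np.ndarray) -> np.ndarray:
        # Store the inverse of each gain: a gain of 16 is stored as 1/16,
        # so representable precision is concentrated on bright (low-gain)
        # regions.
        inv = 1.0 / np.clip(gains, 1.0, 16.0)        # inverses in [1/16, 1]
        return np.round(inv * 255.0).astype(np.uint8)

    def decode_gains(stored: np.ndarray) -> np.ndarray:
        # Reverse the mapping; quantisation error now falls mainly on high
        # gains (dark regions), where noise already dominates.
        inv = np.maximum(stored.astype(np.float32), 1.0) / 255.0
        return 1.0 / inv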
The second video frame data, for example local gain information, may be stored as brightness values of the pixels of the second group. Many standard encoders, such as the H.264 encoder mentioned above, store brightness as a luma value. Frequently, luma values are stored at full resolution whereas chroma values, comprising colour information, are stored at a reduced resolution. The present method may thus allow accurate reconstruction of the second video frame data following decoding of the encoded video.
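By way of illustration, the second video frame data could be written directly into the luma plane of a YUV frame before encoding; the planar layout and the helper below are assumptions, chosen because chroma is then stored at reduced resolution as described while luma survives at full resolution:

    import numpy as np

    def store_gains_in_luma(y_plane: np.ndarray, gains_u8: np.ndarray,
                            top: int, left: int) -> None:
        # Write the (already quantised) gain map into a rectangle of the
        # full-resolution luma plane, where it is unaffected by chroma
        # subsampling.
        h, w = gains_u8.shape
        y_plane[top:top + h, left:left + w] = gains_u8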
In some examples wherein the first video frame data comprises a representation of pixel values of the input video frame, storing the representation comprises downscaling the pixel values of the input video frame to a resolution corresponding to an area of the first group of pixels. Figure 5 shows a schematic representation of such a downscaling, wherein pixels of an input video frame 500 are stored as a downscaled representation in a first group of pixels 505 of a video frame to be encoded 510. The downscaling may for example be performed by cubic interpolation. Although Figure 5 shows a horizontal downscaling, other downscaling schemes are envisaged. For example, the downscaling may be a vertical downscaling, or a combined horizontal and vertical downscaling. Horizontal downscaling typically has less impact on perceived video quality than vertical or combined downscaling. The second video frame data is then stored in a second group of pixels 515 of the video frame to be encoded 510. Where the downscaling is a horizontal downscaling or a combined horizontal and vertical downscaling, the second group of pixels 515 may be stored at one side of the first group of pixels 505, as shown in Figure 5. Analogously, where the downscaling is a vertical downscaling or a combined horizontal and vertical downscaling, the second group of pixels 515 may be stored above and/or below the first group of pixels 505.
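By way of example only, the packing of Figure 5 might look like the following sketch. The use of cv2.resize is again an assumed choice, the second video frame data is assumed to have the same number of channels as the input frame, and the width allocated to the second group is left as a parameter:

    import cv2
    import numpy as np

    def pack_frame(input_frame: np.ndarray, second_data: np.ndarray,
                   second_width: int) -> np.ndarray:
        """Pack a horizontally downscaled input frame and the second video
        frame data side by side, as in Figure 5, into a single frame at
        the input frame's resolution."""
        h, w = input_frame.shape[:2]
        # Horizontal-only cubic downscale, keeping full vertical resolution.
        first_group = cv2.resize(input_frame, (w - second_width, h),
                                 interpolation=cv2.INTER_CUBIC)
        # The second group is stored at one side of the first group.
        second_group = cv2.resize(second_data, (second_width, h),
                                  interpolation=cv2.INTER_CUBIC)
        return np.hstack([first_group, second_group])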
In some examples, as shown in Figure 5, the downscaling and storing is such that the video frame to be encoded 510 has the same resolution as the input video frame 500. This may for example be a standard resolution, such as 1920 x 1080. This allows the encoded video to have similar properties, such as file size, as an encoded video produced directly from input video frames such as the frame 500. An advantage of such an example is that the encoded video may be transmitted to a receiver using infrastructure that is already optimised for transmission of video of that resolution. As such, an existing system may be updated to implement the present method by way of a software update, with no change of encoder/decoder firmware or hardware, or video transmission infrastructure. Furthermore, as the second video frame data is stored in the frame instead of, for example, as metadata, there is no increase in required metadata storage space and/or transmission bandwidth.
In other examples, the video frame to be encoded 510 has a higher resolution than the input video frame 500. This allows the representation of pixel values to be downscaled to a lesser degree, or not downscaled at all. A consequence of this is that, where the input video frame has a standard resolution, the video frame to be encoded will generally have a non-standard resolution. A further consequence is that greater storage, bandwidth and processing power will be required to store, transmit and encode/decode the video.
In further examples, the second group of pixels in the video frame to be encoded may correspond to an otherwise unused group of pixels in the input video frame. For example, a widescreen video may be transmitted as a “letterboxed” video at a non-widescreen resolution, with each frame comprising blank bands above and/or below an image-containing portion. The second group of pixels may thus comprise a group of pixels in one or both of these blank bands. This allows the video frame to be encoded to have the same resolution as the input video frame, whilst requiring no downscaling of the image-containing pixel values of the input video frame.
As described above in relation to the method of Figure 1, the second video frame data may define image processing to be applied to a decoded video frame whereby to generate an output video frame, such as gain information. Alternatively or additionally, the second video frame data may define image processing applied to the input video frame. For example, the second video frame data may define areas of the input video frame to which image processing techniques, such as line detection or sharpening algorithms, have been applied.
Figure 6 is a schematic representation of an apparatus 600 for displaying streamed video in accordance with aspects of the present disclosure. The apparatus 600 comprises a receiver 605 configured to receive a streamed encoded video 607. The video may for example be received via a wired and/or wireless network from a streaming server.
The apparatus 600 comprises a decoder 610 configured to, in real time, decode the encoded video 607 whereby to produce a decoded video frame, for example as described above in relation to Figure 1.
The apparatus 600 further comprises a processor 615. The processor 615 could for example be a central processing unit or a graphics processing unit.
The processor 615 is configured to, in real time, retrieve 620 from a first group of pixels of the decoded video frame a representation of pixel values of an input video frame. The representation may be a downscaled representation as described above. The processor 615 is further configured to, in real time, retrieve 625 from a second group of pixels of the decoded video frame, different to the first group of pixels, a representation of processing that may be applied to the representation of pixel values. In some examples, the representation comprises gain values that may be applied to the pixel values as described above.
The processor 615 is then configured to, in real time, generate 630 an output video frame, wherein the generating comprises applying the processing to the representation of pixel values.
The apparatus 600 further comprises an output 635 configured to, in real time, output the output video frame 640 to a display.
The receiver 605, decoder 610 and processor 615 may be elements of a computer. The display may be an element of the same computer, or an external display. In some examples, the apparatus 600 is a mobile device such as a mobile telephone or tablet. In such examples, the display may be an integrated display of the mobile device.
Figure 7 shows a schematic representation of an apparatus 700 according to an example. The apparatus 700 comprises an input 705 configured to receive an input video frame 710.
The apparatus 700 further comprises a processor 715. The processor 715 is configured to store 720, in a first group of pixels of a video frame to be encoded, first video frame data comprising a representation of pixel values of the input video frame 710. The processor 715 is further configured to store 725, in a second group of pixels of the video frame to be encoded, different to the first group of pixels, second video frame data defining processing that may be applied to the first video frame data whereby to produce a display video frame. For example, the second video frame data may comprise gain information as described above.
The processor 715 is then configured to generate 730 an encoded video, the generating comprising encoding the video frame to be encoded.
In some examples, the apparatus 700 further comprises a transmitter configured to transmit the encoded video to a receiver. The receiver may for example be the receiver 605 described above in relation to Figure 6. In other examples, the apparatus 700 is configured to transmit the encoded video to a server, for example a streaming server. The server may then be configured to transmit the encoded video to a receiver, for example as streamed video.
Methods of the present disclosure may be implemented by way of a non-transitory computer-readable storage medium comprising a set of computer-readable instructions stored thereon which, when executed by at least one processor, cause the at least one processor to perform a method according to the present disclosure. The computer-readable instructions may be retrieved from a machine-readable medium, e.g. any medium that can contain, store, or maintain programs and data for use by or in connection with an instruction execution system. In this case, machine-readable media can comprise any one of many physical media such as, for example, electronic, magnetic, optical, electromagnetic, or semiconductor media. More specific examples of suitable machine-readable media include, but are not limited to, a hard drive, a random access memory (RAM), a read only memory (ROM), an erasable programmable read-only memory, or a portable disc.
The above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged. For example, it is described above that gain values corresponding to tone mapping of an input video frame may be stored. Other parameters describing the tone mapping could be stored instead of or as well as the gain values, for example coefficients expressing the shape of local tone curves.
Alternatively or additionally to gain information, the second video frame data described above may comprise a representation, such as a two-dimensional map, of regions of interest in the input video frame. These regions may then be highlighted in the output video frame. For example, regions of interest may be highlighted in response to a user input requesting such highlighting. It may for example be desirable to highlight human faces in a video from a security camera. As a further example, the second video frame data may comprise depth information indicating depths for reconstruction by a 3D display.
It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.

Claims (15)

1. A method for generating an output video frame, the method comprising: receiving an encoded video;
decoding the encoded video whereby to produce a decoded video frame; retrieving first video frame data from a first group of pixels of the decoded video frame;
retrieving second video frame data from a second group of pixels, different to the first group of pixels, of the decoded video frame;
generating an output video frame by combining information derived from said first video frame data and information derived from said second video frame data, wherein said combining comprises:
for a given pixel of the output video frame, combining information describing said given pixel derived from said first video frame data and information describing said given pixel derived from said second video frame data.
2. A method according to claim 1, wherein the first video frame data comprises a representation of pixel values of an input video frame.
3. A method according to claim 2, wherein generating the output video frame comprises upscaling the representation to a display resolution.
4. A method according to any preceding claim, wherein:
the second video frame data defines processing that may be applied to the first video frame data, whereby to generate the output video frame; and the combining information for said given pixel comprises applying the processing to the information describing said given pixel derived from said first video frame data.
5. A method according to claim 4, wherein:
the second video frame data comprises gain information; and the processing comprises application of at least one gain, based on the gain information, to the first video frame data.
6. A method according to any preceding claim, wherein the second group of pixels has an area less than one fifth of a corresponding area of the first group of pixels.
7. A method for encoding a video, the method comprising: receiving information describing an input video frame;
processing the input video frame to generate first video frame data and second video frame data, wherein some information relating to a given pixel of the input video frame is placed in said first video frame data and other information relating to the given pixel of the input video frame is placed in said second video frame data;
storing said first video frame data in a first group of pixels of a video frame to be encoded;
storing said second video frame data in a second group of pixels, different to the first group of pixels, of the video frame to be encoded; and generating an encoded video, the generating comprising encoding the video frame to be encoded.
8. A method according to claim 7, wherein the first video frame data comprises a representation of pixel values of the input video frame.
9. A method according to claim 8, wherein storing the representation comprises downscaling the pixel values of the input video frame to a resolution corresponding to an area of the first group of pixels.
10. A method according to claim 9, wherein:
the downscaling is a horizontal downscaling, such that the first group of pixels has a width less than the width of the input video frame; and the second group of pixels is at one side of the first group of pixels.
11. A method according to any of claims 7 to 10, wherein the second video frame data defines image processing applied to the input video frame.
12. A non-transitory computer-readable storage medium comprising a set of computer-readable instructions stored thereon which, when executed by at least one processor, cause the at least one processor to perform a method according to any preceding claim.
13. An apparatus for displaying streamed video, the apparatus comprising: a receiver configured to receive a streamed encoded video;
a decoder configured to, in real time, decode the encoded video whereby to produce a decoded video frame;
a processor configured to, in real time:
retrieve from a first group of pixels of the decoded video frame a representation of pixel values of an input video frame;
retrieve from a second group of pixels of the decoded video frame, different to the first group of pixels, a representation of processing that may be applied to the representation of pixel values; and generate an output video frame, wherein the generating comprises applying the processing to the representation of pixel values, and an output configured to, in real time, output the output video frame to a display.
14. An apparatus comprising:
an input configured to receive an input video frame; and a processor configured to:
store, in a first group of pixels of a video frame to be encoded, first video frame data comprising a representation of pixel values of the input video frame;
store, in a second group of pixels of the video frame to be encoded, different to the first group of pixels, second video frame data defining processing that may be applied to the first video frame data whereby to produce a display video frame; and generate an encoded video, the generating comprising encoding the video frame to be encoded.
15. An apparatus according to claim 14, wherein the apparatus comprises a transmitter configured to transmit the encoded video to a receiver.
Intellectual Property Office. Application No: GB1616687.8. Examiner: Mr George Mathews
GB1616687.8A 2016-09-30 2016-09-30 Method of video generation Active GB2554663B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1616687.8A GB2554663B (en) 2016-09-30 2016-09-30 Method of video generation
US15/691,360 US10462478B2 (en) 2016-09-30 2017-08-30 Method of video generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1616687.8A GB2554663B (en) 2016-09-30 2016-09-30 Method of video generation

Publications (3)

Publication Number Publication Date
GB201616687D0 (en) 2016-11-16
GB2554663A true GB2554663A (en) 2018-04-11
GB2554663B GB2554663B (en) 2022-02-23

Family

Family ID: 57570945

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1616687.8A Active GB2554663B (en) 2016-09-30 2016-09-30 Method of video generation

Country Status (2)

Country Link
US (1) US10462478B2 (en)
GB (1) GB2554663B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11259040B1 (en) * 2019-04-25 2022-02-22 Amazon Technologies, Inc. Adaptive multi-pass risk-based video encoding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100283861A1 (en) * 2009-05-07 2010-11-11 Canon Kabushiki Kaisha Image processing apparatus and image processing method
EP2579591A1 (en) * 2011-10-04 2013-04-10 Thomson Licensing Method of and device for encoding an HDR image, method of and device for reconstructing an HDR image and non-transitory storage medium
WO2014025294A1 (en) * 2012-08-08 2014-02-13 Telefonaktiebolaget L M Ericsson (Publ) Processing of texture and depth images
US20140112394A1 (en) * 2012-10-22 2014-04-24 Microsoft Corporation Band separation filtering / inverse filtering for frame packing / unpacking higher-resolution chroma sampling formats
WO2015193114A1 (en) * 2014-06-20 2015-12-23 Thomson Licensing Method and device for signaling in a bitstream a picture/video format of an ldr picture and a picture/video format of a decoded hdr picture obtained from said ldr picture and an illumination picture

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02248161A (en) * 1989-03-20 1990-10-03 Fujitsu Ltd Data transmission system
US7760943B2 (en) * 2003-10-02 2010-07-20 Hewlett-Packard Development Company, L.P. Method to speed-up Retinex-type algorithms
US8947465B2 (en) * 2004-12-02 2015-02-03 Sharp Laboratories Of America, Inc. Methods and systems for display-mode-dependent brightness preservation
US8687087B2 (en) * 2006-08-29 2014-04-01 Csr Technology Inc. Digital camera with selectively increased dynamic range by control of parameters during image acquisition
JP4961982B2 (en) * 2006-12-07 2012-06-27 ソニー株式会社 Solid-state imaging device, driving method of solid-state imaging device, and imaging device
EP2198618A2 (en) * 2007-10-08 2010-06-23 Nxp B.V. Video decoding
US7777804B2 (en) * 2007-10-26 2010-08-17 Omnivision Technologies, Inc. High dynamic range sensor with reduced line memory for color interpolation
CN102388612B (en) * 2009-03-13 2014-10-08 杜比实验室特许公司 Layered compression of high dynamic range, visual dynamic range, and wide color gamut video
JP2011010108A (en) * 2009-06-26 2011-01-13 Seiko Epson Corp Imaging control apparatus, imaging apparatus, and imaging control method
JP5342969B2 (en) * 2009-09-10 2013-11-13 富士フイルム株式会社 Imaging apparatus and imaging method
US8447132B1 (en) * 2009-12-09 2013-05-21 CSR Technology, Inc. Dynamic range correction based on image content
US9230312B2 (en) * 2010-01-27 2016-01-05 Adobe Systems Incorporated Methods and apparatus for performing tone mapping on high dynamic range images
JP5762002B2 (en) * 2011-01-06 2015-08-12 キヤノン株式会社 Imaging device
US11089247B2 (en) * 2012-05-31 2021-08-10 Apple Inc. Systems and method for reducing fixed pattern noise in image data
US10255888B2 (en) * 2012-12-05 2019-04-09 Texas Instruments Incorporated Merging multiple exposures to generate a high dynamic range image
KR102039464B1 (en) * 2013-05-21 2019-11-01 삼성전자주식회사 Electronic sensor and control method of the same
CN105324683B (en) * 2013-06-27 2018-10-16 万睿视影像有限公司 X-ray imaging device with the cmos sensor being embedded in TFT tablets
JP2015115922A (en) * 2013-12-16 2015-06-22 オリンパス株式会社 Imaging apparatus and imaging method
JP2015133633A (en) * 2014-01-14 2015-07-23 株式会社東芝 Solid state imaging device
US10298854B2 (en) * 2014-05-30 2019-05-21 Apple Inc. High dynamic range video capture control for video transmission
JP6562250B2 (en) * 2015-06-08 2019-08-21 パナソニックIpマネジメント株式会社 Imaging device and imaging module
US20160381274A1 (en) * 2015-06-25 2016-12-29 Novatek Microelectronics Corp. Image Sensing Module


Also Published As

Publication number Publication date
GB201616687D0 (en) 2016-11-16
US20180098084A1 (en) 2018-04-05
US10462478B2 (en) 2019-10-29
GB2554663B (en) 2022-02-23


Legal Events

732E: Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977). Free format text: REGISTERED BETWEEN 20220929 AND 20221005