WO2017121549A1 - Frequency based prediction - Google Patents

Frequency based prediction

Info

Publication number
WO2017121549A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction
image elements
group
predictions
modified
Prior art date
Application number
PCT/EP2016/080236
Other languages
French (fr)
Inventor
Kenneth Andersson
Per Hermansson
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2017121549A1 publication Critical patent/WO2017121549A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the solution presented herein generally relates to video data management, and more particularly to methods of encoding and decoding of video sequences.
  • the solution also relates to encoder, decoder and computer program products.
  • Video compression is about reducing and removing redundant information from the video data. Typically, information from neighboring image elements or pixels, both within a picture and from previously coded pictures, is used to make a prediction of the video.
  • a major goal of any video codec standard is to provide tools that hide or minimize such distortions while still maintaining a high compression ratio, keeping the size of the video file as small as possible.
  • Pixel or image element prediction is an important part of video coding standards such as H.261, H.263, MPEG-4 and H.264 (ITU-T Rec. H.264 and ISO/IEC 14496-10, MPEG-4 AVC).
  • Intra-prediction provides a spatial prediction of the current block from previously decoded pixels of a current frame.
  • Inter-prediction gives a temporal prediction of the current block using a corresponding but displaced block in a previously decoded frame.
  • intra-prediction is an important method for creating a prediction of image elements for the current block. Since intra-coding tends to transport most of the signal energy in the video bit stream, any improvement to the prediction and coding methods is important for reducing the number of bits needed when compressing a video sequence.
  • Intra-prediction uses reference image elements neighboring the current block to predict blocks within the same frame.
  • the order in which the blocks are encoded is from the upper left corner and then row-wise through the whole frame. Therefore, already encoded image elements in the frame will be to the upper left of the next block.
  • Intra-prediction takes this into consideration when using the image elements to the left and above the block to predict image elements within the block.
  • the intra-prediction consists of three steps: reference image element array construction, image element prediction, and postprocessing.
  • Intra-prediction can be classified into two categories: Angular prediction methods and DC/planar prediction methods.
  • the first category is illustrated in Figure 1 and is intended to model structures with directional edges. The numbers correspond to the respective intra-prediction mode for each direction; e.g., mode 10 is horizontal prediction and mode 26 is vertical prediction.
  • the second category estimates smooth image content.
  • Intra-Block Copy (IntraBC) is a method in state-of-the-art video codecs where a block in an image is predicted as a displacement from already reconstructed blocks in the same image. It removes redundancy from repeating patterns, which typically occur in text and graphics regions, and IntraBC is therefore today mostly used for compressing screen content and computer graphics. Encoding time increases compared to intra-prediction because of the search involved in intra-block matching.
  • the most similar block in a specified search area next to the current block is found by comparing the blocks with some metric, where the calculation of sum of squared error or difference (SSD) is often included in the metric.
  • intra-prediction image elements neighboring the current block are used to create a prediction of the current block according to the intra-prediction mode.
  • intra-block copy prediction reference image elements positioned relative to the current block by a block vector are copied to create a prediction of the current block.
  • inter-prediction reference image elements, positioned relative to the current block by a motion vector, from previously decoded pictures are copied directly or an interpolated version is used to predict the current block.
  • Inter-prediction also allows bi-prediction from two independent reference blocks using two independent motion vectors; the reference blocks, potentially interpolated, are then combined. In bi-prediction, half of the prediction (weight equal to 0.5) is taken from one reference block and the other half is taken from the other reference block.
  • the intra- and inter- predictions can be re-generated on the decoder side because the intra-prediction mode and the displacement vector are typically included with the coded bit stream.
  • template matching is a technique for the encoder to be able to reference a block of previous coded samples without having to signal a displacement vector for indicating the position.
  • a template area of image elements neighboring the current block is selected by both the encoder and decoder using predetermined information, which could, e.g., be signaled in a slice header or Picture Parameter Set (PPS).
  • an error metric is computed between the image elements at the search location and the image elements in the template area. The location that resulted in the lowest error metric is then selected as the final location, and the image elements at the location will then be used for creating a prediction of the current block. This process is performed by both the encoder and decoder to ensure that the same image elements are used for the prediction.
  • In template matching, both the encoder and the decoder determine from which reference image elements the current block shall be predicted. Template matching is used to find previously coded blocks that are similar to the current one by finding locations where the neighboring image elements are similar to the neighboring image elements of the current block. Image elements from the found location can then be used without having to send a displacement vector.
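  • As a rough illustration of the template matching described above, the following Python sketch searches for the location whose neighboring image elements best match the current block's template; the function name, the L-shaped template layout, and the search range are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def find_template_match(recon, cur_x, cur_y, block, t=2, search=16):
    """Illustrative template-matching sketch (not the patented method itself).

    recon        : 2D float array of already-reconstructed image elements
    cur_x, cur_y : top-left position of the current block
    block        : block size (assumed square)
    t            : template thickness (rows above, columns to the left)
    search       : search range in image elements
    """
    # L-shaped template of the current block: t rows above, t columns left.
    top_cur = recon[cur_y - t:cur_y, cur_x:cur_x + block]
    left_cur = recon[cur_y:cur_y + block, cur_x - t:cur_x]

    h, w = recon.shape
    best_cost, best_pos = None, None
    for dy in range(-search, 1):               # search only upward, i.e.,
        for dx in range(-search, search + 1):  # roughly inside the decoded area
            x, y = cur_x + dx, cur_y + dy
            if (dx, dy) == (0, 0) or x - t < 0 or y - t < 0 or x + block > w:
                continue
            top = recon[y - t:y, x:x + block]
            left = recon[y:y + block, x - t:x]
            # SSD between the candidate template and the current template.
            cost = np.sum((top - top_cur) ** 2) + np.sum((left - left_cur) ** 2)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (x, y)

    # Both encoder and decoder run the same search, so the location found
    # here can be used for prediction without signaling a vector.
    bx, by = best_pos
    return recon[by:by + block, bx:bx + block]
```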
  • Multiple reference pictures may be used for inter-prediction with a reference picture index to indicate which of the multiple reference pictures is used.
  • In the P-type of inter encoding, only single-directional prediction is used, and the allowable reference pictures are managed in list 0.
  • In the B-type of inter encoding, two lists of reference pictures are managed: list 0 and list 1.
  • single-directional prediction using either list 0 or list 1 is allowed, or bi-prediction using an average of a reference picture from list 0 and another reference picture from list 1 may be used.
  • the weighted prediction in H.264 signals, in the slice header, a weight for each direction of bi-directional prediction and also a DC offset for the weighted combination.
  • the general formula for using weighting factors in inter-prediction is:
  • P(x, y) = (w_0 · P_0(x, y) + w_1 · P_1(x, y) + 2^(Shift−1)) >> Shift + DC offset    (1)
  • where >> Shift represents a right shift by Shift bits, w_0 and w_1 are the weights applied to the reference blocks P_0 and P_1, and the DC offset is the offset signaled in the slice header.
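  • A minimal fixed-point sketch of this weighted bi-prediction, assuming 8-bit samples and encoder-chosen integer weights (the function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def weighted_bi_prediction(p0, p1, w0, w1, dc_offset, shift=6):
    """Fixed-point weighted combination of two reference blocks:
    P = ((w0*P0 + w1*P1 + 2^(shift-1)) >> shift) + dc_offset."""
    p0 = p0.astype(np.int32)
    p1 = p1.astype(np.int32)
    rounding = 1 << (shift - 1)          # rounding term before the right shift
    pred = ((w0 * p0 + w1 * p1 + rounding) >> shift) + dc_offset
    return np.clip(pred, 0, 255).astype(np.uint8)   # clip to the 8-bit range

# Plain bi-prediction (both weights 0.5) corresponds to w0 = w1 = 32, shift = 6.
```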
  • the international patent application WO 2004/064255, filed 6 January 2004, suggests a hybrid intra-inter bi-predictive coding mode that allows both intra and inter frame predictions to be combined for hybrid-encoding a macroblock.
  • In this hybrid coding, an average of selected intra- and inter-predictions, or a differently weighted combination of the intra- and inter-predictions, is used.
  • the hybrid coding suggested in WO 2004/064255 basically uses a summing of the two input intra and inter-predictions or uses slice-specific weights. Thus, the same weight is applied to all pixels in all macroblocks of a slice that is used as inter and/or intra-prediction. Such an approach is far from optimal from an image quality point of view.
  • intra-prediction can only predict simple structures in the original block, as only one row and one column of image elements are used from the neighboring blocks.
  • intra-prediction provides useful low frequency information. It is not possible, however, to represent more complex structures and high frequency information using the intra-prediction modes (current angular directions, planar, and DC predictions) in state-of-the-art video codecs. Template matching and intra-block copy can retain more structure and higher frequency information but will often lead to large discontinuities at the border between the current block and neighboring blocks.
  • bi-prediction works best when there is a luminance change that spans several pictures.
  • Bi-prediction can calculate the average luminance of the current picture by calculating a weighted average of image elements from a picture before the current picture and image elements from a picture after the current picture.
  • Picture adaptive weighting with offset can be used to compensate for a luminance change in one reference picture such that it is better aligned with the luminance of the current frame.
  • the problem is that such weighting cannot selectively weight a prediction differently for different frequency ranges. For example, low frequencies, which are mostly affected by luminance change, cannot be compensated specifically without affecting high frequencies.
  • the solution presented herein addresses these problems by combining predictions of groups of image elements, where at least one prediction is modified responsive to its frequency content. By combining the two predictions in this manner, the solution presented herein improves the prediction of more complex structures in the original group while minimizing artificial edges introduced at the group boundaries. Further, the solution presented herein allows for compensation of issues with decoding/encoding image elements having low/high frequency components without affecting the image elements with high/low frequency components.
  • One embodiment provides a method of decoding a group of image elements in a frame of an encoded video sequence.
  • the method comprises providing a first prediction of the group of image elements according to a first prediction mode, and providing a second prediction of the group of image elements according to a second prediction mode.
  • the method further comprises modifying the second prediction responsive to frequency content of the second prediction to generate a second modified prediction, generating a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
  • Another exemplary embodiment provides a decoder comprising a first prediction circuit, a second prediction circuit, a modification circuit, and a decoding circuit.
  • the first prediction circuit is configured to provide a first prediction of the group of image elements according to a first prediction mode.
  • the second prediction circuit is configured to provide a second prediction of the group of image elements according to a second prediction mode.
  • the modification circuit is configured to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction.
  • the decoding circuit is configured to generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
  • Another exemplary embodiment provides a computer program product stored in a non-transitory computer readable medium for controlling a decoder.
  • the computer program product comprises software instructions which, when run on the decoder, cause the decoder to provide a first prediction of the group of image elements according to a first prediction mode, and provide a second prediction of the group of image elements according to a second prediction mode.
  • the software instructions further cause the decoder to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction, and generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
  • Another exemplary embodiment provides a decoder apparatus comprising a first prediction module, a second prediction module, a modification module, and a decoding module.
  • the first prediction module is configured to provide a first prediction of the group of image elements according to a first prediction mode.
  • the second prediction module is configured to provide a second prediction of the group of image elements according to a second prediction mode.
  • the modification module is configured to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction.
  • the decoding module is configured to generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
  • Another exemplary embodiment provides a method of encoding a group of image elements in a frame of a video sequence.
  • the method comprises estimating a first prediction of a first group of image elements according to a first prediction mode, and estimating a second prediction for each of a plurality of second groups of image elements.
  • the method further comprises modifying each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second groups of image elements corresponds to a second prediction mode.
  • the method further comprises determining a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and selecting the candidate prediction having a better performance parameter than the other candidate predictions.
  • the method further comprises encoding the first group of image elements as an identifier of the location of the first group of image elements or of the first prediction mode, and encoding the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
  • Another exemplary embodiment provides an encoder comprising a first prediction circuit, a second prediction circuit, a modification circuit, and an evaluation circuit.
  • the first prediction circuit is configured to estimate a first prediction of a first group of image elements according to a first prediction mode.
  • the second prediction circuit is configured to estimate a second prediction for each of a plurality of second groups of image elements.
  • the modification circuit is configured to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second groups of image elements corresponds to a second prediction mode.
  • the evaluation circuit is configured to determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and select the candidate prediction having a better performance parameter than the other candidate predictions.
  • the evaluation circuit is also configured to encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode, and to encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
  • Another exemplary embodiment provides a computer program product stored in a non-transitory computer readable medium for controlling an encoder.
  • the computer program product comprises software instructions which, when run on the encoder, cause the encoder to estimate a first prediction of a first group of image elements according to a first prediction mode, and estimate a second prediction for each of a plurality of second groups of image elements.
  • the software instructions further cause the encoder to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second groups of image elements corresponds to a second prediction mode.
  • the software instructions further cause the encoder to determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and select the candidate prediction having a better performance parameter than the other candidate predictions.
  • the software instructions further cause the encoder to encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode, and encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
  • Another exemplary embodiment provides an encoder apparatus comprising a first prediction module, a second prediction module, a modification module, and an evaluation module.
  • the first prediction module is configured to estimate a first prediction of a first group of image elements according to a first prediction mode.
  • the second prediction module is configured to estimate a second prediction for each of a plurality of second groups of image elements.
  • the modification module is configured to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second groups of image elements corresponds to a second prediction mode.
  • the evaluation module is configured to determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and select the candidate prediction having a better performance parameter than the other candidate predictions.
  • the evaluation module is further configured to encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode, and encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
  • Figure 1 shows an example of intra-prediction.
  • Figure 2 shows a decoding method according to one exemplary embodiment.
  • Figure 3 shows a block diagram of a decoder according to one exemplary embodiment.
  • Figure 4 shows a block diagram of a decoding apparatus according to one exemplary embodiment.
  • Figure 5 shows an encoding method according to one exemplary embodiment.
  • Figure 6 shows a block diagram of an encoder according to one exemplary embodiment.
  • Figure 7 shows a block diagram of an encoding apparatus according to one exemplary embodiment.
  • Figure 8 shows a device according to one exemplary embodiment.
  • Figure 9 shows an example of inter-prediction using bi-prediction.
  • Figure 10 shows an example of the solution presented herein.
  • Figure 11 shows an example of directional weighting according to one exemplary embodiment.
  • Figure 12 shows an example of an embodiment that relies on edge metric comparisons.
  • a video sequence or bit stream comprises one or multiple, e.g., at least two, frames or pictures.
  • Each frame is composed of a series of one or more slices, where such a slice includes one or more macroblocks of image elements.
  • image element is used to denote the smallest element of a frame or picture in a video sequence.
  • Such an image element has associated image element properties, e.g., color (in the red, green, blue, RGB, space) and/or luminance (Y) and chrominance (Cr, Cb or sometimes denoted U, V).
  • image element is a pixel or a sample of a frame or picture.
  • the image elements are organized into groups of image elements.
  • group of image elements denotes any of the prior art known partitions of frames and slices into collections of image elements that are handled together during decoding and encoding. It will be appreciated that the expression “group of image elements” may be used interchangeably with “block.” Generally, such a group is a rectangular (MxN) or square (MxM) group of image elements. In the art such a grouping is generally denoted as a macroblock in the video compression standard.
  • An exemplary macroblock has a size of 16x16 image elements, and may comprise multiple so-called sub-macroblock partitions, e.g., partitions having 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 image elements.
  • the 8x8 sub-macroblock partition is often denoted as a sub-macroblock or sub-block.
  • In HEVC, the corresponding term to the macroblock is the Coding Tree Unit (CTU), and the size is 64x64.
  • a CTU may be divided by a quadtree, whose leaves are called Coding Units (CUs).
  • the size of the CUs may be 64x64, 32x32, 16x16, 8x8, or 4x4.
  • a CU may further be divided into Prediction Units (PUs) and Transform Units (TUs), where a PU corresponds to a prediction block and a TU corresponds to a residual block.
  • group of image elements is used to denote any such macroblock, sub-macroblock or partition size, PU, or any other grouping of image elements used in video compression unless explicitly notified otherwise.
  • a typical example could be that the luminance component of a PU comprises 64x64 image elements, and that the associated chrominance components are spatially sub-sampled by a factor of two in the horizontal and vertical directions to form 32x32 blocks.
  • the modification comprises filtering the second prediction.
  • the modification comprises weighting the second prediction. It will be appreciated that these combinations rely on the second prediction being filtered, weighted, or both before combining.
  • weighting as used herein may refer to a matrix of weights, one for each image element of the corresponding prediction, where the size of the matrix of weights is the same as the size of the block being weighted. In other embodiments, the weighting is a constant (or a matrix of identical weights) such that each image element is weighted by the same amount.
  • Figure 2 shows a flow diagram illustrating an exemplary method 100 of decoding a group of image elements of an encoded video sequence using a decoder.
  • the decoder provides a first prediction of the group of image elements according to a first prediction mode (block 102), and provides a second prediction of the group of image elements according to a second prediction mode (block 104).
  • the decoder may provide the first and/or second predictions according to intra-block prediction or inter-block prediction.
  • the method 100 may further include first determining the first and second prediction modes, e.g., from information provided with received video sequences.
  • the decoder modifies the second prediction responsive to frequency content of the second prediction to generate a second modified prediction (block 106), and generates a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction (block 108).
  • the decoder modifies the second prediction by filtering the second prediction, e.g., high-pass filtering the second prediction, before the combination.
  • the decoder modifies the second prediction by filtering and weighting the second prediction before the combination. It will further be appreciated that some embodiments may also modify the first prediction, e.g., by filtering and/or weighting the first prediction.
  • the decoder adds the prediction error to the decoded version to form the final decoded block.
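  • A simplified sketch of this decode path, assuming the modification is average removal (a crude high-pass filter) followed by a scalar weight; this is one illustrative reading of method 100, not a normative implementation:

```python
import numpy as np

def decode_block(pred1, pred2, w, residual):
    """Sketch of method 100: combine a first prediction with a modified
    second prediction, then add the decoded prediction error."""
    # Modify the second prediction responsive to its frequency content:
    # here, keep only its high frequency part by removing the block average.
    pred2_high = pred2 - pred2.mean()
    # Combine the first prediction with the weighted, high-pass-filtered
    # second prediction.
    combined = pred1 + w * pred2_high
    # Add the prediction error signaled in the bit stream, if any.
    return combined + residual
```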
  • any weighting used to weight a prediction may include a weight for each element of the corresponding group of image elements. In some embodiments, all of the weights may be the same. However, in other embodiments discussed in further detail below, at least some of the weights are different. For example, when image elements near the edge of neighboring group(s) are used to predict a current group of image elements, the weights for the image elements near the edge of neighboring group(s) of image elements may be greater than the weights for the image elements farther away from the edge of the neighboring group(s).
  • a second weighting (for the second prediction) is greater than the first weighting (for the first prediction), e.g., when the second prediction is provided according to inter-block prediction
  • the first weighting (for the first prediction) is greater than the second weighting (for the second prediction), e.g., when decoded image elements used for the second prediction have been filtered.
  • an additional weight scaling factor e.g., received with the encoded video sequence, may be used to further adjust the first weighting and/or the second weighting.
  • decoder 200 in Figure 3 may implement the method 100 of Figure 2.
  • the decoder 200 comprises a first prediction circuit 210, second prediction circuit 220, modification circuit 230, and decoding circuit 240.
  • the first prediction circuit 210 provides a first prediction of the group of image elements according to a first prediction mode (e.g., block 102 of Figure 2), and the second prediction circuit 220 provides a second prediction of the group of image elements according to a second prediction mode (e.g., block 104 of Figure 2).
  • the modification circuit 230 modifies the second prediction responsive to frequency content of the second prediction to generate a second modified prediction (e.g., block 106 of Figure 2).
  • the decoding circuit 240 generates a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction (e.g., block 108 of Figure 2).
  • Decoder 200 may optionally include a mode circuit 250 configured to determine the first prediction mode and/or the second prediction mode, e.g., responsive to information provided in the video sequence, by determining an intra-prediction mode or a location identifier for a decoded version of another group of image elements in the frame of the encoded video sequence.
  • the decoding apparatus 300 shown in Figure 4 may use the illustrated first prediction module 310, second prediction module 320, modification module 330, decoding module 340, and optional mode module 350 to implement method 100.
  • the method 100 described herein may be implemented as stored computer program instructions for execution by one or more computing devices, such as microprocessors, Digital Signal Processors (DSPs), FPGAs, ASICs, or other data processing circuits.
  • the stored program instructions may be stored on machine-readable media, such as electrical, magnetic, or optical memory devices.
  • the memory devices may include ROM and/or RAM modules, flash memory, hard disk drives, magnetic disc drives, optical disc drives and other storage media known in the art.
  • method 100 may be implemented using a decoding processor comprising software instructions that when run on the decoding processor cause the decoding processor to execute the method 100 of Figure 2.
  • Figure 5 shows a flow diagram illustrating an exemplary method 400 of encoding a first group of image elements in a frame of a video sequence using an encoder.
  • the encoder estimates a first prediction of a first group of image elements according to a first prediction mode (block 402), and estimates a second prediction for each of a plurality of second groups of image elements (block 404).
  • the encoder modifies each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, wherein each of the second groups of image elements corresponds to a second prediction mode (block 406).
  • the encoder determines a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions (block 408), and selects the candidate prediction having a better performance parameter than the other candidate predictions (block 410).
  • Exemplary performance parameters include, but are not limited to, the Sum of Squared Differences (SSD), or the Sum of Absolute Differences (SAD), between the original and candidate predictions, which may be referred to as the prediction error, where smaller values of SSD (or SAD) represent better performance.
  • the encoder encodes the first group of image elements as an identifier of the location of the first group of image elements or of the first prediction mode (block 412), and encodes each of the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction (block 414). Further, if the encoder determines a prediction error, the encoder may include the prediction error in the bit stream sent to the decoder.
  • the encoder modifies the second predictions by filtering the second predictions, e.g., high-pass filtering the second predictions, before the combinations.
  • the encoder modifies the second predictions by filtering and weighting the second predictions before the combinations. It will further be appreciated that some embodiments may also modify the first prediction, e.g., by filtering and/or weighting the first prediction, and that the subsequent combination may be a combination of the first prediction and the second modified prediction, or of the first modified prediction and the second modified prediction.
  • any weighting used to weight a prediction may include a weight for each element of the corresponding group of image elements.
  • an additional weight scaling factor e.g., received with the encoded video sequence, may be used to further adjust a first weighting (for the first prediction) and/or a second weighting (for the second prediction).
  • a weight scaling factor may have a value between 0 and a positive non-zero constant, e.g., 1, and is configured to emphasize the first/second weighting over the second/first weighting to further control an influence of the first/second prediction on the combination.
  • encoder 500 in Figure 6 may implement the method 400 of Figure 5.
  • the encoder 500 comprises a first prediction circuit 510, second prediction circuit 520, modification circuit 530, and an evaluation circuit 540.
  • the first prediction circuit 510 estimates a first prediction of a first group of image elements according to a first prediction mode (e.g., block 402 in Figure 5), and the second prediction circuit 520 estimates a second prediction for each of a plurality of second groups of image elements (e.g., block 404 in Figure 5).
  • the modification circuit 530 modifies each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second groups of image elements corresponds to a second prediction mode (e.g., block 406 of Figure 5).
  • the evaluation circuit 540 then generates the encoded information. More particularly, the evaluation circuit 540 determines a plurality of candidate predictions, where each candidate prediction comprises a combination of the first prediction and one of the plurality of second modified predictions (e.g., block 408 in Figure 5). The evaluation circuit 540 selects the candidate prediction having a better performance parameter than the other candidate predictions (e.g., block 410 in Figure 5).
  • the evaluation circuit 540 encodes the first group of image elements as an identifier of the first prediction mode or of the location of the first group of image elements (e.g., block 412 in Figure 5).
  • the encoder also encodes each second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate position (e.g., block 414 in Figure 5).
  • the encoder may include an optional mode selection circuit 550 configured to select the first prediction mode, e.g., by selecting an intra-prediction mode or a location identifier for a decoded version of another group of image elements in the frame of the encoded video sequence, and by selecting the second prediction mode by selecting the intra- prediction mode or the location identifier for the decoded version of another group of image elements in the frame of the encoded video sequence.
  • the encoding apparatus 600 shown in Figure 7 may use the illustrated modification module 610, first prediction module 620, second prediction module 630, evaluation module 640, and optional mode selection module 650 to implement method 400.
  • the method 400 described herein may be implemented as stored computer program instructions for execution by one or more computing devices, such as microprocessors, Digital Signal Processors (DSPs), FPGAs, ASICs, or other data processing circuits.
  • the stored program instructions may be stored on machine-readable media, such as electrical, magnetic, or optical memory devices.
  • the memory devices may include ROM and/or RAM modules, flash memory, hard disk drives, magnetic disc drives, optical disc drives and other storage media known in the art.
  • method 400 may be implemented using an encoding processor comprising software instructions that when run on the encoding processor cause the encoding processor to execute the method 400 of Figure 5.
  • the encoder and/or decoder devices disclosed herein may be comprised in any device, including but not limited to a tablet, personal computer, mobile telephone, set-top box, camera, machine-to-machine device, etc.
  • Figure 8 shows one exemplary device 700.
  • the term "mobile terminal” may include a cellular radiotelephone with or without a multiline display; a Personal Communication System (PCS) terminal that may combine a cellular radiotelephone with data processing, facsimile, and data communications capabilities; a Personal Digital Assistant (PDA) that can include a radiotelephone, pager, Internet/intranet access, web browser, organizer, calendar, and/or a global positioning system (GPS) receiver; and a conventional laptop and/or palmtop receiver or other appliance that includes a
  • Mobile terminals may also be referred to as "pervasive computing" devices.
  • a current block is decoded by producing a set of image elements representing a prediction of the current block.
  • the prediction is based on image elements that previously have been decoded in the current picture.
  • the predicted image elements are combined with a residual to form the decoded image elements.
  • the final set of predicted image elements can be produced from two predictions, e.g., two sets of predicted image elements, that are combined based on the frequency content of the predictions, e.g., using different weights for the low and high frequency part of each set of image elements.
  • both sets of image elements can be combined using separate weights for the low and high frequency components.
  • the low frequency component can be calculated by taking the average of all predicted image elements for a predicted block, and the high frequency component can be calculated by removing the average from all image elements of the predicted block.
  • a low frequency component block is determined by combining the average of both sets of predicted image elements using one set of weights.
  • a high frequency component block is also produced by combining the set of image elements after having its respective average removed (or modified) using a different set of weights than used for the low frequency component.
  • the final set of predicted image elements is then produced by adding the image elements representing the low frequency component block and the high frequency component block together.
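  • The split-and-recombine procedure described in the preceding bullets maps onto a few lines of floating-point Python; this sketch assumes the block average as the low frequency component, as described above:

```python
import numpy as np

def combine_by_frequency(pred1, pred2, w_low, w_high):
    """Combine two predictions with separate weights for their low and
    high frequency components (block average = low frequency part)."""
    low1, low2 = pred1.mean(), pred2.mean()       # low frequency components
    high1, high2 = pred1 - low1, pred2 - low2     # high frequency components
    # Low frequency component block: weighted combination of the averages.
    low = (1 - w_low) * low1 + w_low * low2
    # High frequency component block: weighted combination of the
    # average-removed predictions, with a different weight.
    high = (1 - w_high) * high1 + w_high * high2
    # Final prediction: sum of the low and high frequency component blocks.
    return low + high
```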
  • Embodiment 1 provides the primary and most general embodiment, while all remaining embodiments stem from at least Embodiment 1.
  • a first prediction of the current group of image elements provided according to a first prediction mode is combined with a second modified prediction, where the second modified prediction is a modified version of a second prediction of the current group of image elements provided according to a second prediction mode.
  • the modification may comprise a filtering of the second prediction and/or a weighting of the second prediction.
  • In Embodiment 2, which builds from Embodiment 1, the first and second predictions are the same, where the second modified prediction for the current block to be decoded is produced by weighting the second prediction based on frequency content of the second prediction, e.g., by weighting only the low frequency content of the second prediction.
  • the first prediction is also modified so that a first modified prediction is produced by weighting the first prediction so that at least one weight used to modify the second prediction for one range of frequencies differs from at least one weight used to modify the first prediction for other frequencies.
  • the decoded block may be determined by:
  • P(x, y) = P_l(x, y) · (1 − w) + P_h(x, y) · w    (2)
  • where P(x, y) represents an image element of the decoded block
  • (x, y) represents a spatial coordinate, where x is horizontal and y is vertical
  • P_l represents the low frequency part of one prediction
  • P_h represents the high frequency part of the other prediction
  • w represents a weight for the second prediction.
  • the decoded block is produced based on a frequency weighted combination of at least the first and second predictions, where the weighting of each prediction is based on frequency content of the image elements of the respective first and second predictions so that at least one weight is used for one range of frequencies to weight the image elements of the predictions and a different weight is used for other frequencies to weight the image elements of the other prediction.
  • Figure 10 shows an exemplary implementation.
  • the decoded block may be determined by:
  • P(x, y) = P_1l(x, y) · (1 − w_1) + P_2l(x, y) · w_1 + P_1h(x, y) · (1 − w_2) + P_2h(x, y) · w_2    (3)
  • where P(x, y) represents an image element of the decoded block
  • (x, y) represents a spatial coordinate, where x is horizontal and y is vertical
  • P_1l represents the low frequency part of the first prediction
  • P_1h represents the high frequency part of the first prediction
  • P_2l represents the low frequency part of the second prediction
  • P_2h represents the high frequency part of the second prediction
  • w_1 represents a weight for the low frequency image elements
  • w_2 represents a weight for the high frequency image elements.
  • the benefit of this embodiment is that the amount of low frequency information to use from the predictions may be selectively combined with the amount of high frequency information to use from the predictions to determine the decoded block.
  • This embodiment may be implemented using a fixed point implementation.
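  • For instance, equation (3) might be realized in fixed point roughly as follows, with the weights quantized to integers out of 64; the Q6 scale and the 8-bit clipping range are illustrative assumptions:

```python
import numpy as np

def combine_eq3_fixed_point(pred1, pred2, w1_q6, w2_q6):
    """Fixed-point sketch of equation (3): w1_q6 and w2_q6 are the low- and
    high-frequency weights quantized to the range 0..64 (Q6 fractions)."""
    pred1 = pred1.astype(np.int32)
    pred2 = pred2.astype(np.int32)
    # Integer block averages as the low frequency parts.
    low1 = int(round(float(pred1.mean())))
    low2 = int(round(float(pred2.mean())))
    high1, high2 = pred1 - low1, pred2 - low2
    # P = P1l*(1-w1) + P2l*w1 + P1h*(1-w2) + P2h*w2, with weights in Q6
    # and +32 as the rounding term before the 6-bit right shift.
    low = ((64 - w1_q6) * low1 + w1_q6 * low2 + 32) >> 6
    high = ((64 - w2_q6) * high1 + w2_q6 * high2 + 32) >> 6
    return np.clip(low + high, 0, 255).astype(np.uint8)
```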
  • the decoded block is produced based on a combination of at least two frequency weighted single predictions.
  • the weighting for frequency is specific for the first and second predictions.
  • P(x, y) = w_3 · (P_1l(x, y) · (1 − w_1) + P_1h(x, y) · w_1) + (1 − w_3) · (P_2l(x, y) · (1 − w_2) + P_2h(x, y) · w_2)    (5)
  • where P(x, y) represents an image element of a decoded block
  • (x, y) represents a spatial coordinate, where x is horizontal and y is vertical
  • P_1l represents the low frequency part of a first prediction
  • P_1h represents the high frequency part of the first prediction
  • P_2l represents the low frequency part of a second prediction
  • P_2h represents the high frequency part of the second prediction
  • w_1 represents a weight for the first prediction
  • w_2 represents a weight for the second prediction
  • w_3 represents a weight to combine the first and second predictions as in the prior art, e.g., w_3 = 0.5.
  • the individual weights for the individual image elements can be determined from an average weight for the whole prediction and information generated by the prediction, e.g., the direction of the displacement vector or intra-prediction mode or range of the predicted image elements.
  • One example of how to determine pixel-specific weights is to determine one weight for predicted image elements that have low luminance (luma) values and another weight for predicted image elements that have higher luminance values. This can make the prediction focus more on predicted image elements with high luminance values than on predicted image elements with low luminance values.
  • the second prediction (and in some cases the first prediction) may be modified by removing the high frequency component to generate a block of low frequency image elements by setting the image elements of the low frequency block to the average value of the image elements in the prediction.
  • the second prediction (and in some cases the first prediction) may be modified by removing the low frequency component to generate a block of high frequency image elements by setting each image element of the high frequency block to the difference between the corresponding image element and the average value.
  • the second prediction (and in some cases the first prediction) may be modified by removing the high frequency component to generate a block of low frequency image elements by low-pass filtering the second (and/or first) prediction and setting each image element of the low frequency block to the corresponding image element of the low-pass filtered prediction.
  • the second prediction (and in some cases the first prediction) may be modified by removing the low frequency component to generate a block of high frequency image elements by high-pass filtering the second (and/or first) prediction and setting each image element of the high frequency block to the difference between the corresponding image element of the second (or first) prediction and the high-pass filtered image elements.
  • the filtering is frequency selective in at least one specific spatial direction to modify frequency component(s) in the second prediction.
  • the frequency selective filtering may apply a low-pass filter only horizontally, only vertically, or only diagonally. This can be useful for combining a first and a second prediction when the uncertainty of the predictions is different in different spatial directions, e.g., vertically, horizontally, diagonally, etc., depending, for example, on the dominant spatial direction of the texture in the first prediction.
  • Another example of a prediction with potentially different uncertainty in different directions is when the first prediction is produced by intra-prediction. Assume for example horizontal intra-prediction.
  • the first prediction is derived by extrapolating from a column of decoded image elements to the left of current block.
  • Such a prediction has both high and low frequencies in the direction orthogonal to the direction of the prediction, but only low frequencies in the direction of the prediction.
  • a second prediction can potentially provide high frequencies in the direction of the prediction. In this case, one can use different weighting for frequencies in the direction of prediction than for other frequencies.
  • a variant of this embodiment is to apply one frequency selective filter in a horizontal direction and also apply a frequency selective filter in a vertical direction. In this way one can weight frequency contributions in one direction differently compared to frequency contributions in another direction.
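  • A small sketch of such direction-selective filtering, here a horizontal-only low-pass filter; the 3-tap [1, 2, 1]/4 kernel and edge replication are illustrative choices:

```python
import numpy as np

def lowpass_horizontal(pred):
    """Apply a low-pass filter only in the horizontal direction, leaving
    vertical frequency content untouched (edges replicated at the borders)."""
    padded = np.pad(pred.astype(np.float64), ((0, 0), (1, 1)), mode="edge")
    # 3-tap [1, 2, 1] / 4 smoothing kernel applied along each row.
    return (padded[:, :-2] + 2 * padded[:, 1:-1] + padded[:, 2:]) / 4.0

# A vertical variant filters along columns instead; applying one filter per
# direction allows horizontal and vertical frequency contributions to be
# weighted differently, as described above.
```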
  • a second modified prediction of low frequency image elements is determined by setting each image element of the low frequency block (the second modified prediction) to the corresponding image element of a fit of the image elements of the second prediction with one or several basis functions.
  • a block of high frequency image elements is determined by setting each image element of the high frequency block to the difference between the corresponding image element of the second prediction and the image elements resulting from the fit of the basis functions.
  • a basis function can be a 1D or 2D basis function.
  • a 1D basis function is a function that varies in only one spatial direction.
  • a 2D basis function can also vary in only one spatial direction, but it is capable of varying more flexibly, e.g., non-separably, for example varying with distance from the center.
  • separable 1D basis functions can be used to construct a sufficient approximation of a 2D basis function.
  • a basis function is a polynomial basis function.
  • coefficients of the polynomial basis functions are determined by a least square fit with the image elements of the second prediction.
  • the polynomial can be described by the coefficients a, b, and c using 2D basis functions X and Y that vary linearly horizontally and vertically, respectively.
  • the fit of the predicted image elements can then be defined as fit(x, y) = a + b · X(x, y) + c · Y(x, y).
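  • A least-squares fit of this linear polynomial to a predicted block could look as follows; NumPy's lstsq stands in for whatever solver an actual implementation would use:

```python
import numpy as np

def polynomial_fit(pred):
    """Least-squares fit of fit(x, y) = a + b*X + c*Y to a predicted block.

    Returns the fitted (low frequency) block; the high frequency block is
    then pred - fit, as described above."""
    h, w = pred.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Basis functions: constant, linear in x (horizontal), linear in y (vertical).
    A = np.column_stack([np.ones(h * w), xs.ravel(), ys.ravel()])
    coeffs, *_ = np.linalg.lstsq(A, pred.ravel().astype(np.float64), rcond=None)
    a, b, c = coeffs
    return a + b * xs + c * ys
```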
  • Another example is to use a subset of the DCT (Discrete Cosine Transform) basis functions; other spatial transforms could be used as well. First, a forward transform using the subset of basis functions is applied to the second prediction. The coefficients of that transform are then fed to the inverse transform to derive the image elements of the fit of the predicted image elements.
  • One example is to determine the subset of basis functions to use based on the scale factors defined for inverse quantization of transform coefficients of the current block.
  • the transform basis functions could be the same as one of the transform basis functions defined for transform coding of the prediction error elsewhere in the video coder/decoder. In that case, quantization and inverse quantization may be part of the process in another variant of the embodiment.
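  • Using SciPy's DCT as the spatial transform, such a fit with a low-frequency subset of basis functions might be sketched as follows; keeping the top-left k x k coefficients is an illustrative choice of subset, not mandated above:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_subset_fit(pred, k=2):
    """Fit a prediction with a subset of 2D DCT basis functions: forward
    transform, keep only the k x k lowest-frequency coefficients, inverse
    transform. pred - fit then gives the high frequency block."""
    coeffs = dctn(pred.astype(np.float64), norm="ortho")  # forward transform
    mask = np.zeros_like(coeffs)
    mask[:k, :k] = 1.0                                    # low-frequency subset
    return idctn(coeffs * mask, norm="ortho")             # inverse transform
```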
  • one of the predictions is an intra-predicted prediction (predicted from previously decoded image elements of the current picture) and the other prediction is an inter-predicted prediction (predicted from previously decoded image elements of a picture other than the current picture).
  • In Embodiment 11, which builds from previous embodiments, both of the predictions are inter-predicted predictions (predicted from previously decoded image elements of a picture other than the current picture).
  • one of the predictions is an intra-predicted prediction that uses extrapolation
  • the other prediction is an intra-predicted prediction that uses copying or interpolation.
  • In Embodiment 13, which builds from previous embodiments, at least one of the predictions is obtained by interpolation.
  • In Embodiment 14, which builds from previous embodiments, at least one frequency-specific weight or weighting is obtained from decoding a bit stream.
  • the weight or weighting can be defined on different levels in the bit stream, sequence, picture, slice or block size, block level, etc.
  • the weight is typically defined to be between 0 and a positive non-zero constant, e.g., 1.
  • At least one frequency-specific weight or weighting is defined implicitly.
  • the weight or weighting can be defined on different levels, sequence, picture, slice or block size, block level, etc.
  • One example of implicit weighting is to derive the weight or weighting from scaling factors used in residual coding. For example, for the case when different scaling factors are used in inverse quantization of transform coefficients, one can determine the weights or weighting based on the value of the scaling factors so that a higher weight is given to frequency components that have a higher scaling factor than the frequency components that have a lower scaling factor.
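  • One plausible reading of this implicit derivation normalizes the weights by the sum of the scaling factors; the normalization is an assumption for illustration, not specified above:

```python
def implicit_weights(scale_low, scale_high):
    """Illustrative sketch: derive frequency-specific weights from the
    inverse-quantization scaling factors used in residual coding, so that
    the frequency band with the higher scaling factor gets the higher weight."""
    total = scale_low + scale_high
    return scale_low / total, scale_high / total  # (w_low, w_high), each in [0, 1]
```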
  • In Embodiment 16, which builds from previous embodiments, the approach is enabled by a frequency-based enable flag, e.g., obtained from decoding a bit stream.
  • the frequency-based enable flag can be defined on different levels in the bit stream, sequence, picture, slice or block size, block level, etc.
  • the second modified prediction is generated by filtering some frequency components out of the second prediction, e.g., by low-pass or high-pass filtering the second prediction.
  • an encoder starts with evaluating which combination of first and second predictions minimizes the rate-distortion cost out of all possible (or a subset) of first and second predictions for the current block.
  • the rate-distortion cost is, e.g., evaluated as SSD + λ · rate,
  • where the SSD is the squared error between the original block and the reconstructed image elements (e.g., the final prediction plus the prediction error, or the final prediction if no prediction error is provided),
  • λ is a constant that is calculated based on the quantization parameter (QP) used for the current block, and
  • rate is the number of bits signaled in the bit stream required to represent the current block.
  • the encoder determines a low frequency first prediction, a high frequency first prediction, a low frequency second prediction, and a high frequency second prediction.
  • the encoder creates a first modified prediction by combining the low frequency first prediction and the low frequency second prediction using a weight from a subset of weights.
  • the encoder also creates a second modified prediction by combining the high frequency first prediction and the high frequency second prediction using another weight from a subset of weights. Both weights are derived by selecting the weights with the lowest rate- distortion cost.
  • a new modified prediction is determined by combining the first and second modified predictions.
  • the rate-distortion cost is calculated as the SSD between the new modified prediction and the corresponding block of the original source image (before encoding), plus the bit cost multiplied by lambda.
  • the encoder calculates the rate-distortion cost of creating a first modified prediction by combining the low frequency first prediction with the high frequency first prediction using a weight from a subset of weights, where the weight is determined by calculating the SSD between the first modified prediction and the corresponding block of the original source image (before encoding) and adding the bit cost multiplied by lambda.
  • the encoder does the same to determine a second modified prediction by combining the low frequency second prediction with the high frequency second prediction using a second weight. A new modified prediction is then determined by combining the first and second modified predictions. When all weights have been evaluated, the encoder selects the weights for the respective predictions with the lowest rate-distortion cost.
  • the encoder then signals, for each block, both the prediction mode and the frequency-specific weights that represent how the low and high frequencies of the respective predictions should be used to derive respective modified predictions that then are combined to determine a new modified prediction of the current block, i.e., the decoded block. A sketch of this weight search is given below.
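A compact C++ sketch of the encoder-side weight search, assuming (as described later in the text) that the low frequency part of a prediction is its block average and the high frequency part is the mean-removed remainder; the weight candidates and names are illustrative, and the λ*rate term is omitted for brevity:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct WeightChoice {
    double wLow = 0.0, wHigh = 0.0;
    double cost = std::numeric_limits<double>::max();
};

// Average (DC) of a predicted block.
static double blockMean(const std::vector<double>& p)
{
    double sum = 0.0;
    for (double v : p) sum += v;
    return sum / p.size();
}

// Exhaustively try weight pairs (wLow for the DC parts, wHigh for the
// mean-removed parts) and keep the pair with the lowest SSD against the
// original block. All blocks are assumed to have the same size.
WeightChoice searchFrequencyWeights(const std::vector<double>& pred1,
                                    const std::vector<double>& pred2,
                                    const std::vector<double>& original,
                                    const std::vector<double>& candidates)
{
    const double dc1 = blockMean(pred1);
    const double dc2 = blockMean(pred2);
    WeightChoice best;
    for (double wL : candidates) {
        for (double wH : candidates) {
            double ssd = 0.0;
            for (std::size_t i = 0; i < original.size(); ++i) {
                const double low  = dc1 * (1.0 - wL) + dc2 * wL;
                const double high = (pred1[i] - dc1) * (1.0 - wH)
                                  + (pred2[i] - dc2) * wH;
                const double diff = original[i] - (low + high);
                ssd += diff * diff;
            }
            if (ssd < best.cost) best = {wL, wH, ssd};
        }
    }
    return best;
}
```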
  • a decoder starts with, for each block, determining a first prediction mode and a second prediction mode and two frequency-specific weights, using information that is decoded from the bit stream.
  • the decoder creates a first prediction of the current block using the first prediction mode and creates a second prediction of the current block using the second prediction mode.
  • a low frequency first prediction and a high frequency first prediction are determined based on a first weight.
  • a low frequency second prediction and a high frequency second prediction are determined based on a second weight.
  • the decoder determines a first modified prediction by a weighted combination of the low frequency first prediction with the low frequency second prediction according to the first weight.
  • a second modified prediction is created by a weighted combination of the high frequency first prediction with the high frequency second prediction using a second weight.
  • the decoder determines a modified first prediction by a weighted combination of the low frequency first prediction with the high frequency first prediction according to the first weight. Similarly, the decoder determines a second modified prediction by a weighted combination of the low frequency second prediction with the high frequency second prediction according to the second weight.
  • the first modified prediction and the second modified prediction are then combined into a new modified prediction of image elements, e.g., the decoded block, as sketched below.
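A corresponding decoder-side C++ sketch of the weighted combination (the bitstream parsing that yields the prediction modes and weights is omitted; the DC/mean-removed split is again assumed):

```cpp
#include <cstddef>
#include <vector>

// Combine two predictions into the new modified prediction: one weight
// for the low frequency (DC) parts, another for the high frequency
// (mean-removed) parts, mirroring
//   P = P1_l*(1 - w1) + P2_l*w1 + P1_h*(1 - w2) + P2_h*w2.
std::vector<double> combinePredictions(const std::vector<double>& pred1,
                                       const std::vector<double>& pred2,
                                       double w1, double w2)
{
    double dc1 = 0.0, dc2 = 0.0;
    for (std::size_t i = 0; i < pred1.size(); ++i) {
        dc1 += pred1[i];
        dc2 += pred2[i];
    }
    dc1 /= pred1.size();
    dc2 /= pred2.size();

    std::vector<double> out(pred1.size());
    for (std::size_t i = 0; i < pred1.size(); ++i) {
        const double low  = dc1 * (1.0 - w1) + dc2 * w1;
        const double high = (pred1[i] - dc1) * (1.0 - w2)
                          + (pred2[i] - dc2) * w2;
        out[i] = low + high;
    }
    return out;
}
```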
  • the encoder and decoder have been described herein without any coding of the prediction error.
  • when a prediction error is provided in the bit stream by the encoder, it is added to the prediction to form the decoded block
  • Various elements disclosed herein are described as some kind of circuit, e.g., a mode circuit, modification circuit, first prediction circuit, second prediction circuit, decoding circuit, a mode selection circuit, an evaluation circuit, etc.
  • Each of these circuits may be embodied in hardware and/or in software (including firmware, resident software, microcode, etc.) executed on a controller or processor, including an application specific integrated circuit (ASIC).
  • Another exemplary embodiment considers the possibility of creating more complex predictions by combining different prediction methods.
  • An example of such a method is to combine image element-based intra-prediction and block-based intra-block copy prediction.
  • a similar approach may be used when inter-block prediction is used instead of intra-block copy prediction.
  • Such a combination could remove redundant information via the copying in the block-based prediction and at the same time use the well-established method of image element-based intra-prediction. These can be weighted together such that some error metric, like SSD or rate-distortion cost, is decreased and the visual appearance is improved.
  • Another example of assigning weightings to different intra-prediction modes is to classify the modes into horizontal and vertical modes.
  • a set of weights that vary in the horizontal direction (weight matrix 2) is used together with the horizontal intra modes (modes 2-17).
  • another set of weights that vary in the vertical direction (weight matrix 3) is used for the vertical intra modes (modes 18-34).
  • for the DC and planar modes (modes 0 and 1), another set of weights could be used that puts more weight on the top and left edges (weight matrix 1).
  • a variant of this is to reuse, for the DC and planar modes, the same weights as either the horizontal or the vertical intra-prediction modes. The classification into weight matrices is sketched below.
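A small C++ sketch of the mode-to-matrix classification, using the HEVC intra mode numbering referenced above (the matrix identifiers are those of the text):

```cpp
// Select a weight matrix from the intra-prediction mode: weight matrix 1
// for DC and planar (modes 0-1), weight matrix 2 for the horizontal
// modes (2-17), and weight matrix 3 for the vertical modes (18-34).
int weightMatrixForIntraMode(int intraMode)
{
    if (intraMode <= 1) {
        return 1;  // DC and planar
    }
    if (intraMode <= 17) {
        return 2;  // horizontal modes
    }
    return 3;      // vertical modes
}
```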
  • search strategies may be used to determine the best prediction technique.
  • One exemplary search strategy comprises an independent search strategy.
  • the best image element-based intra-prediction block (intra-prediction mode) and the best Intra-Block Copy block (location) are found separately, selecting in each case the candidate with the least distortion or the least rate-distortion cost.
  • the result is then combined with the appropriate weight matrix according to which subset the intra-prediction mode belongs to.
  • Another exemplary search strategy comprises a dependent search strategy.
  • the Intra-Block Copy block (location) is found depending on which combined block gave the least distortion or the least rate distortion.
  • the Intra-Block Copying is dependent on the intra-prediction (intra-prediction mode) and on the weight matrix according to which subset the intra-prediction mode belongs to.
  • Another exemplary search strategy comprises a dependent search exhaustive strategy.
  • the dependent search is as described above: look at all combinations of Intra-prediction mode and Intra-Block Copy block and choose the one with the least distortion or the least rate distortion of the combined block.
  • Another exemplary search strategy comprises a dependent search with template matching.
  • the normal Intra-Block Copy method is replaced with the Template Matching method, where the K best blocks are found.
  • the K best blocks are determined based on the distortion over the template area.
  • the best of the K candidate locations is then selected as the one giving the least distortion or the least rate-distortion cost for the combined block.
  • the first prediction is used as a template to at least partly determine the location of the samples to use for the second prediction.
  • the location for the second prediction is determined completely by searching in a neighbourhood of previously reconstructed samples around the current block. The same principle can apply to searching in a neighbourhood of previously reconstructed samples in another frame/picture. In this example, no location information for the second prediction needs to be signaled to the decoder, and the decoder performs the same steps to derive the combined prediction.
  • in this example, the combined prediction is the DC (average value) of the first prediction added to a weighted average of the high frequency components of the first and second predictions.
  • the high frequency components are derived by removing the DC of the first prediction (the average value of the first prediction) from the first prediction and the DC of the second prediction (the average value of the second prediction) from the second prediction. The code fragment below, excerpted from an HM-style implementation, illustrates parts of this derivation.
    // Pointer to the predicted luma samples of the current block.
    Pel* lumaRec = pcPredYuv->getAddr( compID, uiAbsPartIdx );
    // Bounds of the search window above the current block.
    searchAboveToX = searchAboveX + min( (Int)uiWidth*4, maxRight );
    searchAboveToY = searchAboveY - min( (Int)uiHeight, maxAbove );
    // Initialize the best match to the worst possible SSD for 10-bit samples.
    Double bestMatch = uiWidth*uiHeight*1023*1023;

    // First search pass: candidate block-copy locations to the left.
    Pel* blockCopyLeft = lumaLeft;
    Double predDC = 0.0;
    Double blockCopyDC = 0.0;
    Pel* pPred = piPred;
    Pel* blockCopy = blockCopyLeft + uiX2;
    // Per-sample difference and DC accumulation (inner loop body).
    Int diff = pPred[ uiX ] - blockCopy[ uiX ];
    predDC += pPred[ uiX ];
    blockCopyDC += blockCopy[ uiX ];
    // Average values (DC components) of the prediction and the candidate.
    Double DCPred = predDC / (Double)( uiHeight*uiWidth );
    Double DCBlockCopy = blockCopyDC / (Double)( uiHeight*uiWidth );

    // Second search pass: candidate block-copy locations above.
    Pel* blockCopyAbove = lumaAbove;
    predDC = 0.0;
    blockCopyDC = 0.0;
    blockCopy = blockCopyAbove + uiX2;
    diff = pPred[ uiX ] - blockCopy[ uiX ];
    predDC += pPred[ uiX ];
    blockCopyDC += blockCopy[ uiX ];

    // Point at the winning block-copy location after the search.
    blockCopy = blockCopyLeft + bestPosX + bestPosY*uiStride;
  • the error metric includes a rate term, i.e., the cost in bits to encode the block multiplied by lambda, which is added to a distortion term that reflects how well the decoded image elements resemble the corresponding original source image elements, e.g., a rate-distortion cost.
  • SSD or SAD is used as the distortion metric both to determine which intra-prediction mode to select and to decide which location to predict from. To favor locations and combinations that do not contribute to block artifacts, the SSD or SAD can be combined with an edge metric that measures the distortion along the edges of the predicted block.
  • a combined metric can be given by: SSD + k*SSDedge, where k is a constant that determines the relative importance of the edge metric compared to the normal metric.
  • the constant k may also be different for different positions along the block boundary.
  • One example is to adapt k such that it is larger at positions where the image elements close to the respective position, on the respective side of the block border, vary little, and smaller where the image elements vary a lot.
  • One example is to determine the edge metric on image elements along the right edge of the predicted block and the original block, respectively, as shown in Figure 12.
  • the proposed edge metric compares the value of an image element on the right edge of the block with the neighbor directly to the right, e1-e2, and then compares this value with a difference computed on corresponding image elements from the original source.
  • the edge metric can be defined as the sum of squared differences along the block boundary, where each difference is provided by comparing the difference between e1 and e2 on the original source with the difference between e1 and e2 on the predicted block. If the e2 image element is not available because the corresponding image element has not yet been coded, the corresponding image element from the original is used instead. A sketch of this metric follows.
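A minimal C++ sketch of the combined metric SSD + k*SSDedge for the right block edge; the names are illustrative, and the e2 column is taken from the original source, matching the fallback described above:

```cpp
#include <cstddef>
#include <vector>

// Edge metric along the right block edge: per row, compare the predicted
// e1 - e2 step with the original e1 - e2 step and accumulate the squared
// differences. predRight/origRight hold the right-edge columns of the
// predicted and original blocks; origNeighbour holds the original column
// directly to the right (used for e2, since it is not yet coded).
double edgeSSD(const std::vector<int>& predRight,
               const std::vector<int>& origRight,
               const std::vector<int>& origNeighbour)
{
    double ssdEdge = 0.0;
    for (std::size_t y = 0; y < predRight.size(); ++y) {
        const int predStep = predRight[y] - origNeighbour[y];  // predicted e1 - e2
        const int origStep = origRight[y] - origNeighbour[y];  // original  e1 - e2
        const double d = predStep - origStep;
        ssdEdge += d * d;
    }
    return ssdEdge;
}

// Combined metric: the constant k sets the relative importance of the
// edge term compared to the normal SSD.
double combinedMetric(double ssd, double ssdEdge, double k)
{
    return ssd + k * ssdEdge;
}
```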


Abstract

The present invention relates to a decoder, an encoder, and methods thereof for encoding and decoding a group of image elements in a frame of an encoded video sequence. The method comprises: providing a first prediction of the group of image elements according to a first prediction mode; providing a second prediction of the group of image elements according to a second prediction mode; modifying the second prediction responsive to frequency content of the second prediction to generate a second modified prediction; and generating a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.

Description

FREQUENCY BASED PREDICTION
BACKGROUND
Technical field
[0001] The solution presented herein generally relates to video data management, and more particularly to methods of encoding and decoding video sequences. The solution also relates to encoders, decoders, and computer program products.
Background
[0002] Video compression is about reducing and removing redundant information from the video data. Typically, information from neighboring image elements or pixels, both within a picture and from previously coded pictures, is used to make a prediction of the video.
Because the compression process is lossy, i.e., information about the video sequence is lost, the reconstructed video will always differ from the original video in some way. A major goal of any video codec standard is to provide tools to hide or minimize those distortions while still maintaining a compression ratio high enough to keep the size of the video file as small as possible.
[0003] Pixel or image element prediction is an important part of video coding standards such as H.261, H.263, MPEG-4 and H.264 (ITU-T Rec. H.264 and ISO/IEC 14496-10,
"Advanced Video Coding," 2003). In H.264 there are three pixel prediction methods utilized, namely intra, inter and bi-prediction. Intra-prediction provides a spatial prediction of the current block from previously decoded pixels of a current frame. Inter-prediction gives a temporal prediction of the current block using a corresponding but displaced block in a previously decoded frame.
[0004] In state of the art video codecs, intra-prediction is an important method for creating a prediction of image elements for the current block. Since intra-coding tends to transport most of the signal energy in the video bit stream, any improvements to the prediction and coding methods are important for reducing the number of bits needed when compressing a video sequence.
[0005] Intra-prediction uses reference image elements neighboring the current block to predict blocks within the same frame. The order in which the blocks are encoded is from the upper left corner and then row-wise through the whole frame. Therefore, already encoded image elements in the frame will be to the upper left of the next block. Intra-prediction takes this into consideration when using the image elements to the left and above the block to predict image elements within the block. In the latest standard, HEVC, the intra-prediction consists of three steps: reference image element array construction, image element prediction, and postprocessing. Intra-prediction can be classified into two categories: angular prediction methods and DC/planar prediction methods. The first category is illustrated in Figure 1 and is intended to model structures with directional edges. The numbers correspond to a respective intra-prediction mode for a direction. E.g., mode 10 is horizontal prediction and mode 26 is vertical prediction. The second category estimates smooth image content.
[0006] The idea of reusing blocks within the same frame to remove redundant data has later also been proven efficient for screen content coding. Intra-Block Copy (IntraBC) is a method in state of the art video codecs where a block in an image is predicted as a displacement from already reconstructed blocks in the same image. It removes redundancy from repeating patterns, which typically occur in text and graphics regions, and therefore IntraBC is today mostly used for compressing screen content and computer graphics. The encoding time increases compared to intra-prediction because of the search involved in intra-block matching. The most similar block in a specified search area next to the current block is found by comparing the blocks with some metric, where the calculation of the sum of squared errors or differences (SSD) is often included in the metric. This method is similar to the inter-prediction method in HEVC, where blocks from other reference frames are reused to predict blocks in the current frame, the major difference being that in IntraBC the referenced blocks come from within the same frame as the current block.
[0007] More specifically, in intra-prediction, image elements neighboring the current block are used to create a prediction of the current block according to the intra-prediction mode. In intra-block copy prediction, reference image elements positioned relative to the current block by a block vector are copied to create a prediction of the current block. In inter-prediction, reference image elements, positioned relative to the current block by a motion vector, from previously decoded pictures are copied directly or an interpolated version is used to predict the current block. Inter-prediction also allows bi-prediction of two independent reference blocks using two independent motion vectors; the reference blocks, potentially interpolated, are then combined. In bi-prediction, half of the prediction (weight equal to 0.5) is taken from one reference block and the other half is taken from another reference block. The intra- and inter-predictions can be re-generated on the decoder side because the intra-prediction mode and the displacement vector are typically included with the coded bit stream.
[0008] In current state of the art video codecs, template matching is a technique that enables the encoder to reference a block of previously coded samples without having to signal a displacement vector indicating the position. For this to work, a template area of image elements neighboring the current block is selected by both the encoder and decoder using predetermined information, which could, e.g., be signaled in a slice header or Picture Parameter Set (PPS). A search area of a size that has also been pre-determined, e.g., from a slice header or from a PPS or defined during a decoding process in a codec specification, is searched. For each location in the search area, an error metric is computed between the image elements at the search location and the image elements in the template area. The location that resulted in the lowest error metric is then selected as the final location, and the image elements at that location will then be used for creating a prediction of the current block. This process is performed by both the encoder and decoder to ensure that the same image elements are used for the prediction.
[0009] In template matching, both the encoder and the decoder determine from which reference image elements the current block shall be predicted. Template matching is used to find previously coded blocks that are similar to the current one by finding locations where the neighboring image elements are similar to the neighboring image elements of the current block. Image elements from the found location can then be used without having to send a
displacement vector to indicate the position of the reference block.
[0010] Multiple reference pictures may be used for inter-prediction with a reference picture index to indicate which of the multiple reference pictures is used. In the P-type of inter encoding, only single directional prediction is used, and the allowable reference pictures are managed in list 0. However, in the B-type of inter encoding, two lists of reference pictures are managed, list 0 and list 1. In such B-type pictures, single directional prediction using either list 0 or list 1 is allowed, or bi-predictions using an average of a reference picture from list 0 and another reference picture from list 1 may be used.
[0011] The weighted prediction in H.264 signals a weight for each direction of the bi-directional prediction, and also a DC offset for the weighted combination, in the slice header. The general formula for using weighting factors in inter-prediction is:
P = ((w_0*P_0 + w_1*P_1) >> Shift) + DC ,    (1)
where P_0 and w_0 respectively represent the list 0 initial predictor and weighting factor, and where P_1 and w_1 respectively represent the list 1 initial predictor and weighting factor. DC represents an offset that is defined on a per-frame basis, Shift represents a shifting factor, and >> Shift represents a right shift by Shift bits. In the case of bi-directional prediction, w_0 = w_1 = 0.5.
[0012] PCT publication WO 2004/064255, titled "Mixed Inter/Intra Video Coding of
Macroblock Partitions" and filed 6 January 2004 suggests a hybrid intra-inter bi-predictive coding mode that allows both intra and inter frame predictions to be combined together for hybrid-encoding a macroblock. In this hybrid coding, an average of selected intra and inter- predictions or a differently weighted combination of the intra and inter-predictions is used. The hybrid coding suggested in WO 2004/064255 basically uses a summing of the two input intra and inter-predictions or uses slice-specific weights. Thus, the same weight is applied to all pixels in all macroblocks of a slice that is used as inter and/or intra-prediction. Such an approach is far from optimal from an image quality point of view.
[0013] Further, intra-prediction can only predict simple structures in the original block, as only one row and one column of image elements are used from the neighboring blocks. Thus, intra-prediction provides useful low frequency information. It is not possible, however, to represent more complex structures and high frequency information using the intra-prediction modes (current angular directions, planar, and DC predictions) in state of the art video codecs. Template matching and intra-block copy can retain more structure and higher frequency information but will often lead to large discontinuities at the border between the current block and neighboring blocks.
[0014] In addition, bi-prediction works best when there is a luminance change that spans several pictures. Bi-prediction can approximate the average luminance of the current picture by calculating a weighted average of image elements from a picture before the current picture and image elements from a picture after the current picture. Picture-adaptive weighting with offset can be used to compensate for a luminance change in one reference picture such that it is better aligned with the luminance of the current frame. The problem, however, is that it cannot selectively weight a prediction differently for different frequency ranges. So, for example, low frequencies, which are mostly affected by luminance changes, cannot be compensated specifically without affecting high frequencies.
[0015] For at least these reasons, alternate solutions are desired to improve the encoding and decoding of video sequences.
SUMMARY
[0016] The solution presented herein addresses these problems by combining predictions of groups of image elements, where at least one prediction is modified responsive to its frequency content. By combining the two predictions in this manner, the solution presented herein improves the prediction of more complex structures in the original group while minimizing artificial edges introduced at the group boundaries. Further, the solution presented herein allows for compensation of issues with decoding/encoding image elements having low/high frequency components without affecting the image elements with high/low frequency
components.
[0017] One embodiment provides a method of decoding a group of image elements in a frame of an encoded video sequence. The method comprises providing a first prediction of the group of image elements according to a first prediction mode, and providing a second prediction of the group of image elements according to a second prediction mode. The method further comprises modifying the second prediction responsive to frequency content of the second prediction to generate a second modified prediction, and generating a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
[0018] Another exemplary embodiment provides a decoder comprising a first prediction circuit, a second prediction circuit, a modification circuit, and a decoding circuit. The first prediction circuit is configured to provide a first prediction of the group of image elements according to a first prediction mode. The second prediction circuit is configured to provide a second prediction of the group of image elements according to a second prediction mode. The modification circuit is configured to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction. The decoding circuit is configured to generate a decoded version of the group of image elements using a
combination of the first prediction and the second modified prediction. [0019] Another exemplary embodiment provides a computer program product stored in a non-transitory computer readable medium for controlling a decoder. The computer program product comprises software instructions which, when run on the decoder, causes the decoder to provide a first prediction of the group of image elements according to a first prediction mode, and provide a second prediction of the group of image elements according to a second prediction mode. The software instructions further cause the decoder to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction, and generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
[0020] Another exemplary embodiment provides a decoder apparatus comprising a first prediction module, a second prediction module, a modification module, and a decoding module. The first prediction module is configured to provide a first prediction of the group of image elements according to a first prediction mode. The second prediction module is configured to provide a second prediction of the group of image elements according to a second prediction mode. The modification module is configured to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction. The decoding module is configured to generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
[0021] Another exemplary embodiment provides a method of encoding a group of image elements in a frame of a video sequence. The method comprises estimating a first prediction of a first group of image elements according to a first prediction mode, and estimating a second prediction for each of a plurality of second group of image elements. The method further comprises modifying each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second group of image elements corresponds to a second prediction mode. The method further comprises determining a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and selecting the candidate prediction having a better performance parameter than the other candidate predictions. The method further comprises encoding the first group of image elements as an identifier of the location of the first group of image elements or of the first prediction mode, and encoding the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
[0022] Another exemplary embodiment provides an encoder comprising a first prediction circuit, a second prediction circuit, a modification circuit, and an evaluation circuit. The first prediction circuit is configured to estimate a first prediction of a first group of image elements according to a first prediction mode. The second prediction circuit is configured to estimate a second prediction for each of a plurality of second group of image elements. The modification circuit is configured to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second group of image elements corresponds to a second prediction mode. The evaluation circuit is configured to determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and select the candidate prediction having a better performance parameter than the other candidate predictions. The evaluation circuit is also configured to encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode, and to encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
[0023] Another exemplary embodiment provides a computer program product stored in a non-transitory computer readable medium for controlling an encoder. The computer program product comprises software instructions which, when run on the encoder, causes the encoder to estimate a first prediction of a first group of image elements according to a first prediction mode, and estimate a second prediction for each of a plurality of second group of image elements. The software instructions further cause the encoder to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second group of image elements corresponds to a second prediction mode. The software instructions further cause the encoder to determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and select the candidate prediction having a better performance parameter than the other candidate predictions. The software instructions further cause the encoder to encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode, and encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
[0024] Another exemplary embodiment provides an encoder apparatus comprising a first prediction module, a second prediction module, a modification module, and an evaluation module. The first prediction module is configured to estimate a first prediction of a first group of image elements according to a first prediction mode. The second prediction module is configured to estimate a second prediction for each of a plurality of second group of image elements. The modification module is configured to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second group of image elements corresponds to a second prediction mode. The evaluation module is configured to determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions, and select the candidate prediction having a better performance parameter than the other candidate predictions. The evaluation module is further configured to encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode, and encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 shows an example of intra-prediction.
[0026] Figure 2 shows a decoding method according to one exemplary embodiment.
[0027] Figure 3 shows a block diagram of a decoder according to one exemplary embodiment.
[0028] Figure 4 shows a block diagram of a decoding apparatus according to one exemplary embodiment.
[0029] Figure 5 shows an encoding method according to one exemplary embodiment.
[0030] Figure 6 shows a block diagram of an encoder according to one exemplary embodiment.
[0031] Figure 7 shows a block diagram of an encoding apparatus according to one exemplary embodiment.
[0032] Figure 8 shows a device according to one exemplary embodiment.
[0033] Figure 9 shows an example of inter-prediction using bi-prediction.
[0034] Figure 10 shows an example of the solution presented herein.
[0035] Figure 11 shows an example of directional weighting according to one exemplary embodiment.
[0036] Figure 12 shows an example of an embodiment that relies on edge metric comparisons.
DETAILED DESCRIPTION
[0037] In the solution presented herein, a video sequence or bit stream comprises one or multiple, e.g., at least two, frames or pictures. Each frame is composed of a series of one or more slices, where such a slice includes one or more macroblocks of image elements. In the following discussions, the expression "image element" is used to denote the smallest element of a frame or picture in a video sequence. Such an image element has associated image element properties, e.g., color (in the red, green, blue, RGB, space) and/or luminance (Y) and chrominance (Cr, Cb or sometimes denoted U, V). A typical example of an image element is a pixel or a sample of a frame or picture.
[0038] The image elements are organized into groups of image elements. The expression "group of image elements" denotes any of the prior art known partitions of frames and slices into collections of image elements that are handled together during decoding and encoding. It will be appreciated that the expression "group of image elements" may be used interchangeably with "block." Generally, such a group is a rectangular (MxN) or square (MxM) group of image elements. In the art such a grouping is generally denoted as a macroblock in the video compression standard. An exemplary macroblock has a size of 16x16 image elements, and may comprise multiple so-called sub-macroblock partitions, e.g., partitions having 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 image elements. The 8x8 sub-macroblock partition is often denoted as a sub-macroblock or sub-block. In HEVC, the corresponding term to macroblock is Coding Tree Unit (CTU) and the size is 64x64. A CTU may be divided by a quadtree, whose leaves are called Coding Units (CUs). The size of the CUs may be 64x64, 32x32, 16x16, 8x8, or 4x4. A CU may further be divided into Prediction Units (PUs) and Transform Units (TUs), where a PU corresponds to a prediction block and a TU corresponds to a residual block. In the following discussions, the expression "group of image elements" is used to denote any such macroblock, sub-macroblock or partition size, PU, or any other grouping of image elements used in video compression unless explicitly notified otherwise. A typical example could be that the luminance component of a PU comprises 64x64 image elements, and that the associated chrominance components are spatially sub-sampled by a factor of two in the horizontal and vertical directions to form 32x32 blocks.
[0039] The following describes decoding and encoding operations in terms of combinations of first and second predictions, where the second prediction is modified responsive to its frequency content. In some embodiments, the modification comprises filtering the second prediction. In other embodiments, the modification comprises weighting the second prediction. It will be appreciated that these combinations rely on the second prediction being filtered, weighted, or both before combining. Further, the expression "weighting" as used herein may refer to a matrix of weights, one for each image element of the corresponding prediction, where the size of the matrix of weights is the same as the size of the block being weighted. In other embodiments, the weighting is a constant (or a matrix of identical weights) such that each image element is weighted by the same amount.
[0040] Figure 2 shows a flow diagram illustrating an exemplary method 100 of decoding a group of image elements of an encoded video sequence using a decoder. To that end, the decoder provides a first prediction of the group of image elements according to a first prediction mode (block 102), and provides a second prediction of the group of image elements according to a second prediction mode (block 104). For example, the decoder may provide the first and/or second predictions according to intra-block prediction or inter-block prediction. In some embodiments, the method 100 may further include first determining the first and second prediction modes, e.g., from information provided with received video sequences. The decoder then modifies the second prediction responsive to frequency content of the second prediction to generate a second modified prediction (block 106), and generates a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction (block 108). In some embodiments, the decoder modifies the second prediction by filtering the second prediction, e.g., high-pass filtering the second prediction, before the combination. In other embodiments, the decoder modifies the second prediction by filtering and weighting the second prediction before the combination. It will further be appreciated that some embodiments may also modify the first prediction, e.g. by filtering and/or weighting the first prediction, and that the subsequent combination may be a combination of the first prediction and the second modified prediction, or of the first modified prediction and the second modified prediction. Further, when the prediction error is provided by the encoder with the bit stream, the decoder adds the prediction error to the decoded version to form the final decoded block.
[0041] Those skilled in the art will appreciate that any weighting used to weight a prediction may include a weight for each element of the corresponding group of image elements. In some embodiments, all of the weights may be the same. However, in other embodiments discussed in further detail below, at least some of the weights are different. For example, when image elements near the edge of neighboring group(s) are used to predict a current group of image elements, the weights for the image elements near the edge of neighboring group(s) of image elements may be greater than the weights for the image elements farther away from the edge of the neighboring group(s). Further, while in some embodiments a second weighting (for the second prediction) is greater than the first weighting (for the first prediction), e.g., when the second prediction is provided according to inter-block prediction, in other embodiments the first weighting (for the first prediction) is greater than the second weighting (for the second prediction), e.g., when decoded image elements used for the second prediction have been filtered. In still other embodiments, an additional weight scaling factor, e.g., received with the encoded video sequence, may be used to further adjust the first weighting and/or the second weighting. Such a weight scaling factor may have a value between 0 and a positive non-zero constant, e.g., 1, and is configured to emphasize the first/second weighting over the second/first weighting to further control an influence of the first/second prediction on the generated decoded version.
[0042] In some embodiments, decoder 200 in Figure 3 may implement the method 100 of Figure 2. In this embodiment, the decoder 200 comprises a first prediction circuit 210, second prediction circuit 220, modification circuit 230, and decoding circuit 240. The first prediction circuit 210 provides a first prediction of the group of image elements according to a first prediction mode (e.g., block 102 of Figure 2), and the second prediction circuit 220 provides a second prediction of the group of image elements according to a second prediction mode (e.g., block 104 of Figure 2). The modification circuit 230 modifies the second prediction responsive to frequency content of the second prediction to generate a second modified prediction (e.g., block 106 of Figure 2). The decoding circuit 240 generates a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction (e.g., block 108 of Figure 2). Decoder 200 may optionally include a mode circuit 250 configured to determine the first prediction mode and/or the second prediction mode, e.g., responsive to information provided in the video sequence, by determining an intra-prediction mode or a location identifier for a decoded version of another group of image elements in the frame of the encoded video sequence.
[0043] It will be appreciated that other devices may implement the method 100 of Figure 2. For example, the decoding apparatus 300 shown in Figure 4 may use the illustrated first prediction module 310, second prediction module 320, modification module 330, decoding module 340, and optional mode module 350 to implement method 100. Those of skill in the art will also readily recognize that the method 100 described herein may be implemented as stored computer program instructions for execution by one or more computing devices, such as microprocessors, Digital Signal Processors (DSPs), FPGAs, ASICs, or other data processing circuits. The stored program instructions may be stored on machine-readable media, such as electrical, magnetic, or optical memory devices. The memory devices may include ROM and/or RAM modules, flash memory, hard disk drives, magnetic disc drives, optical disc drives and other storage media known in the art. For example, method 100 may be implemented using a decoding processor comprising software instructions that when run on the decoding processor cause the decoding processor to execute the method 100 of Figure 2.
[0044] Figure 5 shows a flow diagram illustrating an exemplary method 400 of encoding a first group of image elements in a frame of a video sequence using an encoder. To that end, the encoder estimates a first prediction of a first group of image elements according to a first prediction mode (block 402), and estimates a second prediction for each of a plurality of second group of image elements (block 404). The encoder modifies each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, wherein each of the second group of image elements corresponds to a second prediction mode (block 406). The encoder determines a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions (block 408), and selects the candidate prediction having a better performance parameter than the other candidate predictions (block 410). Exemplary performance parameters include, but are not limited to, the Sum of Squared Differences (SSD), or the Sum of Absolute Differences (SAD), between the original and candidate predictions, which may be referred to as the prediction error, where smaller values of SSD (or SAD) represent better performance. The encoder encodes the first group of image elements as an identifier of the location of the first group of image elements or of the first prediction mode (block 412), and encodes each of the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction (block 414). Further, if the encoder determines a prediction error, the encoder may include the prediction error in the bit stream sent to the decoder.
[0045] As with the decoder embodiments, in some embodiments, the encoder modifies the second predictions by filtering the second predictions, e.g., high-pass filtering the second predictions, before the combinations. In other embodiments, the encoder modifies the second predictions by filtering and weighting the second predictions before the combinations. It will further be appreciated that some embodiments may also modify the first prediction, e.g., by filtering and/or weighting the first prediction, and that the subsequent combination may be a combination of the first prediction and the second modified prediction, or of the first modified prediction and the second modified prediction.
[0046] Also as with the decoding operations, those skilled in the art will appreciate that any weighting used to weight a prediction may include a weight for each element of the
corresponding group of image elements. In some embodiments, all of the weights may be the same. However, in other embodiments discussed in further detail below, at least some of the weights are different. Further, an additional weight scaling factor, e.g., received with the encoded video sequence, may be used to further adjust a first weighting (for the first prediction) and/or a second weighting (for the second prediction). Such a weight scaling factor may have a value between 0 and a positive non-zero constant, e.g., 1 , and is configured to emphasize the first/second weighting over the second/first weighting to further control an influence of the first/second prediction on the combination.
[0047] In some embodiments, encoder 500 in Figure 6 may implement the method 400 of Figure 5. In this embodiment, the encoder 500 comprises a first prediction circuit 510, second prediction circuit 520, modification circuit 530, and an evaluation circuit 540. The first prediction circuit 510 estimates a first prediction of a first group of image elements according to a first prediction mode (e.g., block 402 in Figure 5), and the second prediction circuit 520 estimates a second prediction for each of a plurality of second group of image elements (e.g., block 404 in Figure 5). The modification circuit 530 modifies each of the second group of image elements responsive to a frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, where each of the second group of image elements corresponds to a second prediction mode (e.g., block 406 of Figure 5). The evaluation circuit 540 then generates the encoded information. More particularly, the evaluation circuit 540 determines a plurality of candidate predictions, where each candidate prediction comprises a combination of the first prediction and one of the plurality of second modified predictions (e.g., block 408 in Figure 5). The evaluation circuit 540 selects the candidate prediction having a better performance parameter than the other candidate predictions (e.g., block 410 in Figure 5). The evaluation circuit 540 encodes the first group of image elements as an identifier of the first prediction mode or of the location of the first group of image elements (e.g., block 412 in Figure 5). The evaluation circuit 540 also encodes each second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction (e.g., block 414 in Figure 5). In some embodiments, the encoder may include an optional mode selection circuit 550 configured to select the first prediction mode, e.g., by selecting an intra-prediction mode or a location identifier for a decoded version of another group of image elements in the frame of the encoded video sequence, and by selecting the second prediction mode by selecting the intra-prediction mode or the location identifier for the decoded version of another group of image elements in the frame of the encoded video sequence.
[0048] It will be appreciated that other devices may implement the method 400 of Figure 5. For example, the encoding apparatus 600 shown in Figure 7 may use the illustrated modification module 610, first prediction module 620, second prediction module 630, evaluation module 640, and optional mode selection module 650 to implement method 400. Those of skill in the art will also readily recognize that the method 400 described herein may be implemented as stored computer program instructions for execution by one or more computing devices, such as microprocessors, Digital Signal Processors (DSPs), FPGAs, ASICs, or other data processing circuits. The stored program instructions may be stored on machine-readable media, such as electrical, magnetic, or optical memory devices. The memory devices may include ROM and/or RAM modules, flash memory, hard disk drives, magnetic disc drives, optical disc drives and other storage media known in the art. For example, method 400 may be implemented using an encoding processor comprising software instructions that when run on the encoding processor cause the encoding processor to execute the method 400 of Figure 5.
[0049] The encoder and/or decoder devices disclosed herein may be comprised in any device, including but not limited to a tablet, personal computer, mobile telephone, set-top box, camera, machine-to-machine device, etc. Figure 8 shows one exemplary device 700. As used herein, the term "mobile terminal" may include a cellular radiotelephone with or without a multiline display; a Personal Communication System (PCS) terminal that may combine a cellular radiotelephone with data processing, facsimile, and data communications capabilities; a Personal Digital Assistant (PDA) that can include a radiotelephone, pager, Internet/intranet access, web browser, organizer, calendar, and/or a global positioning system (GPS) receiver; and a conventional laptop and/or palmtop receiver or other appliance that includes a
radiotelephone transceiver. Mobile terminals may also be referred to as "pervasive computing" devices.
[0050] In video coding a current block is decoded by producing a set of image elements representing a prediction of the current block. The prediction is based on image elements that previously have been decoded in the current picture. The predicted image elements are combined with a residual to form the decoded image elements. In the solution presented herein, it is described that the final set of predicted image elements can be produced from two predictions, e.g., two sets of predicted image elements, that are combined based on the frequency content of the predictions, e.g., using different weights for the low and high frequency part of each set of image elements.
[0051] An example of where this can be applied is in the use of bi-prediction, as shown in Figure 9. When a first set of predicted image elements is created from previously decoded image elements using a first motion vector, and another set of predicted image elements is produced using a second motion vector, both sets of image elements can be combined using separate weights for the low and high frequency components.
[0052] As an example, the low frequency component can be calculated by taking the average of all predicted image elements for a predicted block, and the high frequency component can be calculated by removing the average from all image elements of the predicted block. A low frequency component block is determined by combining the averages of both sets of predicted image elements using one set of weights. A high frequency component block is also produced by combining the sets of image elements, after having their respective averages removed (or modified), using a different set of weights than used for the low frequency component. The final set of predicted image elements is then produced by adding the image elements representing the low frequency component block and the high frequency component block together.
[0053] The following summarizes various embodiments of the solution presented herein. Following these embodiment summaries is a section providing a more detailed explanation for each embodiment. It will be appreciated that Embodiment 1 provides the primary and most general embodiment, while all remaining embodiments stem from at least Embodiment 1.
[0054] In Embodiment 1, a first prediction of the current group of image elements provided according to a first prediction mode is combined with a second modified prediction, where the second modified prediction is a modified version of a second prediction of the current group of image elements provided according to a second prediction mode. As discussed in further detail below, the modification may comprise a filtering of the second prediction and/or a weighting of the second prediction.
[0055] In Embodiment 2, which builds from Embodiment 1, the first and second predictions are the same, where the second modified prediction for the current block to be decoded is produced by weighting the second prediction based on frequency content of the second prediction, e.g., by weighting only the low frequency content of the second prediction. In one embodiment, the first prediction is also modified so that a first modified prediction is produced by weighting the first prediction, so that at least one weight used to modify the second prediction for one range of frequencies differs from at least one weight used to modify the first prediction for other frequencies. For example, the decoded block may be determined by:
P(x,y) = P_l(x,y)*(1 - w) + P_h(x,y)*w    (2)
where P(x,y) represents an image element of the decoded block, (x,y) represents a spatial coordinate where x represents horizontal and y represents vertical, P_l represents the low frequency part of one prediction, P_h represents the high frequency part of the other prediction, and w represents a weight for the second prediction.
[0056] In Embodiment 3, which builds from previous embodiments, the decoded block is produced based on a frequency weighted combination of at least the first and second predictions, where the weighting of each prediction is based on frequency content of the image elements of the respective first and second predictions so that at least one weight is used for one range of frequencies to weight the image elements of the predictions and a different weight is used for other frequencies to weight the image elements of the other prediction. Figure 10 shows an exemplary implementation. For example, the decoded block may be determined by:
P(x,y) = P_{1,l}(x,y)*(1 - w_1) + P_{2,l}(x,y)*w_1 + P_{1,h}(x,y)*(1 - w_2) + P_{2,h}(x,y)*w_2    (3)
where P(x,y) represents an image element of the decoded block, (x,y) represents a spatial coordinate where x is horizontal and y is vertical, P_{1,l} represents the low frequency part of the first prediction, P_{1,h} represents the high frequency part of the first prediction, P_{2,l} represents the low frequency part of the second prediction, P_{2,h} represents the high frequency part of the second prediction, w_1 represents a weight for the low frequency image elements, and w_2 represents a weight for the high frequency image elements. The benefit of this embodiment is that the amount of low frequency information to use from the predictions may be selectively combined with the amount of high frequency information to use from the predictions to determine the decoded block.
[0057] This embodiment may be implemented using a fixed point implementation. One example of a fixed point implementation uses an accuracy of 1/2^A for the weights w_1 and w_2, as follows:
P(x,y) = ( P_{1,l}(x,y)*((1 << A) - w_1) + P_{2,l}(x,y)*w_1 + P_{1,h}(x,y)*((1 << A) - w_2) + P_{2,h}(x,y)*w_2 + (1 << A) ) >> (A + 1)    (4)
where A represents a factor that determines the accuracy of the weighting, << and >> represent left and right bit shifts, and w_1 and w_2 represent weightings that can have a value between 0 and 2^A. This combination is sketched below.
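A small C++ sketch of this fixed point combination, following equation (4) as given; A = 6 (weights in 0..64) would be an illustrative choice:

```cpp
#include <cstdint>

// Fixed point weighted combination of the low and high frequency parts
// of two predictions per equation (4); w1 and w2 lie in [0, 1 << A].
int32_t combineFixedPoint(int32_t p1Low, int32_t p2Low,
                          int32_t p1High, int32_t p2High,
                          int32_t w1, int32_t w2, int A)
{
    const int32_t one = 1 << A;  // fixed point representation of 1.0
    const int32_t sum = p1Low  * (one - w1) + p2Low  * w1
                      + p1High * (one - w2) + p2High * w2;
    return (sum + one) >> (A + 1);  // rounding offset and shift per eq. (4)
}
```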
[0058] In Embodiment 4, which builds from previous embodiments, the decoded block is produced based on a combination of at least two frequency weighted single predictions. For example, to determine the decoded block where the weighting for frequency is specific for the first and second predictions, the following may be used:
P(x,y) = w_3*( P_{1,l}(x,y)*(1 - w_1) + P_{1,h}(x,y)*w_1 ) + (1 - w_3)*( P_{2,l}(x,y)*(1 - w_2) + P_{2,h}(x,y)*w_2 )    (5)
where P(x,y) represents an image element of a decoded block, (x,y) represents a spatial coordinate where x is horizontal and y is vertical, P_{1,l} represents the low frequency part of a first prediction, P_{1,h} represents the high frequency part of the first prediction, P_{2,l} represents the low frequency part of a second prediction, P_{2,h} represents the high frequency part of the second prediction, w_1 represents a weight for the first prediction, w_2 represents a weight for the second prediction, and w_3 represents a weight to combine the first and second predictions as in the prior art. In one example, w_3 is 0.5.
[0059] In Embodiment 5, which builds from previous embodiments, the weighting
(modification) used to combine two single predictions or two predictions of different frequency ranges varies depending on the location inside the prediction. As an example, the individual weights for the individual image elements can be determined from an average weight for the whole prediction and information generated by the prediction, e.g., the direction of the displacement vector, the intra-prediction mode, or the range of the predicted image elements. One example of how to determine pixel-specific weights is to determine one weight for predicted image elements that have low luminance (luma) values and another weight for predicted image elements that have higher luminance (luma) values, as sketched below. This can make the prediction focus more on predicted image elements with high luminance values than on predicted image elements with low luminance values.
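A C++ sketch of such luma-dependent, per-element weights; the threshold and the two weight values are illustrative assumptions, not taken from the text:

```cpp
#include <cstddef>
#include <vector>

// Assign a per-element weight from the predicted luma value: elements
// brighter than lumaThreshold receive wBright, the rest receive wDark,
// letting the combination emphasize high-luminance regions.
std::vector<double> lumaDependentWeights(const std::vector<int>& predLuma,
                                         int lumaThreshold,
                                         double wBright, double wDark)
{
    std::vector<double> weights(predLuma.size());
    for (std::size_t i = 0; i < predLuma.size(); ++i) {
        weights[i] = (predLuma[i] > lumaThreshold) ? wBright : wDark;
    }
    return weights;
}
```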
[0060] In Embodiment 6, which builds from previous embodiments, the second prediction (and in some cases the first prediction) may be modified by removing the high frequency component to generate a block of low frequency image elements by setting the image elements of the low frequency block to the average value of the image elements in the prediction.
Similarly, the second prediction (and in some cases the first prediction) may be modified by removing the low frequency component to generate a block of high frequency image elements by setting each image element of the high frequency block to the difference between the corresponding image element and the average value.
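For example (values chosen purely for illustration), a 2x2 prediction with image elements [10, 12; 14, 16] has the average value 13, so the low frequency block is [13, 13; 13, 13] and the high frequency block is [−3, −1; 1, 3].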
[0061] In Embodiment 7, which builds from previous embodiments, the second prediction (and in some cases the first prediction) may be modified by removing the high frequency component to generate a block of low frequency image elements, by low-pass filtering the second (and/or first) prediction and setting each image element of the low frequency block to the corresponding image element of the low-pass filtered prediction. Similarly, the second prediction (and in some cases the first prediction) may be modified by removing the low frequency component to generate a block of high frequency image elements, by setting each image element of the high frequency block to the difference between the corresponding image element of the second (or first) prediction and the corresponding low-pass filtered image element.
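A one-dimensional sketch of this split, assuming an illustrative [1 2 1]/4 low-pass kernel with simple edge replication (names illustrative):

#include <cstddef>
#include <vector>

// Split a row of predicted samples into a low frequency part (the low-pass
// filtered samples) and a high frequency part (prediction minus low part).
void splitByLowPass(const std::vector<int>& pred,
                    std::vector<int>& low, std::vector<int>& high)
{
    const std::size_t n = pred.size();
    low.resize(n);
    high.resize(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        int left  = pred[i == 0 ? i : i - 1];       // replicate at the edges
        int right = pred[i + 1 < n ? i + 1 : i];
        low[i]  = (left + 2 * pred[i] + right + 2) >> 2;
        high[i] = pred[i] - low[i];
    }
}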
[0062] In Embodiment 8, which builds from Embodiment 7, the filtering is frequency selective in at least one specific spatial direction to modify frequency component(s) in the second prediction. For example, the frequency selective filtering may apply a low-pass filter only horizontally, only vertically, or only diagonally. This can be useful for combining a first and a second prediction when the uncertainty of the predictions differs in different spatial directions, e.g., vertically, horizontally, diagonally, etc., depending, for example, on the dominant spatial direction of the texture in the first prediction. Another example of a prediction with potentially different uncertainty in different directions is when the first prediction is produced by intra-prediction. Assume, for example, horizontal intra-prediction. In that case, the first prediction is derived by extrapolating from a column of decoded image elements to the left of the current block. Such a prediction has both high and low frequencies in the direction orthogonal to the direction of the prediction, but only low frequencies in the direction of the prediction. A second prediction can potentially provide high frequencies in the direction of the prediction. In this case, one can use different weighting for frequencies in the direction of prediction than for other frequencies.
[0063] A variant of this embodiment is to apply one frequency selective filter in a horizontal direction and also apply a frequency selective filter in a vertical direction. In this way one can weight frequency contributions in one direction differently compared to frequency contributions in another direction.
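A sketch of a horizontal-only low-pass filter over a block, again with an illustrative [1 2 1]/4 kernel and edge replication; the vertical direction is left untouched (names illustrative):

#include <vector>

// Low-pass filter a width x height block horizontally only, so that only
// horizontal frequency content is modified.
void lowPassHorizontal(std::vector<int>& block, int width, int height)
{
    for (int y = 0; y < height; ++y)
    {
        std::vector<int> row(block.begin() + y * width,
                             block.begin() + (y + 1) * width);
        for (int x = 0; x < width; ++x)
        {
            int left  = row[x == 0 ? x : x - 1];           // replicate at the edges
            int right = row[x + 1 < width ? x + 1 : x];
            block[y * width + x] = (left + 2 * row[x] + right + 2) >> 2;
        }
    }
}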
[0064] In Embodiment 9, which builds from previous embodiments, a second modified prediction of low frequency image elements is determined by setting the respective image elements of the low frequency block (the second modified prediction) to the corresponding image elements of a fit of the image elements of the second prediction with one or several basis functions. In that case, a block of high frequency image elements is determined by setting each image element of the high frequency block to the difference between the corresponding image element of the second prediction and the image elements resulting from the fit of the basis functions.
[0065] A basis function can be a 1D or 2D basis function. A 1D basis function is a function that only varies in one spatial direction. A 2D basis function can also vary in only one spatial direction, but it is also capable of varying more flexibly, e.g., non-separably, for example varying with the distance from the center. Typically, separable 1D basis functions are used to construct a sufficient approximation of the 2D basis function.
[0066] One example of a basis function is a polynomial basis function. In that case, coefficients of the polynomial basis functions are determined by a least square fit with the image elements of the second prediction. For a first order 2D polynomial, the polynomial can be described by the coefficients a, b, and c using 2D basis functions X and Y that vary linearly horizontally and vertically, respectively. The fit of the predicted image elements can be defined as:
a + bX + cY   (6)

where X = [1 2 3 … N; …; 1 2 3 … N] and Y = [1 1 … 1; 2 2 … 2; …; M M … M], where N corresponds to the horizontal size of the second prediction and M corresponds to the vertical size of the second prediction, a represents a coefficient corresponding to DC (average value), b represents a coefficient corresponding to the horizontal gradient, and c represents a coefficient corresponding to the vertical gradient.
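Because the sample grid is regular, the least squares normal equations decouple and the coefficients a, b, and c have closed forms. A floating point sketch, assuming N, M >= 2 (names illustrative):

#include <vector>

struct PlaneFit { double a, b, c; };  // fit is a + b*x + c*y

// Least squares fit of a first order 2D polynomial to an N x M block of
// predicted samples, with x = 1..N (horizontal) and y = 1..M (vertical).
PlaneFit fitFirstOrderPolynomial(const std::vector<double>& p, int N, int M)
{
    double xMean = (N + 1) / 2.0;
    double yMean = (M + 1) / 2.0;
    double sum = 0.0, sx = 0.0, sy = 0.0;
    for (int y = 1; y <= M; ++y)
        for (int x = 1; x <= N; ++x)
        {
            double v = p[(y - 1) * N + (x - 1)];
            sum += v;
            sx  += v * (x - xMean);
            sy  += v * (y - yMean);
        }
    double varX = N * (double)(N * N - 1) / 12.0;  // sum over x of (x - xMean)^2
    double varY = M * (double)(M * M - 1) / 12.0;  // sum over y of (y - yMean)^2
    PlaneFit f;
    f.b = sx / (M * varX);                           // horizontal gradient
    f.c = sy / (N * varY);                           // vertical gradient
    f.a = sum / (N * M) - f.b * xMean - f.c * yMean; // intercept (DC related term)
    return f;
}

The fitted image elements are then a + b·x + c·y evaluated at each position (x, y).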
[0067] Another example of basis functions is to use a subset of the DCT (Discrete Cosine Transform) basis functions; other spatial transforms could be used as well. First, a forward transform using the subset of basis functions is applied to the second prediction. The coefficients of that transform can then be fed to the inverse transform to derive the image elements of the fit of the predicted image elements. One example is to determine the subset of basis functions to use based on the scale factors defined for inverse quantization of transform coefficients of the current block. The transform basis functions could be the same as one of the transform basis functions defined for transform coding of the prediction error elsewhere in the video coder/decoder. In that case, quantization and inverse quantization may be part of the process in another variant of the embodiment.
[0068] In Embodiment 10, which builds from previous embodiments, one of the predictions is an intra-predicted prediction (predicted from previously decoded image elements of the current picture) and the other prediction is an inter-predicted prediction (predicted from previously decoded image elements of a picture other than the current picture).
[0069] In Embodiment 11, which builds from previous embodiments, both of the predictions are inter-predicted predictions (predicted from previously decoded image elements of a picture other than the current picture).
[0070] In Embodiment 12, which builds from previous embodiments, one of the predictions is an intra-predicted prediction that uses extrapolation, and the other prediction is an intra-predicted prediction that uses copying or interpolation.
[0071] In Embodiment 13, which builds from previous embodiments, at least one of the predictions is obtained by interpolation.
[0072] In Embodiment 14, which builds from previous embodiments, at least one frequency-specific weight or weighting is obtained from decoding a bit stream. The weight or weighting can be defined on different levels in the bit stream: sequence, picture, slice, block size, or block level, etc. The weight is typically defined to be between 0 and a positive non-zero constant, e.g., 1.
[0073] In Embodiment 15, which builds from previous embodiments, at least one frequency-specific weight or weighting is defined implicitly. In this case, the weight or weighting can be defined on different levels: sequence, picture, slice, block size, or block level, etc. One example of implicit weighting is to derive the weight or weighting from scaling factors used in residual coding. For example, when different scaling factors are used in inverse quantization of transform coefficients, one can determine the weights or weighting based on the value of the scaling factors, so that a higher weight is given to frequency components that have a higher scaling factor than to frequency components that have a lower scaling factor.
[0074] In Embodiment 16, which builds from previous embodiments, the approach is enabled by a frequency-based enable flag, e.g., obtained from decoding a bit stream. The frequency-based enable flag can be defined on different levels in the bit stream: sequence, picture, slice, block size, or block level, etc.
[0075] In Embodiment 17, which builds from previous embodiments, the second modified prediction is generated by filtering some frequency components out of the second prediction, e.g., by low-pass or high-pass filtering the second prediction.
[0076] In one exemplary embodiment, an encoder starts by evaluating which combination of first and second predictions minimizes the rate-distortion cost, out of all possible first and second predictions for the current block (or a subset of them). The rate-distortion cost is, e.g., evaluated as SSD + λ·rate, where SSD is the sum of squared differences between the reconstructed image elements (e.g., the final prediction plus the prediction error, or the final prediction if no prediction error is provided) and the corresponding original image elements, λ is a constant that is calculated based on the quantization parameter (QP) used for the current block, and rate is the number of bits signaled in the bit stream required to represent the current block. Based on the first and second predictions, the encoder determines a low frequency first prediction, a high frequency first prediction, a low frequency second prediction, and a high frequency second prediction.
[0077] In one variant, the encoder creates a first modified prediction by combining the low frequency first prediction and the low frequency second prediction using a weight from a subset of weights. The encoder also creates a second modified prediction by combining the high frequency first prediction and the high frequency second prediction using another weight from a subset of weights. Both weights are derived by selecting the weights with the lowest rate-distortion cost. Then a new modified prediction is determined by combining the first and second modified predictions. The rate-distortion cost is calculated using the SSD between the new modified prediction and the corresponding block of the original source image (before encoding), and adding the bit cost multiplied by lambda.
[0078] In another variant, the encoder calculates the rate-distortion cost of creating a first modified prediction by combining the low frequency first prediction with the high frequency first prediction using a weight from a subset of weights, where the weight is determined by calculating the SSD between the first modified prediction and the corresponding block of the original source image (before encoding) and adding the bit cost multiplied by lambda. The encoder does the same to determine a second modified prediction by combining the low frequency second prediction with the high frequency second prediction using a second weight. Then a new modified prediction is determined by combining the first and second modified predictions. When all weights have been evaluated, the encoder then selects the weights for the respective predictions with the lowest rate-distortion value.
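A simplified sketch of the weight search in the first variant, reusing the combineFrequencyWeighted sketch above. The rate term is taken as a fixed per-pair bit cost for brevity, whereas a real encoder would use the actual bit cost of each signaled weight index (all names illustrative):

#include <cstddef>
#include <limits>
#include <vector>

struct WeightChoice { double w1, w2, cost; };

// Evaluate each candidate weight pair by the rate-distortion cost
// SSD + lambda * rate and keep the pair with the lowest cost.
WeightChoice searchWeights(const std::vector<double>& source,
                           const std::vector<double>& p1,
                           const std::vector<double>& p2,
                           const std::vector<double>& candidateWeights,
                           double lambda, double bitsPerWeightPair)
{
    WeightChoice best = { 0.0, 0.0, std::numeric_limits<double>::max() };
    for (double w1 : candidateWeights)
        for (double w2 : candidateWeights)
        {
            std::vector<double> pred = combineFrequencyWeighted(p1, p2, w1, w2);
            double ssd = 0.0;
            for (std::size_t i = 0; i < source.size(); ++i)
            {
                double d = source[i] - pred[i];
                ssd += d * d;
            }
            double cost = ssd + lambda * bitsPerWeightPair;
            if (cost < best.cost)
                best = { w1, w2, cost };
        }
    return best;
}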
[0079] In both variants, the encoder then signals, for each block, both the prediction modes and the frequency-specific weights that represent how the low and high frequencies of the respective predictions should be used to derive respective modified predictions, which are then combined to determine a new modified prediction of the current block, i.e., the decoded block.
[0080] In another embodiment, a decoder starts with, for each block, determining a first prediction mode and a second prediction mode and two frequency-specific weights, using information that is decoded from the bit stream. The decoder creates a first prediction of the current block using the first prediction mode and creates a second prediction of the current block using the second prediction mode. A low frequency first prediction and a high frequency first prediction are determined based on a first weight. Similarly, a low frequency second prediction and a high frequency second prediction are determined based on a second weight.

[0081] In one variant, the decoder determines a first modified prediction by a weighted combination of the low frequency first prediction with the low frequency second prediction according to the first weight. A second modified prediction is created by a weighted combination of the high frequency first prediction with the high frequency second prediction using the second weight.
[0082] In another variant, the decoder determines a first modified prediction by a weighted combination of the low frequency first prediction with the high frequency first prediction according to the first weight. Similarly, the decoder determines a second modified prediction by a weighted combination of the low frequency second prediction with the high frequency second prediction according to the second weight.
[0083] In both variants, the first modified prediction and the second modified prediction are then combined into a new modified prediction of image elements, e.g., the decoded block. It should be noted that both the encoder and decoder have been described herein without any coding of the prediction error. When a prediction error is provided in the bit stream by the encoder, it is added to the prediction to form the decoded block.
[0084] Various elements disclosed herein are described as some kind of circuit, e.g., a mode circuit, modification circuit, first prediction circuit, second prediction circuit, decoding circuit, a mode selection circuit, an evaluation circuit, etc. Each of these circuits may be embodied in hardware and/or in software (including firmware, resident software, microcode, etc.) executed on a controller or processor, including an application specific integrated circuit (ASIC).
[0085] Further, the solution presented herein may be used for still picture encoding as well.
[0086] Another exemplary embodiment considers the possibility of creating more complex predictions by combining different prediction methods. An example of such a method is to combine image element-based intra-prediction and block-based intra-block copy prediction. A similar approach may be used when inter-block prediction is used instead of intra-block copy prediction. Such a combination could remove redundant information with the copying in the block-based prediction and at the same time use the well-established method of image element-based intra-prediction. These can be weighted together in such a way that some error metric, like SSD or rate distortion, is decreased and the visual appearance is improved.
[0087] For such a prediction method, a correlation exists between the angle of the intra-prediction mode and the angle of the weights used to combine image elements from both predictions. The image element-based intra-prediction interpolates within the block at a certain angle, which makes it more accurate along the edges from which it is interpolated. An example of how to assign weights depending on subsets of the intra-prediction modes is given in the following table (Table 1). This weight assignment technique may be defined in a standard so that the encoder and the decoder use the same weightings. See Figure 11, where the different directional weight matrices for different subsets of intra-prediction modes are used when combining image elements that are produced from intra-prediction and image elements produced using intra-block copy. In Figure 11, the black color corresponds to the case where only the intra-predicted image elements are considered, and the white color to the case where only image elements from the IntraBC-predicted block are considered.
Table 1 (weight matrix assignment per subset of intra-prediction modes; reproduced as an image in the original publication)
[0088] Another example of assigning weightings to different intra-prediction modes is to classify the modes into horizontal and vertical modes. In this case, a set of weights that vary in the horizontal direction (weight matrix 2) is used together with the horizontal intra modes (modes 2-17). Similarly, another set of weights that vary in the vertical direction (weight matrix 3) is used for the vertical intra modes (modes 18-34). For the two non-angular intra-prediction modes, DC and planar (modes 0 and 1), another set of weights could be used that puts more weight on the top and left edges (weight matrix 1). A variant of this is to reuse the same weights as either the horizontal or vertical intra-prediction modes for the DC and planar modes.
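A sketch of how such directional weight matrices could be constructed; the linear ramp is an illustrative choice, not the matrices of Table 1 (names illustrative):

#include <vector>

// Build a weight matrix for combining intra prediction with intra block copy:
// the intra prediction weight ramps down with the distance from the reference
// edge, horizontally for horizontal modes and vertically for vertical modes.
std::vector<int> buildWeightMatrix(int width, int height,
                                   bool horizontalMode, int maxWeight)
{
    std::vector<int> w(width * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            int dist = horizontalMode ? x : y;          // distance from reference edge
            int len  = horizontalMode ? width : height;
            w[y * width + x] = maxWeight - (dist * maxWeight) / len;
        }
    return w;
}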
[0089] In one exemplary embodiment, search strategies may be used to determine the best prediction technique. One exemplary search strategy comprises an independent search strategy. In this case, the best image element-based intra-prediction block (intra-prediction mode) and the best Intra-Block Copy block (location) are found separately, each selected as the one with the least distortion or the least rate distortion. The result is then combined with the appropriate weight matrix according to which subset the intra-prediction mode belongs to.
[0090] Another exemplary search strategy comprises a dependent search strategy. In this case, the Intra-Block Copy block (location) is found depending on which combined block gives the least distortion or the least rate distortion. Hence, the Intra-Block Copying is dependent on the intra-prediction (intra-prediction mode) and the weight matrix according to which subset the intra-prediction mode belongs to. The procedure is as follows (a code sketch follows the list):
• Choose the image element-based intra-block (intra-prediction mode) as usual by taking the one that gives the least distortion or the least rate distortion.
• For all block based intra-blocks in a certain search range (locations), add the image element-based and block-based blocks together with appropriate weight matrix.
• Select the location that gives a combination with the least distortion or the least rate distortion.
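A simplified sketch of this dependent search, assuming the candidate Intra-Block Copy blocks have already been gathered and using a weight matrix as above; a real search would also account for rate (names illustrative):

#include <cstddef>
#include <limits>
#include <vector>

// Given the chosen intra prediction and its weight matrix, test every
// candidate intra block copy location and return the index of the candidate
// whose weighted combination has the smallest SSD against the source block.
int dependentSearch(const std::vector<int>& source,
                    const std::vector<int>& intraPred,
                    const std::vector<std::vector<int>>& ibcCandidates,
                    const std::vector<int>& wMat, int maxWeight)
{
    double bestCost = std::numeric_limits<double>::max();
    int bestIdx = -1;
    for (std::size_t c = 0; c < ibcCandidates.size(); ++c)
    {
        double ssd = 0.0;
        for (std::size_t i = 0; i < source.size(); ++i)
        {
            int combined = (intraPred[i] * wMat[i]
                          + ibcCandidates[c][i] * (maxWeight - wMat[i])
                          + maxWeight / 2) / maxWeight;   // weighted combination
            double d = (double)source[i] - combined;
            ssd += d * d;
        }
        if (ssd < bestCost)
        {
            bestCost = ssd;
            bestIdx = (int)c;
        }
    }
    return bestIdx;
}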
[0091] Another exemplary search strategy comprises an exhaustive dependent search strategy. In this case, the dependent search described above is extended: look at all combinations of intra-prediction mode and Intra-Block Copy block, and choose the one with the least distortion or the least rate distortion of the combined block.
[0092] Another exemplary search strategy comprises a dependent search with template matching. In this case, the normal Intra-Block Copy method is replaced with the template matching method, where the K best blocks are found. The K best blocks are determined based on the distortion over the template area. Then, the best of the K locations is selected as the one with the least distortion or the least rate distortion of the combined block.
[0093] In one example, the first prediction is used as a template to at least partly determine the location of the samples to use for the second prediction. Below is some example code where the location for the second prediction is determined entirely by searching in a neighbourhood of previously reconstructed samples around the current block. The same principle can apply to searching in a neighbourhood of previously reconstructed samples in another frame/picture. So in this example no location information for the second prediction needs to be signaled to the decoder, and the decoder performs the same steps to derive the combined prediction. In this example, the combined prediction has the DC (average value) from the first prediction added to a weighted average of the high frequency components from the first and second predictions. The high frequency components are derived by removing the DC of the first prediction (the average value of the first prediction) from the first prediction and the DC of the second prediction (the average value of the second prediction) from the second prediction.
[0094] // 1. Try to find a block that best matches piPred (a block with a known
// prediction mode) by searching to the left of or above the current block.
// First locate the reconstructed samples, then find out how far the search
// can extend to the left and above.
Pel *lumaRec = pcPredYuv->getAddr( compID, uiAbsPartIdx );
Int maxLeft = pcCU->getCUPelX();
Int searchLeftX = min((Int)uiWidth*3, maxLeft);
Int searchLeftToX = searchLeftX - min((Int)uiWidth, maxLeft);
Int searchLeftY = 0;
Pel *lumaLeft = lumaRec - searchLeftX;
Int maxAbove = pcCU->getCUPelY();
Int searchAboveX = min((Int)uiWidth*3, maxLeft);
Int searchAboveY = min((Int)uiHeight*3, maxAbove);
const TComSPS* pSPS = pcCU->getSlice()->getSPS();
Int maxRight = pSPS->getPicWidthInLumaSamples() - pcCU->getCUPelX();
Int searchAboveToX = searchAboveX + min((Int)uiWidth*4, maxRight);
Int searchAboveToY = searchAboveY - min((Int)uiHeight, maxAbove);
Pel *lumaAbove = lumaRec - searchAboveX - searchAboveY*uiStride;
Double bestMatch = uiWidth*uiHeight*1023*1023;
Int bestPosX = 0; // 0 corresponds to the X position of the current block
Int bestPosY = 0; // 0 corresponds to the Y position of the current block
Int bestPredDC = 0;
Int bestBlockCopyDC = 0;
// Search in the region to the left of the current block.
Pel* blockCopyLeft = lumaLeft;
for( UInt uiY2 = 0; uiY2 < (searchLeftY+1); uiY2++ )
{
  for( UInt uiX2 = 0; uiX2 < searchLeftToX; uiX2++ )
  {
    Double theMatch = 0;
    Double predDC = 0.0;
    Double blockCopyDC = 0.0;
    Pel* pPred = piPred;
    Pel* blockCopy = blockCopyLeft + uiX2;
    for( UInt uiY = 0; uiY < uiHeight; uiY++ )
    {
      for( UInt uiX = 0; uiX < uiWidth; uiX++ )
      {
        Int diff = pPred[ uiX ] - blockCopy[ uiX ];
        predDC += pPred[ uiX ];
        blockCopyDC += blockCopy[ uiX ];
        theMatch += diff*diff;
      }
      pPred += uiStride;
      blockCopy += uiStride;
    }
    // Remove the error contribution caused by the difference in DC (average value).
    Double DCPred = predDC/(Double)(uiHeight*uiWidth);
    Double DCBlockCopy = blockCopyDC/(Double)(uiHeight*uiWidth);
    Double DCerr = DCPred - DCBlockCopy;
    DCerr = DCerr*DCerr;
    // What remains is the AC (high frequency) error.
    theMatch -= (Double)(uiHeight*uiWidth)*DCerr;
    if( theMatch < bestMatch )
    {
      bestMatch = theMatch;
      bestPosX = (Int)uiX2;
      bestPosY = (Int)uiY2;
      bestPredDC = Int(DCPred + 0.5);
      bestBlockCopyDC = Int(DCBlockCopy + 0.5);
    }
  }
  blockCopyLeft += uiStride;
}
Bool bestMatchAbove = false;
// Search in the region above the current block.
Pel* blockCopyAbove = lumaAbove;
for( UInt uiY2 = 0; uiY2 < searchAboveToY; uiY2++ )
{
  for( UInt uiX2 = 0; uiX2 < searchAboveToX; uiX2++ )
  {
    Double theMatch = 0;
    Double predDC = 0.0;
    Double blockCopyDC = 0.0;
    Pel* pPred = piPred;
    Pel* blockCopy = blockCopyAbove + uiX2;
    for( UInt uiY = 0; uiY < uiHeight; uiY++ )
    {
      for( UInt uiX = 0; uiX < uiWidth; uiX++ )
      {
        Int diff = pPred[ uiX ] - blockCopy[ uiX ];
        predDC += pPred[ uiX ];
        blockCopyDC += blockCopy[ uiX ];
        theMatch += diff*diff;
      }
      pPred += uiStride;
      blockCopy += uiStride;
    }
    // Remove the error contribution caused by the difference in DC (average value).
    Double DCPred = predDC/(Double)(uiHeight*uiWidth);
    Double DCBlockCopy = blockCopyDC/(Double)(uiHeight*uiWidth);
    Double DCerr = DCPred - DCBlockCopy;
    DCerr = DCerr*DCerr;
    // What remains is the AC (high frequency) error.
    theMatch -= (Double)(uiHeight*uiWidth)*DCerr;
    if( theMatch < bestMatch )
    {
      bestMatch = theMatch;
      bestPosX = (Int)uiX2;
      bestPosY = (Int)uiY2;
      bestMatchAbove = true;
      bestPredDC = Int(DCPred + 0.5);
      bestBlockCopyDC = Int(DCBlockCopy + 0.5);
    }
  }
  blockCopyAbove += uiStride;
}
Bool validPos = false;
if( bestMatch < uiWidth*uiHeight*1023*1023 )
{
  validPos = true;
}
if( validPos )
{
  // 2. Derive the combined prediction as the DC part (low frequency part,
  // variable bestPredDC) of the first prediction plus a pixel-wise weighted
  // average of the high frequency first prediction (first prediction minus
  // bestPredDC) and the high frequency second prediction (second prediction
  // minus bestBlockCopyDC, the average value of the block copy prediction).
  // The weight for the first prediction is larger closer to its reference
  // samples when the first prediction is an intra prediction.
  Pel* pPred = piPred;
  Pel* blockCopy;
  Int bestPosCurrBlockX;
  Int bestPosCurrBlockY;
  if( bestMatchAbove )
  {
    bestPosCurrBlockX = -(searchAboveX - bestPosX);
    bestPosCurrBlockY = -(searchAboveY - bestPosY);
    blockCopy = lumaAbove + bestPosX + bestPosY*uiStride;
  }
  else
  {
    bestPosCurrBlockX = -(searchLeftX - bestPosX);
    bestPosCurrBlockY = 0;
    blockCopy = lumaLeft + bestPosX + bestPosY*uiStride;
  }
  for( UInt uiY = 0; uiY < uiHeight; uiY++ )
  {
    for( UInt uiX = 0; uiX < uiWidth; uiX++ )
    {
      // wMat, maxweight, theRound and theShift are assumed to be defined
      // elsewhere; theShift corresponds to dividing by maxweight, e.g., 8.
      Int combinedPrediction = bestPredDC + (((pPred[uiX] - bestPredDC)*wMat[uiY][uiX]
          + (blockCopy[uiX] - bestBlockCopyDC)*(maxweight - wMat[uiY][uiX])
          + theRound) >> theShift);
      pPred[uiX] = combinedPrediction;
    }
    pPred += uiStride;
    blockCopy += uiStride;
  }
}
[0095] Another exemplary embodiment relies on error metrics. Typically, the error metric includes a term related to the cost to encode the block, multiplied by lambda, which is added to a distortion term that reflects how closely the decoded image elements resemble the corresponding original source image elements, e.g., rate distortion. Typically, SSD or SAD is the distortion metric used to determine which intra-prediction mode to select and which location to predict from. To emphasize the selection of locations and combinations that do not contribute to block artifacts, one can combine the SSD or SAD with an edge metric that measures the distortion along the edges of the predicted block. A combined metric can be given by SSD + k·SSDedge, where k is a constant that determines the relative importance of the edge metric compared to the normal metric. The constant k may also be different for different positions along the block boundary. One example is to adapt k such that it is larger at positions where the image elements close to the block border, on each side of it, vary little, compared to positions where the image elements vary a lot.
[0096] One example is to determine the edge metric on image elements along the right edge of the predicted block and the original block, respectively, as shown in Figure 12. In Figure 12, the proposed edge metric compares the value of an image element on the right edge of the block with the neighbor directly to the right, e1 − e2, and then compares this value with a difference computed on the corresponding image elements from the original source. The edge metric can be defined as the sum of squared differences along the block boundary, where each difference is obtained by comparing the difference between e1 and e2 on the original source with the difference between e1 and e2 on the predicted block. If the e2 image element is not available because the corresponding image element has not been coded yet, the corresponding image element from the original is used instead.
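A sketch of this edge metric along the right edge, assuming the neighbour column has already been substituted from the original where it is not yet coded (names illustrative):

#include <vector>

// Edge metric along the right block edge: compare the step e1 - e2 across the
// boundary in the prediction with the corresponding step in the original
// source, and accumulate the squared difference between the two steps.
double edgeMetricRight(const std::vector<int>& pred,   // predicted block, row-major
                       const std::vector<int>& orig,   // original block, row-major
                       const std::vector<int>& e2Rec,  // neighbour column (reconstructed or original)
                       const std::vector<int>& e2Orig, // neighbour column from the original source
                       int width, int height)
{
    double ssdEdge = 0.0;
    for (int y = 0; y < height; ++y)
    {
        int stepPred = pred[y * width + (width - 1)] - e2Rec[y];
        int stepOrig = orig[y * width + (width - 1)] - e2Orig[y];
        double d = (double)(stepPred - stepOrig);
        ssdEdge += d * d;
    }
    return ssdEdge;  // combined with the normal metric as SSD + k * ssdEdge
}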
[0097] The present invention may, of course, be carried out in other ways than those specifically set forth herein without departing from essential characteristics of the invention. The present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.

Claims

1. A method of decoding a group of image elements in a frame of an encoded video sequence, the method comprising:
providing a first prediction of the group of image elements according to a first prediction mode;
providing a second prediction of the group of image elements according to a second
prediction mode;
modifying the second prediction responsive to frequency content of the second prediction to generate a second modified prediction; and
generating a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
2. A decoder comprising:
a first prediction circuit configured to provide a first prediction of the group of image
elements according to a first prediction mode;
a second prediction circuit configured to provide a second prediction of the group of image elements according to a second prediction mode;
a modification circuit configured to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction; and a decoding circuit configured to generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
3. The decoder of claim 2, wherein the modification circuit is configured to modify the second prediction by applying a frequency filter to the second prediction to modify one or more frequency components in the second prediction to generate the second modified prediction.
4. The decoder of claim 3, wherein the modification circuit is configured to apply the frequency filter by applying a high pass filter to the second prediction to remove low frequency components from the second prediction to generate the second modified prediction.
5. The decoder of any of claims 2-4, wherein the modification circuit is configured to apply the frequency filter by applying the frequency filter in one spatial direction of the second prediction to generate the second modified prediction.
6. The decoder of any of claims 2-5, wherein the modification circuit is configured to apply the frequency filter by applying a first frequency filter in a first spatial direction and by applying a second frequency filter in a second spatial direction to modify frequency components in the second prediction to generate the second modified prediction.
7. The decoder of any of claims 2-6, wherein the modification circuit is configured to modify the second prediction by:
determining a second weighting based on the frequency content of the second prediction; and
applying the second weighting to the second prediction to generate the second modified prediction or to the filtered second prediction to generate the second modified prediction.
8. The decoder of claim 7, wherein the frequency content of the second prediction comprises one or more frequency ranges, and wherein the second weighting comprises a different second weight for each of the one or more frequency ranges.
9. The decoder of claim 8, wherein the frequency content of the second prediction comprises low frequency content and high frequency content such that the second prediction comprises low frequency image elements and high frequency image elements, wherein the second weighting comprises at least a second low frequency weight and a second high frequency weight, and wherein the modification circuit is configured to apply the second weighting to the second prediction by:
applying the second low frequency weight to each low frequency image element in the second prediction to generate a low frequency portion of the second modified prediction; and
applying the second high frequency weight to each high frequency image element in the second prediction to generate a high frequency portion of the second modified prediction.
10. The decoder of any of claims 2-9, wherein the modification circuit is further configured to apply a first weighting to the first prediction to generate a first weighted prediction, wherein the decoding circuit is further configured to generate the decoded version of the group of image elements using a combination of the first weighted prediction and the second modified prediction.
11. The decoder of claim 10, wherein the modification circuit is further configured to apply a second weighting to the second modified prediction to generate a second weighted prediction, and wherein the decoding circuit is configured to generate the decoded version by using a combination of the first weighted prediction and the second weighted prediction.
12. The decoder of any of claims 2-11, wherein the frequency content of the first prediction comprises low frequency content and high frequency content such that the first prediction comprises low frequency image elements and high frequency image elements, wherein the first weighting comprises at least a first low frequency weight and a first high frequency weight, and wherein the modification circuit is configured to apply the first weighting to the first prediction by: applying the first low frequency weight to each low frequency image element in the first prediction to generate a low frequency portion of the first weighted prediction; and applying the first high frequency weight to each high frequency image element in the first prediction to generate a high frequency portion of the first weighted prediction.
13. The decoder of any of claims 2-12:
wherein the decoder is further configured to receive a weight scaling factor having a value between 0 and a positive non-zero constant with the encoded video sequence, said weight scaling factor configured to emphasize one of the first and second weightings over the other of the first and second weightings to further control an influence of one of the first and second predictions on the generated decoded version; and wherein the modification circuit is further configured to use the weight scaling factor to adjust at least one of the first and second weightings.
14. The decoder of any of claims 2-13, wherein the first prediction circuit is configured to provide the first prediction by providing the first prediction of the group of image elements according to one of an intra-block prediction and an inter-block prediction, and wherein the second prediction circuit is configured to provide the second prediction by providing the second prediction of the group of image elements according to one of an intra-block prediction and an inter-block prediction.
15. The decoder of any of claims 2-14 further comprising a mode circuit configured to: determine the first prediction mode by determining an intra-prediction mode or a location identifier for a decoded version of another group of image elements in the frame of the encoded video sequence; and
determine the second prediction mode by determining the intra-prediction mode or the
location identifier for the decoded version of another group of image elements in the frame of the encoded video sequence.
16. The decoder of any of claims 2-15, wherein the first and second predictions are the same, and wherein the first and second prediction modes are the same.
17. The decoder of any of claims 2-15, wherein the first prediction differs from the second prediction, and wherein the first prediction mode differs from the second prediction mode.
18. The decoder of any of claims 2-17, wherein the decoder is comprised in a device.
19. The decoder of claim 18, wherein the device comprises one of a tablet, personal computer, mobile telephone, set-top box, and camera.
20. A computer program product stored in a non-transitory computer readable medium for controlling a decoder, the computer program product comprising software instructions which, when run on the decoder, causes the decoder to:
provide a first prediction of the group of image elements according to a first prediction mode; provide a second prediction of the group of image elements according to a second
prediction mode;
modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction; and generate a decoded version of the group of image elements using a combination of the first prediction and the second modified prediction.
21. A decoder apparatus comprising:
a first prediction module configured to provide a first prediction of the group of image
elements according to a first prediction mode;
a second prediction module configured to provide a second prediction of the group of image elements according to a second prediction mode;
a modification module configured to modify the second prediction responsive to frequency content of the second prediction to generate a second modified prediction; and a decoding module configured to generate a decoded version of the group of image
elements using a combination of the first prediction and the second modified prediction.
22. A method of encoding a group of image elements in a frame of a video sequence, the method comprising:
estimating a first prediction of a first group of image elements according to a first prediction mode;
estimating a second prediction for each of a plurality of second group of image elements; modifying each of the second predictions responsive to frequency content of the
corresponding second group of image elements to generate a plurality of second modified predictions, wherein each of the second group of image elements corresponds to a second prediction mode;
determining a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions; selecting the candidate prediction having a better performance parameter than the other candidate predictions;
encoding the first group of image elements as an identifier of the location of the first group of image elements or of the first prediction mode; and
encoding the second group of image elements as an identifier of the second prediction
mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
23. An encoder comprising:
a first prediction circuit configured to estimate a first prediction of a first group of image
elements according to a first prediction mode;
a second prediction circuit configured to estimate a second prediction for each of a plurality of second group of image elements;
a modification circuit configured to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, wherein each of the second group of image elements corresponds to a second prediction mode;
an evaluation circuit configured to:
determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions;
select the candidate prediction having a better performance parameter than the other candidate predictions;
encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode; and encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
24. The encoder of claim 23, wherein the modification circuit is configured to modify each of the second predictions by applying a frequency filter to each of the second predictions to modify one or more frequency components in the second predictions to generate the plurality of second modified predictions.
25. The encoder of claim 24 wherein the modification circuit is configured to apply the frequency filter by applying a high pass filter to each of the second predictions to remove low frequency components from the second predictions to generate the plurality of second modified predictions.
26. The encoder of any of claims 23-25, wherein the modification circuit is configured to apply the frequency filter by applying the frequency filter in one spatial direction of each of the second predictions to generate the plurality of second modified predictions.
27. The encoder of any of claims 23-26, wherein the modification circuit is configured to apply the frequency filter by applying a first frequency filter in a first spatial direction and to apply a second frequency filter in a second spatial direction to modify frequency components in each of the second predictions to generate the plurality of second modified predictions.
28. The encoder of any of claims 23-27, wherein the modification circuit is configured to modify the second prediction by: determining a second weighting based on the frequency content of the second predictions; and
applying the second weighting to each of the second predictions to generate the plurality of second modified predictions or to the filtered second prediction to generate the plurality of second modified predictions.
29. The encoder of claim 28, wherein the frequency content of the second predictions comprises one or more frequency ranges, and wherein the second weighting comprises a different second weight for each of the one or more frequency ranges.
30. The encoder of claim 29, wherein the frequency content of the second predictions comprises low frequency content and high frequency content such that the second predictions comprise low frequency image elements and high frequency image elements, wherein the second weighting comprises at least a second low frequency weight and a second high frequency weight, and wherein the modification circuit is configured to apply the second weighting to each of the second predictions by:
applying the second low frequency weight to each low frequency image element in each of the second predictions to generate a low frequency portion of the plurality of second modified predictions; and
applying the second high frequency weight to each high frequency image element in each of the second predictions to generate a high frequency portion of the plurality of second modified predictions.
31. The encoder of any of claims 23-30, wherein the modification circuit is further configured to apply a first weighting to the first prediction to generate a first weighted prediction, wherein each candidate prediction comprises a combination of the first weighted prediction and one of the plurality of second modified predictions.
32. The encoder of claim 31, wherein the modification circuit is further configured to apply a second weighting to the second modified prediction to generate a second weighted prediction, wherein each candidate prediction comprises a combination of the first weighted prediction and the second weighted prediction.
33. The encoder of any of claims 23-32, wherein the frequency content of the first prediction comprises low frequency content and high frequency content such that the first prediction comprises low frequency image elements and high frequency image elements, wherein the first weighting comprises at least a first low frequency weight and a first high frequency weight, and wherein the modification circuit is configured to apply the first weighting to the first prediction by: applying the first low frequency weight to each low frequency image element in the first prediction to generate a low frequency portion of the first weighted prediction; and applying the first high frequency weight to each high frequency image element in the first prediction to generate a high frequency portion of the first weighted prediction.
34. The encoder of any of claims 23-33, wherein the encoder is further configured to signal a weight scaling factor having a value between 0 and a positive non-zero constant with the encoded video sequence, said weight scaling factor configured to emphasize one of the first and second weightings over the other of the first and second weightings to further control an influence of one of the first and second predictions on the combination.
35. The encoder of any of claims 23-34, wherein the first prediction circuit is configured to provide the first prediction by providing the first prediction of the group of image elements according to one of an intra-block prediction and an inter-block prediction, and wherein the second prediction circuit is configured to provide the second prediction by providing the second prediction of the group of image elements according to one of an intra-block prediction and an inter-block prediction.
36. The encoder of claim 35, further comprising a mode selection circuit configured to: select the first prediction mode by selecting an intra-prediction mode or a location identifier for a decoded version of another group of image elements in the frame of the encoded video sequence; and
select the second prediction mode by selecting the intra-prediction mode or the location identifier for the decoded version of another group of image elements in the frame of the encoded video sequence.
37. The encoder of any of claims 23-36, wherein the first and second predictions are the same, and wherein the first and second prediction modes are the same.
38. The encoder of any of claims 23-36, wherein the first prediction differs from the second prediction, and wherein the first prediction mode differs from the second prediction mode.
39. The encoder of any of claims 23-38, wherein the encoder is comprised in a device.
40. The encoder of claim 39, wherein the device comprises one of a tablet, personal computer, mobile telephone, set-top box, and camera.
41. A computer program product stored in a non-transitory computer readable medium for controlling an encoder, the computer program product comprising software instructions which, when run on the encoder, causes the encoder to:
estimate a first prediction of a first group of image elements according to a first prediction mode;
estimate a second prediction for each of a plurality of second group of image elements; modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, wherein each of the second group of image elements corresponds to a second prediction mode;
determine a plurality of candidate predictions, each candidate prediction comprising a
combination of the first prediction and one of the plurality of second modified predictions;
select the candidate prediction having a better performance parameter than the other
candidate predictions;
encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode; and
encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.
42. An encoder apparatus comprising:
a first prediction module configured to estimate a first prediction of a first group of image elements according to a first prediction mode;
a second prediction module configured to estimate a second prediction for each of a plurality of second group of image elements; a modification module configured to modify each of the second predictions responsive to frequency content of the corresponding second group of image elements to generate a plurality of second modified predictions, wherein each of the second group of image elements corresponds to a second prediction mode;
an evaluation module configured to:
determine a plurality of candidate predictions, each candidate prediction comprising a combination of the first prediction and one of the plurality of second modified predictions;
select the candidate prediction having a better performance parameter than the other candidate predictions;
encode the first group of image elements as an identifier of the location of the first group of image elements or the first prediction mode; and
encode the second group of image elements as an identifier of the second prediction mode or an identifier of the location of the second group of image elements corresponding to the selected candidate prediction.