WO2021093956A1

WO2021093956A1 - A device and method for image processing

Info

Publication number: WO2021093956A1
Application number: PCT/EP2019/081337
Authority: WO
Inventors: Sean Moran; Pierre MARZA; Steven George MCDONAGH; Sarah PARISOT; Gregory Slabaugh
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2021-05-20
Also published as: CN114830169A

Abstract

An image processing device for transforming an image, the device comprising a processor configured to implement a trained algorithm to process a received first image and generate a set of parameters in dependence thereon, and to generate a second image by applying to the first image a local parametric filter taking the set of parameters as input. The device may therefore predict a set of filter parameters needed to apply the local parametric filter to the first image in order to generate a second image of improved image quality.

Description

A DEVICE AND METHOD FOR IMAGE PROCESSING

FIELD OF THE INVENTION

This invention relates to image processing, specifically enhancement or transformation using parametric filters.

BACKGROUND

Human digital artists and photographers can improve the aesthetic quality of digital photographs through manual image enhancements and retouching. The resulting images are intended to be visually appealing and offer perceptual improvements over the original inputs. Manual enhancement methods constitute a variety of post-processing editing options such as image sharpening tools and colour adjustments. Professional grade software tools such as Photoshop and Lightroom allow application of these modifications through both interactive and semi-automated methods. In addition to elementary global tools such as contrast enhancement and brightening, advanced editing functionality is also available through both local and adaptive image adjustments.

Figures 1 (a) and 1(b) show two different types of local filter available, for example in Adobe Lightroom and Photoshop. Figure 1 (a) shows a graduated filter which applies a linear adjustment to a selected region using three parallel lines 102, 104, and 106. Figure 1(b) shows a radial filter which applies a local adjustment in a radial manner indicated by the ellipse 108.

Manual enhancement remains challenging for lay-users who may lack appropriate skills and understanding to improve their images pleasingly and effectively. The resulting image quality is highly dependent on both skill and subjective aesthetic judgements of the end user. Even with sufficiently high skill, a significant amount of manual editing time is often still required to reach pleasing enhancement results. Semi-automated tools may somewhat expedite the process by only requiring adjustment of few hyperparameters, yet results can be highly sensitive to parameter values, again necessitating experience and expert user understanding. Additionally, semi-automated methods are commonly based on hard-coded heuristic rules encapsulating human perception rules of thumb, such as enhancing details or stretching image contrast. This in turn can lead to method brittleness and low-quality end results.

The existing approaches to automated image enhancement problems mainly use data-driven, learning based approaches. Such models typically employ fully-supervised learning from pairs of input / output images that are defined as the before and after images, as edited by an expert digital artist or photographer. Early work in this domain typically extracted handcrafted features such as intensity distributions from input images and then learns appropriate mappings from low-level image features to enhancement-tool parameter setting values accordingly. This strategy has been previously applied to single, global editing tools such as global tone and colour adjustments. More recent learning-based work has alternatively employed Deep Convolutional Neural Networks (CNNs) to instead learn image features automatically. Learning features layer-wise encodes a range of low-level to high-level image features that attempt to capture semantic information.

In this field CNNs enable context dependent photo edits. Deep CNNs have led to clear improvements for photo retouching, and CNNs have been utilized for spatially varying colour mapping based on semantic information together with handcrafted global and local features, such as that described in Yan et al., "Automatic photo adjustment using deep neural networks", ACM Transactions on Graphics (TOG), 35.2 (2016): 11., and for colour constancy, such as that described in Hu et al., “Exposure: A white-box photo postprocessing framework”, ACM Transactions on Graphics (SIGGRAPH), 2017, where semantic understanding helps in resolving estimation ambiguity. CNNs have also been trained to predict local affine transforms in bilateral space such as that described in Gharbi et al., “HDRNet: Deep bilateral learning for real-time image enhancement”, ACM Trans. On Graphics (SIGGRAPH), 2017, which can serve as an approximation to edge-aware image filters and colour / tone adjustments. The work of Gharbi et al. learns local affine colour transformations per RGB channel, and scaling maps are computed on a low resolution version of the image. However, no global adjustment is performed as part of this method. The recent photo post-processing framework of Hu and colleagues [Hu et al. 2018, "Exposure: A white-box photo post-processing framework", ACM Transactions on Graphics (TOG), 37.2 (2018): 26.], predicts global retouching curves in RGB space. However, this approach is restricted to monotonic curves, which in turn limits flexibility as local adjustments cannot be made, based on image spatial regions. Deep Photo Enhancer such as described in Chen et al., “Deep photo enhancer: Unpaired learning for image enhancement from photographs with GANs”, in CVPR, 2018, can learn enhancements using a Generative Adversarial Network (GAN) setup. However, this setup may suffer from lack of interpretability and must also account for the well-understood min/max game training issues commonly associated with GAN models. The work of Park et al. as described in Park et al., “Distort-and-recover: Colour enhancement using deep reinforcement learning”, in CVPR, 2018, alternatively employs Deep Reinforcement Learning to enhancement image adjustments. However, in their work only global image modifications can be applied (for example contrast, or saturation), limiting the expressiveness of the model. Conversely in a recent Deep Illumination Estimation paper, Wang et al., “Underexposed Photo Enhancement Using Deep Illumination Estimation”, in CVPR, 2019, a scaling luminance map is learned using an encoder-decoder setup. However, no global adjustment is performable. The learned mapping can be considered of high complexity and crucially depends on the regularization strategy employed. More recently, image enhancement has been learned in a multitask setting as described in Kong et al., "Multitask bilateral learning for real-time image enhancement", Journal of the Society for Information Display, 2019. In summary, there has been much recent interest in learning based approaches for the image enhancement problem, using neural network architectures.

It is desirable to develop automatic photo enhancement tools that can replace lay-user manual work or provide an improved manual-editing starting point for professional artists.

SUMMARY OF THE INVENTION

According to one aspect there is provided an image processing device for transforming an image, the device comprising a processor configured to implement a trained algorithm to process a received first image and generate a set of parameters in dependence thereon, and to generate a second image by applying to the first image a local parametric filter taking the set of parameters as input. The processing the first image uses a trained algorithm to automatically predict and generate a set of parametric filters to use to enhance the image which can be used to improve the image quality.

A parametric filter is a filter which can be defined using a finite number of parameters. A local parametric filter acts within a localised area of the image to which the filter is applied. The fewer the number of parameters needed to define the filter the more efficiently the values for those parameters can be predicted.

The local parametric filter may be configured to apply different kinds of transformation to different regions of the first image and the set of parameters includes one or more parameters governing where in the first image the filter is applied for a specific kind of transformation. The image processing may therefore enable the combination of multiple filters in multiple regions of the image in order to achieve the best resulting image quality.

A specific kind of transformation of the local parametric filter may be dependent on parameters input to the filter, and the set of parameters may include one or more parameters governing that specific kind of transformation of the filter. The use of local parametric filters means that these simple filters may be learned using a set of fewer parameters compared to standard models. Fewer parameters may also result in reduced memory usage and less computation needed during implementation.

The filter may be one of: a graduated filter, an elliptical filter and a polynomial filter. These filters are commonly used simple parametric filters which are straightforward to learn and may provide improved output images.

The filter may adjust one or more of hue, saturation, chrominance, luminance and colour channel balance. Adjustment to these characteristics of an image may result in improved image quality.

The first image may comprise data in a set of multiple colour channels and the filter may adjust fewer than all the colour channels. This may improve the speed of implementation of the image transformation due to only acting on the necessary colour channels in order to improve the image quality.

The trained algorithm may be a machine learned algorithm. Latest advances in machine learning, namely deep learning, may allow for more accurate prediction of the kinds of parametric filter which may be needed in order to produce an image with improved quality.

The trained algorithm may process the first image by identifying features therein and the device may be configured to generate the set of parameters in dependence on the identified features. By learning features it is possible to imitate the way in which a human operator may apply the filters, thus saving on human effort used to apply the same filters in the same way.

The device may comprise a memory, the memory storing in a non-transient way code executable by the processor to implement the trained algorithm and the filter. Storing the elements required locally may result in a more time and resource efficient implementation of the process to improve image quality.

The device may be configured to generate the second image by applying to the first image multiple local parametric filters each taking a subset of the set of parameters as input. Application of multiple local parametric filters each using a subset of parameters may allow for efficient application of filters to improve image quality. At least two of the multiple local parametric filters may have different filter functions from each other. Using multiple types of local parametric filters may result in an increased improvement to image quality opposed to using only one type of filter.

At least two of the multiple local parametric filters may have the same filter functions but are applied with greatest intensity at different locations in the first image. Application of the same local parametric filter function in multiple locations in the same image may provide an efficient application of filters to improve image quality. This may be achieved by focusing the different filters of the same type where they are each individually needed, rather than spreading one filter to include all areas where that type of filter is needed but also including areas of the image where it is not needed.

The first image may comprise data in a set of multiple colour channels, one of the multiple local parametric filters may adjust a first subset of the colour channels and a second of the multiple local parametric filters may adjust a second subset of the colour channels different from the first subset. Application of multiple local parametric filters each adjusting a subset of the colour channels may allow for efficient application of filters to improve image quality.

According to another aspect there is provided a method for transforming an image, the method comprising: receiving first image; processing the first image by a trained algorithm to generate a set of parameters in dependence thereon, and generating a second image by applying to the first image a local parametric filter taking the set of parameters as input. The processing the first image uses a trained algorithm to automatically predict and generate a set of parametric filters to use to enhance the image.

Prior to the said processing step, the method may comprise training the algorithm by a machine learning process. Latest advances in machine learning, namely deep learning, may allow for more accurate prediction of the kinds of parametric filter which may be needed in order to produce an image with improved quality.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Figure 1 (a) and 1 (b) show two different types of available parametric filter; Figure 2(a) schematically shows how an elliptical filter may be parametrized using five variables;

Figure 2(b) shows how to apply an elliptical filter to an image;

Figure 3(a) shows the standard equation of a rotated ellipse parametrized by five learnable parameters;

Figure 3(b) schematically shows how an elliptical filter may be parametrized using five variables;

Figure 3(c) shows how to apply an elliptical filter to an image;

Figure 4(a) shows a schematic diagram of a graduated filter and how it may be parametrized using three variables;

Figure 4(b) shows an example of a combination of two graduated filters to be applied to the blue channel (B channel) on the left, and that adjustment as applied to the image of the flowers on the right;

Figure 5(a) shows a cubic function dependent on pixel location within an image and pixel value, the equation defining the twenty parameters from A to T;

Figure 5(b) shows the action of the cubic filter of Figure 5(a) on the right as a heatmap based on the image on the left;

Figure 6 shows a schematic flow diagram of the regression neural network that may be used to compute the parameters of a parametric filter;

Figure 7 shows a schematic diagram of a filter-type specific fusion block;

Figure 8 shows a schematic diagram illustrating an example of fusing the output of filters using a first parallel approach;

Figure 9 shows a schematic diagram illustrating an example of fusing the output of filters using a different second parallel approach; Figure 10 shows a schematic diagram illustrating an example of fusing the output of filters using a sequential approach; and

Figure 11 shows a schematic diagram of an example of a camera configured to implement the image processor to process images taken by an image sensor in the camera.

DETAILED DESCRIPTION OF THE INVENTION

In order to achieve the benefits associated with fully-automated tools for image enhancement, there is described herein a learning-based method based on supervised-learning in the form of input and output (enhanced) image pairs.

The presently described image enhancement method is capable of enabling different types of local image edits and constitutes a method to collectively apply these in order to visually improve a provided input image. The proposed model is capable of enhancing images by learning to appropriately apply spatially local filters that draw inspiration from tools commonly used by human digital artists. Local filters afford additional fine-grained control over image edits and therefore afford refinements and detail enhancements that are not possible with global modifications alone. By constraining our model to learn how to utilize tools that are similar to those found in a digital artists' toolbox, it is possible to provide a natural form of model regularization and enable interpretable, intuitive adjustments that lead to visually pleasing results. Given suitable training data, accurate and recognizable reproduction of individual artistic styles may additionally be reproduced.

Forming part of this approach is the choice to frame this learning problem as the selection of one or more parametric filters that operate locally on the image. These learnable, parameterized components offer multiple advantages. Firstly, they align well with the intuitive, well-understood human artistic tools that are typically employed, and this naturally enables the generation of appealing results with image transforms that possess some degree of familiarity to human observers. Secondly, the use of parameterized local filters serves to both constrain model capacity and regularize the learning process, providing mitigation for over fitting and helping to produce pleasing results with only moderate memory space costs.

In the context of filters, a parametric filter is a filter that has a small set of parameters which are used to adjust an image. One example of such a parametric filter is an elliptical filter, which operates on an image based on the parameters of an ellipse. These parameters and an example use case are shown in Figure 2. This specific example is of image brightening with an elliptical filter, an example of a parametric filter. Figure 2(a) shows an elliptical filter, which is parametrized by five variables. These variables are: the center location (H, K), the rotation angle (T), the length of the semi-minor (B) and semi-major axes (A). Using these five adjustable parameters, the filter can be configured to have different positions and sizes, to achieve a variety of image enhancements. Inside the circle 108, image properties like brightness or colour can be adjusted using a linear scaling that goes from 100% at the center (H, K) to 0% at the edge of the circle 108. Outside the circle 108, the image may be left untouched. The parametrization of the filter allows filters that are expressive, yet simple-to-interpret as they are inspired by adjustments artists might make in photo editing software.

Figure 2(b) shows an example of using an elliptical filter to brighten a face. The specific properties assigned to the elliptical filter are applied to the face in the photograph according to the placement and dimensions of the ellipse 108. In this case the filter is a circle 108, which is a special case of an ellipse.

Additionally, the filters in the approach described herein are local in that they operate with fixed spatial extent. Similarly to the example in Figure 1 , the filter adjusts the image within the radius of the circle 108. Such processing differs from global filters, which transform all pixels in the image in the same way, independently of where they are located within the image.

In the presently described approach the parameters of the filters are predicted automatically using machine learning. One instantiation of such a learnable model is a neural network. During training the network learns how to predict filter parameters to minimize the error between the enhanced image and ground truth using source image and enhanced image pairs. When presented with a new unenhanced image at inference, the network predicts the filter parameters which are then used to adjust the image. The approach is implemented by a trained algorithm which comprises the result of training the network and enables the generation of a set of filter parameters, and the subsequent application of at least one local parametric filter using these parameters as input. That is, the trained algorithm is a machine learned algorithm.

The presently described approach makes use of deep learning by using a deep learning module for adjusting local image properties using parametric functions. A deep learning module is a reusable component of a neural network that performs a specific task. Modules can be inserted at various locations in neural networks. The presently described approach therefore can be combined with other existing networks to perform enhancement of an image. The module operates locally, i.e. with a limited spatial extent, on the image. This restricts the processing to particular regions of the image.

The use of parametric functions allows the described approach to have simple mathematical models that describe a filter’s effect on an image. These functions can be easily understood by a user as they can be visualized. They can potentially also be adjusted manually by a user in a post-processing step if required. The parametric functions implicitly provide a form of smoothness, or regularization, to the enhancement they produce. They can also be estimated efficiently by the neural network.

The present approach is described with reference to three examples of local parametric filters. However, it should be understood that the described method may be used in conjunction with many other such parametric filters.

The parameters determined to be learnable, or predictable scalar parameters, are denoted in uppercase bold font throughout the document to assist in the readers understanding.

The described approach is a deep learning method for learning the parameters of parametric functions that can locally adjust key image properties. Properties include colour channels such as, but not limited to, red (R), green (G), blue (B), as well as luminance, chrominance, hue, and saturation. In the examples that follow the adjustments are made per channel in RGB space, but the approach is not limited to that colour space only (for example, other colour spaces include CIELAB or LAB, YUV, HSL/HSV, and CMYK).

The presently described method comprises two key elements:

• The use of the local parametric filters.

• The use of a deep neural network to predict the parameters of these local parametric filters.

The learnt parametric function (also referred to herein as a "filter") can be any parametric function that can adapt image properties in a local area of the image. The parametric nature and the local action of the filter are key properties that distinguish the present method from typical methods.

In one embodiment of the adjustment performed, a filter adjusts each pixel’s values by multiplying the value of that pixel with a learnt scalar value. This scalar can assume any value between 0 and a pre-set upper bound (for example, 2, 3, 4, 5, etc.), and therefore can increase and decrease the selected adjustment on the given pixel. In other embodiments the adjustment could be performed using addition of an adjustment factor, or other such mathematical operations such as division, subtraction, or a combination thereof. One or more filters may be applied to the same image property. For example, there might be three filters operating on the luminance channel of the image in LAB colour space.

The filters may assume any parametric form so long as they adjust properties in a locally constrained region of the image. Here we describe three possible embodiments of the parametric local filters, each of which have a small number of parameters that a deep neural network may learn to predict.

The presently described approach will now be described in reference to two specific filter examples, the elliptical filter and graduated filter. In both cases there is what is called the 100% scaling area and the 0% scaling area. By this it is mean that within the 100% scaling area the predicted scaling factor S is applied 100%. This area may comprise a single pixel or a region containing multiple pixels. In other words, the pixel(s) in the 100% area are multiplied by S. In the 0% scaling area the pixels are multiplied by 1, which means there is no adjustment and the scaling factor S is applied 0% (i.e. not applied at all). The elliptical and graduated filters differ from each other in how the transition in the values of the scaling factors are made between scaling factor s and scaling factor 1 between the 100% and 0% scaling areas.

In the first example the parametric function is an elliptical function that multiplicatively adjusts pixels located inside or outside of the ellipse boundary 108. Figure 3(a) shows the standard equation of a rotated ellipse. The rotated ellipse 108 is parametrized by five learnable parameters that determine its position and orientation in an image. A defines the length of the semi-major axis. B defines the length of the semi-minor axis. H and K define the coordinates of the center of the ellipse. T defines the rotation angle of the semi-major axis Afrom horizontal in an anticlockwise direction.

In addition, a parameter S defines the scaling applied to pixels within or outside of the ellipse. Given the predicted scaling factor S, this scaling factor is decayed according to a linear or non-linear function that determines how the scaling factor is gradually modified the further the pixel is from the centre of the ellipse.

There are two possible versions of the elliptical filter. The first is when the 100% scaling area is at the centre of the ellipse 108, with the 0% scaling area outside of the ellipse. The second is when the 100% scaling area is outside of the ellipse and the 0% scaling area is at the centre of the ellipse. The second version is known as the inverted construction. The principle of the elliptical filter is to apply localized image adjustments in an elliptical manner across a region of a scene.

Figure 3(b) shows a schematic diagram of an elliptical parametric filter of the first version. The predicted scaling factor S is applied 100% at the centre (H, K). The scaling factor is then adjusted gradually until it reaches 1 at the boundary of the ellipse 108. Pixels outside the ellipse are not scaled (i.e. the scaling factor is 1).

Figure 3(c) shows an example of an elliptical (or radial) filter being applied to a photograph to adjust the redness of a man’s face. On the left is the image to be adjusted 302 with the boundary of the ellipse of the filter 108 shown on the image. On the right is a heat map 304 which shows the result of the filter on the red channel of the image 302. An arrow 306 indicates the area of the heat map 304 which represents the changes made within the filter ellipse in the red channel of the image 302 to adjust the redness of the man’s face.

In the second example the learnt parametric function is a graduated filter. The graduated filter is characterised by three parallel lines, top-most line 102, central line 104, and bottom-most Iine106, with the central line intercept C, the slope of the central line M, and the vertical offset between the lines O, defined as learnable parameters. In addition, a learnt scaling factor S is applied in the 100% scaling area. This 100% scaling area is either above the top-most Iine102 (when it may be referred to as the non-inverted version), or below the bottom-most line 106 (when it may be referred to as the inverted embodiment). Between the lines the scaling factor S is adjusted gradually in a linear manner until it reaches 1 at the boundary of the line furthest from the 100% scaling area (depending on the version used this may be the boundary of the bottom-most 106 or top-most 102 line). These learnable parameters are predicted by the deep learning network. The principle of the graduated filter is to gradually apply an adjustment (for example, a luminance change) in a linear manner across a local region of a scene.

Figure 4(a) shows a schematic diagram of a graduated parametric filter. The image height dimension is shown by the dotted line 402 on the left. The 100% scaling area is the area above the top-most line 102. In this region the selected adjustment with scale factor S is applied with scale factor S (for example, increasing or decreasing luminance). Between the lines the scale factor S is linearly adjusted to 50% at the central line 104, and to 0% at the bottom-most line 106. The steepness of that adjustment is dependent on the spacing between the lines, for example closer lines indicate a sharper adjustment. The 0% scaling area leaves pixels in that region untouched (i.e. the scaling factor is 1).

Figure 4(b) shows an example of a combination of two graduated filters to be applied to the blue channel (B channel) in the form of a heat map 404 on the left, and that adjustment as applied to the image of the flowers 406 on the right.

In a further example, the local parametric filter could be a cubic function dependent on pixel location and pixel value, the equation defining the parameters for which can be seen in Figure 5(a). There are twenty learnable parameters (denoted A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, and T), that define the four dimensional surface relating pixel coordinates and value to the adjustment factor that is added to the input image (for example, the values of luminance, colour, saturation etc.). Other versions of this filter could be characterized by an order of polynomial, for example linear, quadratic, quartic etc. However, the cubic filter is found to offer very good image quality while mitigating the concern of being too flexible, in essence overfitting to the data. The cubic filter function effectively defines a smooth surface over the image that smoothly varies the adjustment scale factors across the image. The surface can be applied to any type of pixel value e.g. intensity (luminance), colour, saturation, hue etc. In contrast to the graduated and radial filters there is no 100% and 0% areas for this particular filter. Furthermore, the computed adjustments are applied additively to the image in the best performing formulation of the cubic filter.

For all filter types, the output of each filter is a two-dimensional adjustment map of the same width and height as the image. At each pixel location in the adjustment map there is a scalar value produced by the filter, referred to as a scaling factor, which is used to directly replace or adjust that pixel in the corresponding image. There are many mechanisms for using the adjustment map to modify the image, the simplest being a multiplication with the corresponding pixels in the image. However, there are other methods which we discuss herein below. The adjustment map may be represented visually herein by a heat map.

Figure 5(a) shows the cubic filter equation, where x and y are the horizontal and vertical coordinates of the pixel within the image, and z is the pixel value (for example the luminance, hue, or saturation). The output of the function F(x,y,z) is added to the image to produce the adjusted image. This equation defines a smoothly varying four-dimensional surface.

Figure 5(b) shows the action of the cubic filter as a heat map 504 for the image 502 to the left. Different colours or temperatures on the scale in the heat map 504 can be used to indicate the strength of the adjustment applied to the image 502 by the cubic filter at each pixel location. In this example the adjustment is made in the blue channel of the image 502. Locations near to each other within the image are modified with more similar scaling factors, ensuring the resulting image has adjustments that are locally spatially smooth. That is to say that sharp changes in the level of application of the filter within the image are avoided.

Some specific examples of local parametric filters have been defined and discussed above. Each filter has a set of parameters that can be learnt, which define the specific form of that filter, and which leads to the best image enhancement result for the task at hand. For example, we can learn where to place a radial filter, its size, and the strength of the adjustment it makes in the image. The parameters are learnt by training a deep neural network based on a training dataset made up of example input and groundtruth pairs of images. In this scenario, the input image is the image to be adjusted, and the groundtruth image specifies the outcome which is desired to be achieved by applying one or more of the parameter filters to that input image.

There will now be described a specific deep learning framework which can be used to regress the filter parameters in this supervised learning setup.

Figure 6 shows a schematic diagram of the regression neural network that may be used to compute the parameters of a parametric filter. It should be appreciated that any regression neural network architecture could conceptually be used to infer the parameters, so the described method is not limited to using this particular example of the architecture.

The neural network receives a tensor or feature tensor consisting of convolutional neural network features of shape H x W x C. Where H is the height spatial dimension, W is the width spatial dimension, and C is the channel (feature) dimension. This feature tensor can be supplied by any backbone neural network that is capable of producing a set of convolutional feature maps, such as a U-Net, a ResNet etc. Once given these features the method applies an alternating set of convolutional and max pooling operations to downsample the feature tensor in the spatial dimensions H and W, giving a feature map of shape H/128, W/128, C. This feature map captures a global, holistic understanding of the image content. The downsampled feature tensor is input into a global average pooling layer that collapses the spatial dimensions to H=1, W=1 , followed by a fully connected layer that regresses the N parameters of the parametric function.

Figure 6 shows a flow chart 600 of the regression neural network that predicts (and learns) the N parameters of a parametric filter. A convolutional feature map 602 is input from a backbone network (e.g. a U-Net as described in Ronneberger et al. , "U-net: Convolutional networks for biomedical image segmentation", International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015, or a ResNet as described in He, Kaiming et al. "Deep Residual Learning for Image Recognition", CVPR, 2016). The feature map is passed through a sequence of convolutional layers and max pooling operators 604 that have the effect of downsampling the feature map 606. A global average pooling layer 608 collapses the spatial dimensions and the resulting feature map 610 is fed into a fully connected layer 612 that regresses the N parameters.

An image in any colour space (for example, HSV, RGB, LAB, etc.) has three channels. There may be one or more local filters applied to each channel. To mix together the contributions of multiple filters of the same type per channel, this approach proposes a filter fusion architecture, an example of which is shown in Figure 7. In a simple example of this fusion architecture, the output of the filters are multiplied together. The outcome of this multiplication is then applied to the original image to perform the overall modification. In another example the output might be added to the original image. In a further example the fusion might take place by learning how to fuse together the different filters, for example by using learnt scalars which are applied per filter.

Figure 7 shows a schematic diagram of a filter-type specific fusion block. Fusing the output of filters of the same type. Deep neural network architecture for fusing together the image enhancement contributions of multiple filters for a given channel of an image. In this example, three filters 702, 704, and 706, are learnt for the R channel of an RGB image. Mixing of the filters, in this case, is performed by multiplying together 708 the scaling factors for each filter on a pixel-by-pixel basis.

An image might also have multiple different types of filters per channel: for example, three elliptical filters and two graduated filters, and one cubic filter operating on the same channel (e.g. the R channel in RGB colour space). The contributions of this mixture of different filter types is applied in two steps. First, the output of filters of the same type are combined using the fusion described previously and shown in the example of Figure 7. Second, given the fused results from filters of the same type, the method then mixes those contributions to get an overall adjustment to apply to the image. This mixing could take place in a parallel or sequential fashion, differentiated by whether the modifications are applied all at once to the image (a parallel approach), or in a pipeline where one filter-type specific modification is applied after another, with results from one step feeding into the next (a sequential approach). In the parallel approach, one example may be an unweighted multiplication of the fused output from each filter type, an example which is shown in Figure 8. The mixing could also take place by learning a set of scalars for combining the fused results before the multiplication or addition. It should be appreciated that there are many alternative ways of combining the different contributions of multiple filters. For example, an addition of the fused output from each filter type could be used to obtain one overall adjustment map to be applied to the input image. Alternatively or additionally, the fused results from a subset of the filter types (e.g. graduated, elliptical) could be multiplied together. Following this, the fused result from another filter type (e.g. cubic) could be added to the input image and this adjusted image then multiplied by the fused adjustment map arising from the graduated and elliptical filters. It should eb understood that fusion functions (addition, multiplication, subtraction, division, etc), could be used in any combination to obtain the desired adjustment map. Figure 9 shows an example of combining different filter results using different types of fusion.

Figure 8 illustrates in a schematic diagram an example of fusing the output of filters using a parallel approach. In this example, the fused output from three different filter types, elliptical 802, graduated 804, and cubic 806; received from a filter type specific fusion block 808; are combined to form an overall image adjustment map. This overall adjustment is obtained by multiplying together 810 the adjustment maps from the three different filters. The adjustment map obtained is then applied to the image (in this case by a final multiplication) to obtain the enhanced result 812.

Figure 9 illustrates in a schematic diagram an example of fusing the output of filters using a different parallel approach. In this example, the fused output from three different filter types, elliptical 802, graduated 804, and cubic 806; received from the filter type specific fusion block 808; are combined to form an overall image adjustment map. This overall adjustment is obtained by multiplying together 902 the two adjustment maps 802 and 804 from one or more elliptical and graduated filters. The result 904 is multiplied 906 by the image given 908 by adding 910 the cubic adjustment map 806 to the input image 912.

Alternatively, the fused results per filter type could be applied individually to the image in a sequential manner, that is one at a time as shown in Figure 10. For example, the fused result of three elliptical filters could be applied to the input image, the result of this then adjusted by the fused result of two graduated filters (i.e. applied to the output image from the previous application of the elliptical filters), and the result of this then adjusted with the cubic filter. That is, each fusion result of a subsequent filter type is applied to the image, one after the other, until all of the filters have been applied to the image in some form or combination. Figure 10 illustrates in a schematic diagram an example of fusing the output of filters using a sequential approach. In this example, the fused output from three different filter types, elliptical 802, graduated 804, and cubic 806; received from the filter type specific fusion block; are combined by sequentially applying each to the input image. The output of one adjustment is fed into the next adjustment.

There is also the possibility of combining sequential and parallel fusion approaches. For example, the image adjusted by the adjustment map of the cubic filter could be fed into a parallel fusion architecture that combines the elliptical and graduated adjustment maps multiplicatively. The result of which could then be multiplied by the enhanced image from the cubic filtering stage.

Figure 11 shows a schematic diagram of an example of a camera configured to implement the image processor to process images taken by an image sensor 1102 in the camera 1101. Such a camera 1101 typically includes some onboard processing capability. This could be provided by the processor 1104. The processor 1104 could also be used for the essential functions of the device. The camera typically also comprises a memory 1103.

The transceiver 1105 is capable of communicating over a network with other entities 1110, 1111. Those entities may be physically remote from the camera 1101. The network may be a publicly accessible network such as the internet. The entities 1110, 1111 may be based in the cloud. In one example, entity 1110 is a computing entity and entity 1111 is a command and control entity. These entities are logical entities. In practice they may each be provided by one or more physical devices such as servers and datastores, and the functions of two or more of the entities may be provided by a single physical device. Each physical device implementing an entity comprises a processor and a memory. The devices may also comprise a transceiver for transmitting and receiving data to and from the transceiver 1105 of camera 1101. The memory stores in a non-transient way code that is executable by the processor to implement the respective entity in the manner described herein.

The command and control entity 1111 may train the artificial intelligence models used in each module of the system. This is typically a computationally intensive task, even though the resulting model may be efficiently described, so it may be efficient for the development of the algorithm to be performed in the cloud, where it can be anticipated that significant energy and computing resource is available. It can be anticipated that this is more efficient than forming such a model at a typical camera. In one implementation, once the deep learning algorithms have been developed in the cloud, the command and control entity can automatically form a corresponding model and cause it to be transmitted to the relevant camera device. In this example, the system is implemented at the camera 1101 by processor 1104.

In another possible implementation, an image may be captured by the camera sensor 1102 and the image data may be sent by the transceiver 1105 to the cloud for processing in the system. The resulting target image could then be sent back to the camera 1101 , as shown at 1112 in Figure 11.

Therefore, the method may be deployed in multiple ways, for example in the cloud, on the device, or alternatively in dedicated hardware. As indicated above, the cloud facility could perform training to develop new algorithms or refine existing ones. Depending on the compute capability near to the data corpus, the training could either be undertaken close to the source data, or could be undertaken in the cloud, e.g. using an inference engine. The system may also be implemented at the camera, in a dedicated piece of hardware, or in the cloud.

It should be appreciated that the term enhancement is a known term or art in the field of image processing and does not necessarily imply any improvement in an objective sense. That is, the image may be enhanced without a definitive improvement in objective or subjective qualities of the image being discernible. The term ‘transforming’ the image has therefore been used synonymously with the term ‘enhancing’ the image throughout the present document.

The above proposed method has the following advantages:

The approach may be automated through machine learning. The described approach uses the latest advances in machine learning, namely deep learning, to automatically predict parametric filters to enhance an image. The automation afforded through machine learning saves human effort in adjusting images.

The approach may result in high image quality. The local parametric filters learned by the described approach produce high image quality.

The approach provides a human inspired and interpretable image editing process. The filters in this approach are inspired by image editing tools provided to users in software programs like Photoshop and Lightroom. The filters applied to the image can be easily understood by a user. This differs from many neural network methods that act as a “black box”. Thai is, by taking an image as input and producing an enhanced image as output, but with limited exposure of the inner workings of the network to the user. Human interpretable image editing has several advantages. The learned filters can be visualized so the user can understand how the described approach is enhancing the image. Further, it provides a mechanism for the user to adjust the processing manually as a post-process by interacting with the predicted filters.

The approach results in parameter efficiency. Neural networks typically rely on many weights to transform the input to the output. The simple parametric filters can be learned with a set of fewer weights compared to standard neural network models, yet in some implementations, this may still achieve superior results. Practically, fewer weights may result in reduced memory usage and computation during inference. That is, the neural network has a set of model ‘parameters’ or model weights. These parameters are learned during model training. A trained model is able to predict the correct filter parameters that are used by a parametric image filter to enhance an image.

The approach provides image processing with built-in regularization. The parametric filters are spatially smooth, providing gradual adjustment to the image. Therefore, the adjustment applied to the image is also smooth and avoids obvious seams or irregularities.

The approach provides a reusable module. The described approach can be considered a reusable neural network module. It could be combined in a myriad of ways into other networks to enhance an image.

The approach provides an extensible image adjustment process. Three parametric filters are specifically considered herein. However, the framework is extensible to other parametric filters, particularly those which operate locally on the image. For example, the frameworks may also be combined with global filters as well.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. An image processing device (1101 ) for transforming an image, the device comprising a processor (1104) configured to implement a trained algorithm to process a received first image and generate a set of parameters (A, B, C, T, H, O, K, M) in dependence thereon, and to generate a second image by applying to the first image a local parametric filter taking the set of parameters as input.

2. An image processing device as claimed in claim 1 , wherein the local parametric filter is configured to apply different kinds of transformation to different regions of the first image and the set of parameters includes one or more parameters governing where in the first image the filter is applied for a specific kind of transformation.

3. An image processing device as claimed in claim 1 or 2, wherein a specific kind of transformation of the local parametric filter is dependent on parameters input to the filter, and the set of parameters includes one or more parameters governing that specific kind of transformation of the filter.

4. An image processing device as claimed in any preceding claim, wherein the filter is one of: a graduated filter, an elliptical filter and a polynomial filter.

5. An image processing device as claimed in any preceding claim, wherein the filter adjusts one or more of hue, saturation, chrominance, luminance and colour channel balance.

6. An image processing device as claimed in any preceding claim, wherein the first image comprises data in a set of multiple colour channels and the filter adjusts fewer than all the colour channels.

7. An image processing device as claimed in any preceding claim, wherein the trained algorithm is a machine learned algorithm.

8. An image processing device as claimed in any preceding claim, wherein the trained algorithm processes the first image by identifying features therein and the device is configured to generate the set of parameters in dependence on the identified features.

9. An image processing device as claimed in any preceding claim, wherein the device comprises a memory, the memory storing in a non-transient way code executable by the processor to implement the trained algorithm and the filter.

10. An image processing device as claimed in any preceding claim, wherein the device is configured to generate the second image by applying to the first image multiple local parametric filters each taking a subset of the set of parameters as input.

11. An image processing device as claimed in claim 10, wherein at least two of the multiple local parametric filters have different filter functions from each other.

12. An image processing device as claimed in claim 10 or 11, wherein at least two of the multiple local parametric filters have the same filter functions but are applied with greatest intensity at different locations in the first image.

13. An image processing device as claimed in any of claims 10 to 12, wherein the first image comprises data in a set of multiple colour channels, one of the multiple local parametric filters adjusts a first subset of the colour channels and a second of the multiple local parametric filters adjusts a second subset of the colour channels different from the first subset.

14. A method for transforming an image, the method comprising: receiving first image; processing the first image by a trained algorithm to generate a set of parameters (A, B, C, T, H, O, K, M) in dependence thereon, and generating a second image by applying to the first image a local parametric filter taking the set of parameters as input.

15. A method as claimed in claim 14, the method comprising, prior to the said processing step, training the algorithm by a machine learning process.