GB2585722A - Image manipulation - Google Patents
Image manipulation
- Publication number
- GB2585722A
- Authority
- GB
- United Kingdom
- Prior art keywords
- image
- representation
- input image
- images
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/001—Texturing; Colouring; Generation of texture or colour
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
The application concerns a computer-implemented image manipulation apparatus and method 200 configured to receive an input image 202 and a desired style to be transferred to the input image. The method can obtain a representation 204 of the input image selected from a plurality of stored representations of a plurality of images, wherein each said representation comprises data describing a set of image features. The method can modify at least one of the set of image features in the obtained representation to correspond to the input image and/or the desired style to produce a modified representation 207, and render a reference image 209 based on the modified representation. A manipulated image is generated by performing a style transfer operation 210 on the input image using the rendered reference image.
Description
Image Manipulation
The present invention relates to image manipulation.
Image manipulation involves editing images using computer vision and graphics algorithms. Style transfer is a known type of image manipulation that aims to apply a desired style to an input image while preserving the original content of the input image. For example, the input image may be manipulated to adopt the appearance or visual style of another image, called the "reference image".
If a reference image is significantly different from an input image then the results of the style transfer can be of reduced quality. For example, a reference image showing a natural landscape will not be compatible with an input image of a city landscape. It is therefore desirable to have a reference image that has good visual similarity to the input image.
Embodiments of the present invention aim to address at least one of the above problems and provide improved image manipulation in an efficient manner.
Given a style transfer algorithm that requires a reference image, embodiments can generate an optimal reference image for a particular input image using a statistical system. This can provide better results for a given input image and automatically provide an optimal reference for the input image.
Embodiments can obtain a statistical representation of a given input image. Embodiments can create a modified version of that statistical representation that is closer to the style of a selected desired transfer style by applying properties of a selected desired style to the statistical representation of the input image. Embodiments can then render a new image using the modified statistical representation. The rendered image can be used as a reference image in a style transfer process applied to the input image in order to provide improved results. Thus, instead of directly using a selected style transfer image as a reference image, as is conventional, embodiments can generate/render a synthetic version of the input image and use that as a reference image when a style transfer operation is performed. For instance, when the input image comprises a face, the synthetic rendering can comprise a 3D rendering of the face.
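By way of illustration only, the following is a minimal Python sketch of the pipeline just described. The function names (obtain_representation, modify_representation, render_reference, style_transfer) are hypothetical placeholders for the stages of the method and are not part of the disclosure.

```python
# Minimal sketch of the overall pipeline described above.
# All function arguments are hypothetical stand-ins for the individual stages.

def manipulate_image(input_image, desired_style,
                     obtain_representation, modify_representation,
                     render_reference, style_transfer):
    """Generate a manipulated image via a synthetic rendered reference image."""
    # 1. Obtain a statistical representation of the input image.
    representation = obtain_representation(input_image)
    # 2. Move the representation towards the desired transfer style.
    modified = modify_representation(representation, desired_style)
    # 3. Render a synthetic reference image from the modified representation.
    reference_image = render_reference(modified)
    # 4. Apply the style transfer operation using the rendered reference.
    return style_transfer(input_image, reference_image)
```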
According to a first aspect of the present invention there is provided a computer-implemented image manipulation method comprising: receiving an input image; receiving a desired style to be transferred to the input image; obtaining a representation of the input image selected from a plurality of stored representations of a plurality of images, wherein each said representation comprises data describing a set of image features; modifying at least one of the set of image features in the obtained representation to correspond to the input image and/or the desired style to produce a modified representation; rendering a reference image based on the modified representation; and generating a manipulated image by performing a style transfer operation on the input image using the reference image.
The representations may comprise statistical representations, wherein the image features of the representations may comprise common features of the plurality of images identified by a content analysis method performed on the plurality of images. The content analysis method may involve a statistical system that builds the statistical representations of the plurality of images. The statistical system can comprise a machine learning technique that learns a distribution of the identified common features across the plurality of images, e.g. by using a dimensionality reduction process or the like.
The plurality of images may comprise a dataset of example images of a particular type. The input image may be of a same type as the plurality of images. Examples of the type can comprise portrait, landscape, etc. The set of image features may comprise principal features that change across the plurality of images/dataset, such as components of deformations (e.g. shape, pose and expression) for face/portrait type of images.
The step of rendering the reference image may comprise a reverse of the process used to generate the statistical representations of the images. The reference image may comprise a synthetic rendering of the input image. For example, the reference image may comprise a 3D synthetic rendering/version of a photographed face in the input image.
The step of obtaining the representation of the input image may comprise finding a said representation amongst the plurality of representations that has a greatest visual similarity to the input image, e.g. using a similarity measurement such as Nearest Neighbours.
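As an illustration of such a similarity measurement, a minimal sketch of a Euclidean nearest-neighbour search over stored representation vectors is given below; the array layout and function name are assumptions made for the example only.

```python
import numpy as np

def nearest_representation(input_vector, stored_vectors):
    """Return the index of the stored representation closest to the input.

    stored_vectors: (N, D) array of stored representation vectors.
    input_vector:   (D,) representation vector for the input image.
    A simple Euclidean nearest-neighbour search stands in for whatever
    similarity measurement an embodiment actually uses.
    """
    distances = np.linalg.norm(stored_vectors - input_vector, axis=1)
    return int(np.argmin(distances))
```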
The desired style may comprise a set of image features, and each of the image features of the desired style may have an associated value. The desired style may be based on a style image that provides the value for each of the image features. Each of the image features of the obtained representation may have an associated value describing a property of the image feature (e.g. lighting conditions; pose of subject; identity of subject; morphological factors; emotion, etc). The step of modifying the at least one of the set of image features in the obtained representation may comprise modifying the value of the image feature of the obtained representation to correspond to a said value of a corresponding said image feature in the input image and/or the desired style.
The method may further comprise outputting the manipulated image, e.g. for storage or transfer.
According to another aspect of the present invention there is provided apparatus configured to perform image manipulation, the apparatus comprising: a processor configured to: receive an input image; receive a desired style to be transferred to the input image; obtain a representation of the input image selected from a plurality of representations of a plurality of images, wherein each said representation comprises data describing a set of image features, modify at least one of the set of image features in the obtained representation to correspond to the input image and/or the desired style to produce a modified representation; render a reference image based on the modified representation, and generate a manipulated image by performing a style transfer operation on the input image using the reference image.
The apparatus may comprise a mobile computing device, such as a smartphone.
According to another aspect of the present invention there is provided computer readable medium (or circuitry) storing a computer program to operate an image manipulation method substantially as described herein.
Correct lighting configurations can be fundamental for capturing studio quality selfies. Post-processing techniques have been proposed for assisting novice users to edit selfie light properties. Both AI-based (Sun, Tiancheng, et al. "Single Image Portrait Relighting." arXiv preprint arXiv:1905.00824 (2019)) and non-AI-based techniques have been explored for this task. Similarly, make-up transfer techniques have been introduced for improving selfies. These techniques are based on Generative Adversarial Networks (Chang, Huiwen, et al. "PairedCycleGAN: Asymmetric style transfer for applying and removing makeup." Proceedings of the IEEE CVPR (2018)) which transfer a reference make-up style to the desired selfie. In addition, landscape editing methods using stylization allow realistic transfer of a reference style to a desired image (Li, Yijun, et al. "A closed-form solution to photorealistic image stylization." ECCV. 2018). In some embodiments a proposed color editing algorithm may be able to transfer color properties from a reference image to a desired image, obtaining realistic image manipulation results for both selfie and landscape photographs. For this purpose, the algorithm may transfer colors only between regions of the reference and desired images that are semantically similar. This semantic information may be further used to select only suitable reference images for a given input image. Additionally, an upscaling algorithm based on linear regression, which allows real-time computation of the solution on embedded devices, may be used.
In some cases, the color editing algorithm may be based on mass transport. Given a color distribution from a reference image and a distribution from a desired image, the mass transport problem can comprise the minimization of the cost that occurs in modifying the reference distribution in order to match the desired distribution. This formulation has been used in (Bonneel, Nicolas, et al. "Sliced and radon wasserstein barycenters of measures." Journal of Mathematical Imaging and Vision 51.1 (2015): 22-45, and Pitie, Francois, Anil C. Kokaram, and Rozenn Dahyot. "N-dimensional probability density function transfer and its application to color transfer." Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1. Vol. 2. IEEE, 2005) for performing color transfer in 3D and N dimensions. In some cases, the system can incorporate new dimensions into the mass transport problem, i.e. additional maps beyond the color channels. An upsampling technique may also be used to speed up the computation of the mass transport. The upsampling can allow the mass transport to be performed on low resolution inputs, making the algorithm tractable for real-time mobile applications. The result of the mass transport can then be upscaled to the original input resolution by performing a multivariate linear regression. Alternative models have been proposed (Chen, Jiawen, et al. "Bilateral guided upsampling." ACM Transactions on Graphics (TOG) 35.6 (2016): 203) for learning affine transformations. The method can be divided into 3 different stages.
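For illustration, the following is a simplified sketch of a sliced (1D-projection) approximation to the mass transport step, assuming both distributions have been subsampled to the same number of points; it is not the exact formulation used in the cited works.

```python
import numpy as np

def sliced_transport(source, target, n_iterations=20, seed=0):
    """Move `source` samples towards the `target` distribution.

    source, target: (N, D) arrays of per-pixel feature vectors (e.g. colours,
    optionally augmented with extra map dimensions). Both arrays are assumed
    to contain the same number of samples, e.g. after subsampling.
    Each iteration projects both point sets onto a random direction, sorts the
    projections and advances the source points by the per-rank displacement,
    which approximates the mass transport between the two distributions.
    """
    rng = np.random.default_rng(seed)
    moved = source.astype(np.float64)
    for _ in range(n_iterations):
        direction = rng.normal(size=source.shape[1])
        direction /= np.linalg.norm(direction)
        proj_src = moved @ direction
        proj_tgt = target @ direction
        order_src = np.argsort(proj_src)
        order_tgt = np.argsort(proj_tgt)
        displacement = np.zeros(len(moved))
        # Match sorted source projections to sorted target projections.
        displacement[order_src] = proj_tgt[order_tgt] - proj_src[order_src]
        moved += displacement[:, None] * direction[None, :]
    return moved
```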
The maps used can include color channels, surface normal, position and semantic segmentation. Shu, Zhixin, et al. "Portrait lighting transfer using a mass transport approach." ACM Transactions on Graphics (TOG) 37.1 (2018): 2 provides details regarding the surface normal and position maps. Depending on the use case, different semantic maps are used. Each semantic map is represented by a confidence score, in contrast to a binary representation (Li, Yijun, et al. "A closed-form solution to photorealistic image stylization." Proceedings of the European Conference on Computer Vision (ECCV). 2018). This factor can improve results in the presence of hard boundaries. In a selfie, 3 facial semantic maps can be considered: lips, right eye region and left eye region. These segmentation maps can be generated by combining color segmentation with morphological operators on detected face landmarks. In a landscape, 5 semantic maps can be considered: person, nature, man-made, sky and water. During candidate proposal, automatic candidate proposal can assist the user to select only suitable reference images for the given input image, e.g. a nature landscape is not compatible with a city landscape. Existing approaches do not provide a solution to this problem, and so two different methods for selfie and landscape images may be used. For a selfie, lighting similarity between input and reference can be the criterion considered for selecting candidates. The lighting similarity score may be based on comparing shadow regions between the two images. For this purpose, the face can be divided into four regions, i.e. top/down and left/right. Color segmentation can be applied to these regions based on statistical thresholding. The number of masked pixels per region can be used to classify the image into one of the shadow orientations. In a landscape, candidates are selected according to image retrieval results. A histogram of classes in the segmentation image can be calculated and the reference images are scored according to the distance to that histogram.
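The shadow-region comparison for selfies could be sketched as follows; the thresholding rule and the top/bottom/left/right split below are simplified assumptions used only to illustrate the idea.

```python
import numpy as np

def shadow_orientation(gray_face, threshold=None):
    """Classify the dominant shadow orientation of a face crop.

    gray_face: 2D array of grey levels for the detected face region.
    The face is split into top/bottom and left/right halves; pixels darker
    than a statistical threshold are treated as shadow, and the half with
    the most shadow pixels gives the orientation label.
    """
    if threshold is None:
        # Simple statistical thresholding stand-in.
        threshold = gray_face.mean() - gray_face.std()
    shadow = gray_face < threshold
    h, w = shadow.shape
    counts = {
        "top": shadow[: h // 2, :].sum(),
        "bottom": shadow[h // 2 :, :].sum(),
        "left": shadow[:, : w // 2].sum(),
        "right": shadow[:, w // 2 :].sum(),
    }
    return max(counts, key=counts.get)

def lighting_compatible(input_face, reference_face):
    """A reference is a suitable candidate if its shadow orientation matches the input."""
    return shadow_orientation(input_face) == shadow_orientation(reference_face)
```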
In view of the features described herein, the computational time of the mass transport can be decreased by up to 360x. Despite the computational cost of combining all the steps, the method can present an interactive UI with limited latency for the user. It can calculate the map generation for input and reference images asynchronously and precompute the static ones. If the user includes a new input image then the method can calculate its maps while he selects the preferred reference (with a margin of 200ms). In the case of a new reference the user will have to wait 200ms. To allow users to interact with existing reference images the method can enable a sketch tool to draw on top of the images. This will just modify the color maps and a recalculation of the segmentation maps will not be needed. The solution may be extended to videos. For instance, for Keyframe 1, scene transfer can be performed; for Frames 2...X, regression can be performed; for Keyframe 2, scene transfer can be performed; for Frames X+2...Y, regression; for Keyframe 3, scene transfer, followed by regression.
The results of the mass transport of the initial frame can be propagated to the consecutive ones. The propagation can be done from key frame to key frame. A scene change detector can be triggered to separate key frames.
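A minimal sketch of this keyframe/propagation scheme is given below; the scene_transfer, propagate and is_scene_change callables are hypothetical stand-ins for the full transfer, the regression-based propagation and the scene change detector respectively.

```python
def process_video(frames, is_scene_change, scene_transfer, propagate):
    """Apply the keyframe/regression scheme described above to a frame sequence."""
    outputs = []
    keyframe_result = None
    for index, frame in enumerate(frames):
        if index == 0 or is_scene_change(frame):
            # Full mass transport is run only on keyframes.
            keyframe_result = scene_transfer(frame)
            outputs.append(keyframe_result)
        else:
            # In-between frames reuse the keyframe result via cheap regression.
            outputs.append(propagate(frame, keyframe_result))
    return outputs
```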
As mentioned above, the execution time of the method can be divided into different steps. These steps can be parallelized, allowing the app to run at interactive rates. The solution can accelerate the process by more than 360x (even considering that five new maps were added to the mass transport). The method can provide a real-time on-device color editing tool that produces realistic results. The pipeline can speed up similar state-of-the-art solutions considerably, i.e. 360x, enabling an interactive UI with limited latency for the user. For this purpose, an upscaling algorithm based on linear regression can be used. In addition, the mass transport can be enriched with new maps that incorporate image semantic information. Different color spaces and map weights could be further investigated to allow the user to have more control over the obtained results.
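The linear-regression upscaling mentioned above could, for example, be sketched as a per-pixel affine colour mapping fitted on the low-resolution pair; this is an assumption about one possible formulation, not the exact implementation.

```python
import numpy as np

def regression_upscale(low_res_in, low_res_out, full_res_in):
    """Upscale a low-resolution colour transform to the full-resolution input.

    low_res_in:  (n, c) colours of the downscaled input image.
    low_res_out: (n, c) colours produced by the mass transport at low resolution.
    full_res_in: (N, c) colours of the original-resolution input image.
    A multivariate linear (affine) regression is fitted on the low-resolution
    pair and then applied per pixel at full resolution, which is what allows
    the transport itself to run on a heavily subsampled image.
    """
    ones = np.ones((low_res_in.shape[0], 1))
    design = np.hstack([low_res_in, ones])               # affine term
    coeffs, *_ = np.linalg.lstsq(design, low_res_out, rcond=None)
    full_design = np.hstack([full_res_in, np.ones((full_res_in.shape[0], 1))])
    return full_design @ coeffs
```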
In some cases, in a map generation step (which can include probabilistic segmentation), maps can be extracted from input images. After this, in a candidate proposal step (which can include SoftMax segregation and histogram distance computations), suitable references are automatically selected for the input image. After this, in a real-time color transfer step (which can include rescaling, SMART subsampling, mass transport and regression processes), the maps and the references can be used for color transferring.
In some cases, light conditions from a reference can be transferred to the input. The input image can be received by a portrait geometry and light estimation step, which can involve 3D face geometry, highlight removal, segmentation and shadow analysis processes. Then, intelligent candidates may be selected. Then, portrait light editing may be performed to produce the final output, which can involve rescaling, mass transport and regression processes. In some cases, one or more of the following techniques may be used: adding segmentation maps to the colour transfer transformations, where the segmentation may be included in the transformation as logits so that each pixel can have the probability of belonging to many classes (this can reduce hard boundaries in the transformation); acceleration of the process 360x by subscaling the original input 20x and upscaling the mass transport results to the final resolution using a linear regression model; making input and reference map calculations asynchronous so the final app can remain interactive; some image retrieval techniques; and extending the solution to video (propagating the results of the mass transport of the initial frame to the consecutive ones; the propagation can be done from key frame to key frame, and a scene change detector can be triggered to separate key frames; best reference selection can be executed every keyframe).
In some cases, through use of: 1) a statistical system, the method may be able to create a representation of a given input or reference; 2) an image retrieval system, the method may be able to create/find a representation similar to the given input or reference. The representation may be semantically separable, with each of the components representing information about the input. Thus, the method/system can be used to find relatable references, or create new references through the statistical system by manipulating the original input to create a modified version of it that is closer to the desired style.
In some cases, an intelligent candidate generation may use a metric. An example of such a metric can involve the input image having a scene recognition process applied to it. A scene simplification technique may then be used, where the image (including inputs such as a candidate line) is processed by a VGG16 network to produce a semantic line. Alternatively, a code vector process (D1, D2, ..., DN) may then be used. Alternatively, segmentation and class distribution histogram processes may then be used. Finally, a K-NN matching process may be used to produce output images.
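As one illustration of the segmentation and class-distribution-histogram route followed by K-NN matching, a simplified sketch is shown below; the Euclidean histogram distance and the function names are assumptions made for the example.

```python
import numpy as np

def class_histogram(segmentation, n_classes):
    """Normalised histogram of semantic classes in an integer-labelled segmentation map."""
    counts = np.bincount(segmentation.ravel(), minlength=n_classes).astype(np.float64)
    return counts / counts.sum()

def top_k_candidates(input_segmentation, reference_segmentations, n_classes, k=5):
    """Rank stored references by histogram distance to the input (simple K-NN).

    An embodiment might instead use a learned code vector (e.g. features from
    a VGG16-like network) in place of the class histogram used here.
    """
    query = class_histogram(input_segmentation, n_classes)
    distances = [
        np.linalg.norm(class_histogram(seg, n_classes) - query)
        for seg in reference_segmentations
    ]
    return np.argsort(distances)[:k]
```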
According to the present invention, there is provided a method and apparatus as set forth in the appended claims. Other features of the invention will be apparent from the dependent claims, and the description which follows.
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which: Figure 1 schematically illustrates an example computing device configured to execute an embodiment; Figure 2 is a flowchart outlining example steps performed by an embodiment, including generating a reference image for a style transfer operation; Figure 3 is a flowchart further showing steps involved in generating the reference image; Figure 4 illustrates an example reference image being generated, and Figure 5 illustrates an example use case.
Figure 1 is a block diagram of a computing device 100 configurable to execute embodiments of the image manipulation method. The device will normally comprise, or be associated with, at least a processor 102, memory 104, a wireless communication unit 106 and a user interface component 108. Examples of suitable devices include mobile telephones/smartphones, tablet computers and other types of, typically mobile/portable/handheld, computing/communications devices. The user interface component 108 may comprise a touchscreen in some embodiments. Other components and features of the device will be well-known to the skilled person and need not be described herein in detail.
Figure 2 is a flowchart showing steps of an embodiment of the image manipulation method 200 that can be performed by means of software instructions being executed by the device 100. It will be appreciated that at least some of the steps shown in the Figures herein may be re-ordered or omitted. One or more additional steps may be performed in some cases. Further, although the steps are shown as being performed in sequence in the Figures, in alternative embodiments at least some of them may be performed concurrently. It will also be understood that embodiments can be implemented using any suitable software, programming language, data editors, etc, and may be represented/stored/processed using any suitable data structures and formats.
The method will typically be invoked when there is a need for the device 100 to generate a manipulated image. For instance, a user may cause an application executing on the device 100 to start performing the method 200. The method may be implemented as a feature of a multi-function application (e.g. a photo gallery, image sharing or image editing application), or it may be a stand-alone application. The method may display a user interface on the touchscreen 108 of the device that can prompt the user for input and also display outputs of the method.
At step 202 the user selects an input image that is to be used to generate a manipulated image. The user may select an image from a plurality of stored images. The images may be stored in the memory 104 of the device and/or in at least one other storage location (e.g. an external storage device, such as a removable storage media), or may be accessed over a network connection (e.g. using a Cloud-based service). The input image may be selected in any suitable manner. For example, the user may be presented with thumbnail versions of the plurality of images that can be chosen via the touchscreen 108, or the user may search for a specific image in some other manner, e.g. by using text-based tags, time/date stamps, etc, associated with the images. Alternatively, the user may use a camera application or the like to produce a new input image that is selected for manipulation.
The user can also select a desired transfer style to be applied to the input image at step 202. In some cases the desired transfer style may be based on image features/properties (e.g. lighting properties, colour properties, shadow properties, etc) of an existing image. Each image feature can have an associated value that can be processed so that embodiments may apply those properties of the desired style to the input image (giving it a similar appearance or visual style to the desired style) whilst preserving the original content of the input image.
Figure 5 shows an example of the method 200 in use on the device 100, including a user interface that is useable to select a desired style to be transferred to an input image 502. It will be understood that the form and design of the user interface shown herein is exemplary only and many variations are possible.
The user interface can include at least one interactive element 504 (e.g. menu, slider, numerical input box, etc) that can allow the desired style (and, optionally, at least one other parameter used by the method 200) to be selected and/or changed by the user. For example, the interactive element may present the user with a selection of styles that are labelled textually and/or graphically. In one example, the desired styles may comprise a set of different lighting conditions, e.g. conditions representing lighting conditions at different hours/times of the day.
Thus, each of the desired styles may comprise data describing preset modifications of the image features, i.e. containing at least one value that can be used to replace a value associated with the image feature in the representation of input image (the input feature vector representation).
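Purely by way of example, such a preset could be represented as a mapping from feature names to replacement values; the feature names and numbers below are invented for illustration only and do not come from the disclosure.

```python
# Hypothetical preset describing a desired style as feature-value overrides.
GOLDEN_HOUR_PRESET = {
    "light_direction": 0.8,   # value substituted into the input representation
    "light_warmth": 0.9,
    "shadow_softness": 0.6,
}

def apply_preset(representation, preset):
    """Return a copy of the feature representation with the preset values substituted."""
    modified = dict(representation)
    modified.update(preset)
    return modified
```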
In other embodiments the user may select a desired style based on a transfer image that he selects from a plurality of stored style transfer images. Thumbnails of these may be displayed for selection at the user interface item 504. The plurality of images may be of a same/similar type as the input image. In one example the input image 502 comprises a portrait of a human face, which may be a selfie taken by the user, and the style transfer images 504 comprise other portrait images of human faces. The style transfer images may be a subset of a larger library of images that are selected by the method. In some embodiments the style transfer images may be selected as ones having a same/similar type, or category, as the input image. For example, the type of the input image may be identified as a "portrait". The identification of the type of an image may either be done automatically by the method (e.g. using image recognition techniques, including scene classification algorithms, such as CNN, DNN or the like), and/or may involve user input (e.g. the user selects the type of the image from a menu of options). At least some of the images in the library that are associated with data identifying their type as a portrait may then be made available for selection via the user interface in this example.
After the user selects a particular desired style using the interface item 504 an output image 508 is generated. The output image 508 comprises a manipulated version of the input image 502 having a style of the selected desired style applied to it by means of an embodiment of the method described herein.
It will be understood that the method 200 can be used with a different type of image. For example, the input image can comprise a natural landscape and the plurality of selectable style transfer images can comprise images of various types of natural landscapes. Specific examples of style transfer images can be labelled as night, day, water, snow, sunrise and summer. Upon a user selecting a particular style image (and, optionally, setting at least one other parameter), an embodiment can generate an output image comprising a manipulated version of the input landscape image having a style of the selected style image applied to it.
Thus, it will be understood that embodiments are not limited to portraits and landscapes; other types of input images and style transfer images can be processed. A non-limiting set of examples of other image types that embodiments can process includes: interiors, sporting events, performances, groups, animals, or even various sub-types of these.
Returning to Figure 2, at step 204 embodiments can obtain data comprising a representation of the input image. The representation may be based on statistical data that has been built for a type/category (e.g. portrait, landscape, etc) of image that corresponds to the type of input image. The building of the representation data may be performed by at least one other computing device prior to the execution of the image manipulation method 200 on the device 100, with resulting data then being made available to the image manipulation method (e.g. transferred to and stored in the memory 104).
To build statistical representation data for a particular type of image, a large number of representative examples of images of that type may be obtained from any suitable source, such as an image library. These images can be analysed using a statistical system being executed on the at least one other computing device that can use a content analysis method to identify common features of the images. For example, for portrait type images, face inverse rendering can be used to identify common image features, such as eyes, nose, lips, etc. For landscape type images, the common features can include particular types of mountains, trees, bodies of water, and so on.
A known example of how to build a statistical representation is described in Li, Tianye et al: "Learning a model of facial shape and expression from 4D scans", ACM Transactions on Graphics, 36(6), 194:1-194:17, 2017/11/20, the contents of which are incorporated herein by reference. That publication gives an example of how to generate a statistical representation/model for face type images that is achieved by learning the principal components of deformations of a face (e.g. shape, pose and expression) based on an acquired 3D face dataset. It will be understood that representation data can be built in a similar manner for different types of images by parameterizing them through specialized image features (e.g. orientation of gradients, manufactured features, physical models, etc). For instance, any suitable method of content analysis, such as oriented gradients, crafted features, edge detection, etc, can be used to identify features of any type of image. Embodiments can use a reversible type of model/method to generate the representation data so that it can also be used to categorize input images and generate reference images (as will be described below).
The system used to build the image type representation data may comprise a machine learning technique that learns the distribution of the identified features across the examples, e.g. by means of a dimensionality reduction process, or the like. Some embodiments use Principal Component Analysis (PCA) for this, but it should be understood that other suitable techniques, such as Disentangled VAEs/Auto-encoders, Sparse coding, etc, can be used. The system can output data describing the main features of the example images and the associated distributions across those features. Thus, the representation data can comprise a set of identified features, and data describing common examples of each of those features found in the example image dataset. A simplified example of the representation data is illustrated in the table below:

Image type: Portrait

Identified Features | Values
---|---
Eye colour | Blue 1, blue 2, green 1, green 2, etc
Eye size | EX1 x EY1, EX2 x EY2, etc
Mouth size | MX1 x MY1, MX2 x MY2, etc

Once a statistical representation(s) of one or more types of images has/have been built in the above manner, the resulting data can be provided to/accessed by a statistical system 203 that is part of the image manipulation method 200 being executed by the device 100. That statistical system 203 can receive the input image 202 (that is of one of the represented types) and output a statistical representation 204 of it in the relevant feature space of the representations/model (e.g. as a vector of floats). Embodiments can use any method of similarity measurement (such as Nearest Neighbours) to find potential candidates within the representations that are closely related to the input image in terms of visual similarity of features.
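A minimal sketch of the PCA-based variant mentioned above is given below, assuming a hypothetical per-image feature matrix has already been extracted by the content analysis; scikit-learn's PCA is used here as a stand-in for whatever dimensionality reduction an embodiment adopts.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_representation_model(example_features, n_components=32):
    """Offline stage: learn the distribution of features across example images.

    example_features: hypothetical (num_images, num_features) matrix of feature
    values extracted from a dataset of example images of one type (num_images
    must be at least n_components for the fit to succeed).
    """
    model = PCA(n_components=n_components)
    model.fit(example_features)
    return model

def obtain_representation(model, input_features):
    """On-device stage: project the input image's features into the learned space.

    Returns a small vector of floats, i.e. the statistical representation 204.
    """
    return model.transform(np.asarray(input_features).reshape(1, -1))[0]
```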
Thus, embodiments can select the examples of features from the representation data that most closely resemble those of the input image. Embodiments can also measure where any difference lies between that chosen example and the input image.
A simplified example of the statistical representation 204 of a particular input image (of the portrait type and based on the example table above) is illustrated in the table below:

Representation of input image (type: portrait)

Feature | Value
---|---
Eye colour | Blue 2
Eye size | EX2 x EY2
Mouth size | MX1 x MY1

In some embodiments the representation of the input image may comprise a vector of numbers that represents where the input image lies in the distribution of features extracted by the statistical system. In the example where the statistical system is based on face features (e.g. eyes, lips, etc) the features can be directly linked to the 3D face of the user present in the picture.
In other cases, this can be based on different features (e.g. lighting conditions, general mood of subject, etc) that do not link directly to 3D features. Additional information can be gathered through the statistical representation of the image and this may be done automatically as the input image is selected. This can also be done prior to that, e.g. as the image is saved in a gallery. The statistical representation of the input can be a vector of floats that is much smaller in size than the image and so can be stored with minimal cost in terms of memory.
Figure 3 schematically illustrates examples of items 202, 204, 206 and 207 of Figure 2 for a face/portrait type input image. Features (e.g. morphological factors, emotion deformation, lighting, skin parameters, pose parameters) that were identified in the image type representation data based on a statistical sample of portraits/faces are identified in the input image. Values describing these identified features are then extracted to obtain the input image representation. Specific modifications based on the expected result (e.g. cinematic lighting, sharper pose, emotion correction, etc.) can then be made to at least some of these values.
Figure 4 illustrates an example of using a statistical system, such as face inverse rendering techniques where face properties, such as light, are determined from an input image. These values are modified in order to create a suitable reference image given a transfer style input (e.g. keep identity and morphology while changing the lighting and emotions). Typically, the method will render one reference image based on the selected configurations/settings, although in some cases it is possible to generate more than one, based on different configurations/settings (e.g. step 208A), at least one of which can then be selected by the user for further processing.
Thus, the statistical system used to build and process the representation data can be composed of a machine learning system that learns the distribution of the selected/identified features across examples (done before execution of the image manipulation method 200), and a generative model that the image manipulation method 200 can use to obtain a representation of the input image 202 and render a reference image based on it (this can be the same system in face/portrait type image embodiments at least).
Returning to Figure 2, at step 206 embodiments can modify the obtained statistical representation of the input image so that it is closer to the input image and/or the style transfer image selected at the step 202. For instance, if the value of the feature "eye colour" in the obtained representation of the input image is "Blue 2" and the value of the corresponding feature in the input image (or the desired style) is "Blue 1" then the value of the feature "eye colour" in the modified statistical representation may be modified to "Blue 1". Embodiments may also edit intrinsic features of the representations of the image, such as light properties, for example. This can allow embodiments to subsequently render a synthetic image of the person with the different light conditions, for instance, and use that as a reference image.
A simplified example of a modified statistical representation 207 of the input image (of the portrait type and based on the example table above) is illustrated in the table below:

Modified representation of input image (type: portrait)

Feature | Modified value
---|---
Eye colour | Blue 1 (modified)
Eye size | EX2 x EY2 (unmodified)
Mouth size | MX2 x MY2 (modified)

The modified representation 207 output by step 206 can comprise a vector of floats that represents where the input image fits on the learned representation of the features. These numbers can then be tweaked manually (e.g. by means of user interface items) if they differ too much from the expected result.
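The modification of this float vector could be sketched as follows; the blend parameter is an assumption added only to reflect the manual tweaking mentioned above.

```python
import numpy as np

def modify_representation(representation, style_targets, blend=1.0):
    """Move selected components of the representation towards the desired style.

    representation: 1D float vector in the learned feature space.
    style_targets:  mapping {component_index: target_value} derived from the
                    desired style (or from the input image itself).
    blend:          1.0 replaces each value outright; smaller values allow the
                    kind of manual adjustment described above.
    """
    modified = np.array(representation, dtype=np.float64)
    for index, target in style_targets.items():
        modified[index] = (1.0 - blend) * modified[index] + blend * target
    return modified
```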
At step 208 embodiments can use the modified input representation to render a reference image 209. The rendering may comprise a reverse of the process used to generate the statistical representation of the image. An example of a reversible process suitable for portrait type images is face rendering, which can result in a 3D rendered synthetic version of the photographed face in the input image 202. Embodiments may involve differential rendering in a machine learning framework.
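Continuing the PCA example sketched earlier, the reverse of the representation process could look like the following; in a face/portrait embodiment this role would instead be played by a (possibly differentiable) face renderer producing a 3D synthetic version of the photographed face.

```python
import numpy as np

def render_reference_features(model, modified_representation):
    """Reverse the representation process to recover image-space feature values.

    With the PCA model sketched earlier this is simply the inverse projection
    from the learned feature space back to the original feature dimensions.
    """
    return model.inverse_transform(
        np.asarray(modified_representation).reshape(1, -1)
    )[0]
```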
At step 210 embodiments can use the reference image 209 to perform a style transfer operation on the input image. Examples of suitable style transfer operations that can use the input image and rendered reference image as inputs include GANs, Generative Models, or Histogram Transfer, but it will be understood that other techniques may be used. For instance, some embodiments may use trained neural networks to perform the style transfer operation. Following the completion of the style transfer step 210 a user may perform a further action using the manipulated image, e.g. select it for storage or sharing using a menu, for example.
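As one concrete example of the Histogram Transfer option, a per-channel histogram matching sketch is shown below; GAN-based or other learned transfer operations could equally be used in its place.

```python
import numpy as np

def histogram_transfer(input_image, reference_image):
    """Per-channel histogram matching between input and rendered reference.

    input_image, reference_image: uint8 arrays of shape (H, W, C).
    Each channel of the input is remapped so that its cumulative histogram
    matches that of the rendered reference image.
    """
    output = np.empty_like(input_image)
    for channel in range(input_image.shape[2]):
        src = input_image[..., channel].ravel()
        ref = reference_image[..., channel].ravel()
        src_values, src_inverse, src_counts = np.unique(
            src, return_inverse=True, return_counts=True)
        ref_values, ref_counts = np.unique(ref, return_counts=True)
        src_cdf = np.cumsum(src_counts) / src.size
        ref_cdf = np.cumsum(ref_counts) / ref.size
        # Map each source quantile to the reference value at the same quantile.
        matched = np.interp(src_cdf, ref_cdf, ref_values)
        output[..., channel] = matched[src_inverse].reshape(input_image.shape[:2])
    return output
```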
It is understood that according to an exemplary embodiment, a computer readable medium storing a computer program to operate a method according to the foregoing embodiments is provided.
Attention is directed to any papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
Claims (15)
- CLAIMS
- 1. A computer-implemented image manipulation method (200) comprising: receiving an input image (202); receiving a desired style to be transferred to the input image; obtaining a representation (204) of the input image selected from a plurality of stored representations of a plurality of images, wherein each said representation comprises data describing a set of image features; modifying at least one of the set of image features in the obtained representation to correspond to the input image and/or the desired style to produce a modified representation (207); rendering a reference image (209) based on the modified representation, and generating a manipulated image by performing a style transfer operation (210) on the input image using the reference image.
- 2. A method according to claim 1, wherein the plurality of representations comprise a respective plurality of statistical representations, and wherein the image features of the statistical representations comprise common features of the plurality of images identified by a content analysis method performed on the plurality of images.
- 3. A method according to claim 2, wherein the content analysis method is performed by a statistical system to generate the statistical representations of the plurality of images.
- 4. A method according to claim 3, wherein the statistical system comprises a machine learning technique that learns a distribution of the identified common features across the plurality of images.
- 5. A method according to claim 4, wherein the machine learning technique comprises a dimensionality reduction process.
- 6. A method according to any of claims 2 to 5, wherein the plurality of images comprise a dataset of example images of a particular type, and the set of image features comprise principal features that change across the plurality of images in the dataset.
- 7. A method according to any of claims 3 to 6, wherein the step of rendering the reference image (209) comprises a reverse of the process used to generate the statistical representations of the plurality of images.
- 8. A method according to claim 7, wherein the reference image comprises a synthetic rendering of the input image.
- 9. A method according to claim 8, wherein the input image comprises a face and the synthetic rendering comprises a 3D rendering of the face.
- 10. A method according to any preceding claim, wherein the step of obtaining the representation (204) of the input image comprises finding a said representation amongst the plurality of stored representations that has a greatest visual similarity to the input image.
- 11. A method according to any preceding claim, wherein: each of the image features of the obtained representation has an associated value describing a property of the image feature; the desired style comprises a set of image features, and each of the image features of the desired style has an associated value, and wherein the step of modifying the at least one of the set of image features in the obtained representation may comprise modifying the value of the image feature of the obtained representation to correspond to a said value of a corresponding said image feature in the input image and/or the desired style.
- 12. A method according to claim 11, wherein the desired style is based on a style image that provides the value for each of the image features of the desired style.
- 13. Apparatus (100) configured to perform image manipulation, the apparatus comprising: a processor (102) configured to: receive an input image; receive a desired style to be transferred to the input image; obtain a representation of the input image selected from a plurality of stored representations of a plurality of images, wherein each said representation comprises data describing a set of image features, modify at least one of the set of image features in the obtained representation to correspond to the input image and/or the desired style to produce a modified representation; render a reference image based on the modified representation, and generate a manipulated image by performing a style transfer operation on the input image using the reference image.
- 14. Apparatus according to claim 13, wherein the apparatus comprises a mobile computing device, a smartphone or a tablet.
- 15. A computer readable medium storing a computer program to operate an image manipulation method according to any of claims 1 to 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/KR2020/006358 WO2020235862A1 (en) | 2019-05-17 | 2020-05-14 | Image manipulation |
US17/612,053 US11869127B2 (en) | 2019-05-17 | 2020-05-14 | Image manipulation method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB1906992.1A GB201906992D0 (en) | 2019-05-17 | 2019-05-17 | Improvements in and relating to image manipulation |
Publications (3)
Publication Number | Publication Date |
---|---|
GB201917038D0 GB201917038D0 (en) | 2020-01-08 |
GB2585722A true GB2585722A (en) | 2021-01-20 |
GB2585722B GB2585722B (en) | 2022-05-25 |
Family
ID=67385174
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GBGB1906992.1A Ceased GB201906992D0 (en) | 2019-05-17 | 2019-05-17 | Improvements in and relating to image manipulation |
GB1917038.0A Active GB2585722B (en) | 2019-05-17 | 2019-11-22 | Image manipulation |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GBGB1906992.1A Ceased GB201906992D0 (en) | 2019-05-17 | 2019-05-17 | Improvements in and relating to image manipulation |
Country Status (1)
Country | Link |
---|---|
GB (2) | GB201906992D0 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717955B (en) * | 2019-09-29 | 2024-04-02 | 武汉极意网络科技有限公司 | Gallery updating method, device, equipment and storage medium |
-
2019
- 2019-05-17 GB GBGB1906992.1A patent/GB201906992D0/en not_active Ceased
- 2019-11-22 GB GB1917038.0A patent/GB2585722B/en active Active
Non-Patent Citations (13)
Title |
---|
BONNEEL, NICOLAS ET AL.: "Sliced and radon wasserstein barycenters of measures", JOURNAL OF MATHEMATICAL IMAGING AND VISION 51.1, 2015, pages 22 - 45 |
CHANG, HUIWEN ET AL.: "PairedCycleGAN: Asymmetric style transfer for applying and removing makeup", PROCEEDINGS OF THE IEEE CVPR, 2018 |
CHEN, JIAWEN ET AL.: "Bilateral guided upsampling", ACM TRANSACTIONS ON GRAPHICS (TOG), vol. 35.6, 2016, pages 203 |
FUZHANG WU ET AL: "Content-Based Colour Transfer", COMPUTER GRAPHICS FORUM, vol. 32, no. 1, 11 January 2013 (2013-01-11), GB, pages 190 - 203, XP055670072 * |
LI, TIANYE ET AL.: "Learning a model of facial shape and expression from 4D scans", ACM TRANSACTIONS ON GRAPHICS, vol. 36, no. 6, 20 November 2017 (2017-11-20), pages 1 - 194 |
LI, YIJUN ET AL.: "A closed-form solution to photorealistic image stylization", ECCV, 2018 |
LI, YIJUN ET AL.: "A closed-form solution to photorealistic image stylization", PROCEEDINGS OF THE EUROPEAN CONFERENCE ON COMPUTER VISION (ECCV, 2018 |
PIERRE-YVES LAFFONT ET AL: "Transient attributes for high-level understanding and editing of outdoor scenes", ACM TRANSACTIONS ON GRAPHICS, ACM, NEW YORK. NY, USA, vol. 33, no. 4, 27 July 2014 (2014-07-27), pages 1 - 11, XP058051916 * |
PITIE, FRANCOIS, ANIL C. KOKARAM, ROZENN DAHYOT: "Tenth IEEE International Conference on Computer Vision (ICCV'05)", vol. 1-2, 2005, IEEE, article "N-dimensional probability density function transfer and its application to color transfer" |
SHU, ZHIXIN ET AL.: "Portrait lighting transfer using a mass transport approach", ACM TRANSACTIONS ON GRAPHICS (TOG), vol. 37.1, 2018 |
SUN, TIANCHENG ET AL.: "Single Image Portrait Relighting", ARXIV PREPRINT ARXIV: 1905.00824, 2019 |
WU JIAJUN ET AL: "Neural Scene De-rendering", IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. IEEE, US, 21 July 2017 (2017-07-21), pages 7035 - 7043, XP033250070 * |
YIMING LIU ET AL: "AutoStyle: Automatic Style Transfer from Image Collections to Users' Images", COMPUTER GRAPHICS FORUM, vol. 33, no. 4, 1 July 2014 (2014-07-01), GB, pages 21 - 31, XP055280477 * |
Also Published As
Publication number | Publication date |
---|---|
GB201917038D0 (en) | 2020-01-08 |
GB201906992D0 (en) | 2019-07-03 |
GB2585722B (en) | 2022-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10762608B2 (en) | Sky editing based on image composition | |
CN111199531B (en) | Interactive data expansion method based on Poisson image fusion and image stylization | |
Chai et al. | Autohair: Fully automatic hair modeling from a single image | |
US11651477B2 (en) | Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks | |
US9449253B2 (en) | Learning painting styles for painterly rendering | |
US20220044365A1 (en) | Automatically generating a trimap segmentation for a digital image by utilizing a trimap generation neural network | |
Yang et al. | Semantic portrait color transfer with internet images | |
US11704357B2 (en) | Shape-based graphics search | |
US11869125B2 (en) | Generating composite images with objects from different times | |
US11574392B2 (en) | Automatically merging people and objects from multiple digital images to generate a composite digital image | |
KR20230097157A (en) | Method and system for personalized 3D head model transformation | |
US20240135511A1 (en) | Generating a modified digital image utilizing a human inpainting model | |
US11869127B2 (en) | Image manipulation method and apparatus | |
AU2024200505A1 (en) | Animated facial expression and pose transfer utilizing an end-to-end machine learning model | |
Song et al. | Talking face video generation with editable expression | |
US20240135513A1 (en) | Utilizing a warped digital image with a reposing model to synthesize a modified digital image | |
GB2585722A (en) | Image manipulation | |
US12045963B2 (en) | Detecting object relationships and editing digital images based on the object relationships | |
US20240169501A1 (en) | Dilating object masks to reduce artifacts during inpainting | |
CN114627211A (en) | Video business card generation method and device, computer equipment and storage medium | |
Zhang et al. | Deep photographic style transfer guided by semantic correspondence | |
US20240362758A1 (en) | Generating and implementing semantic histories for editing digital images | |
US20240256218A1 (en) | Modifying digital images using combinations of direct interactions with the digital images and context-informing speech input | |
US20240361891A1 (en) | Implementing graphical user interfaces for viewing and interacting with semantic histories for editing digital images | |
CN113033656B (en) | Interactive hole detection data expansion method based on generation countermeasure network |