WO2015028713A1 - An image enhancement apparatus and method - Google Patents

An image enhancement apparatus and method

Info

Publication number
WO2015028713A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
motion
region
audio
playback signal
Application number
PCT/FI2014/050650
Other languages
French (fr)
Inventor
Mikko Tammi
Arto Lehtiniemi
Original Assignee
Nokia Corporation
Application filed by Nokia Corporation filed Critical Nokia Corporation
Publication of WO2015028713A1 publication Critical patent/WO2015028713A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/016 Input arrangements with force or tactile feedback as computer generated output to the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/14 Picture signal circuitry for video frequency region
    • H04N 5/144 Movement detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04842 Selection of displayed objects or displayed text elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/00127 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture
    • H04N 1/00132 Connection or combination of a still picture apparatus with another apparatus in a digital photofinishing system, i.e. a system where digital photographic images undergo typical photofinishing processing, e.g. printing ordering
    • H04N 1/00167 Processing or editing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/14 Picture signal circuitry for video frequency region
    • H04N 5/144 Movement detection
    • H04N 5/145 Movement estimation

Definitions

  • the present invention relates to providing additional functionality for images.
  • the invention further relates to, but is not limited to, display apparatus providing additional functionality for images displayed in mobile devices. More particularly, the invention relates to providing audio processing functionality for visual animations, and further relates to, but is not limited to, display apparatus providing audio enabled visual data for animating and displaying in mobile devices.
  • a display such as a glass or plastic display window for providing information to the user.
  • display windows are now commonly used as touch sensitive inputs.
  • the device is equipped with transducers suitable for generating audible feedback.
  • Images and animated images are known. Animated images or cinemagraph images can provide the illusion that the viewer is watching a video. Cinemagraphs are typically still photographs in which a minor and repeated movement occurs. These are particularly useful as they can be transferred or transmitted between devices using significantly smaller bandwidth than conventional video.
  • a method comprising: analysing at least two images to determine at least one region common to the at least two images; determining at least one parameter associated with a motion of at least one region; determining at least one playback signal to be associated with the at least one region; and processing the at least one playback signal based on the at least one parameter.
  • Determining at least one parameter associated with a motion of at least one region may comprise: determining a motion of the at least one region; and determining at least one parameter based on the motion of the at least one region.
  • Determining at least one playback signal based on the at least one parameter may comprise: determining at least two playback signals based on the at least one parameter; receiving an input to select one of the at least two playback signals; and selecting one of the at least two playback signals based on the input.
  • Determining at least one playback signal based on the at least one parameter may comprise: determining for at least one playback signal at least one motion parameter value; and determining the at least one motion parameter value is within a determined distance of the at least one parameter.
  • Processing the at least one playback signal based on the at least one parameter may comprise at least one of: spatial processing the at least one playback signal based on the at least one parameter; combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and signal processing the at least one playback signal based on the at least one parameter.
  • Spatial processing the at least one playback signal based on the at least one parameter may comprise modifying the audio field of the at least one playback signal to move based on the motion of the at least one region.
  • the method may further comprise: displaying at least one image of the at least two images; and synchronising and outputting the processed at least one playback signal.
  • the at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
  • Processing the at least one playback signal based on the at least one parameter may comprise at least one of: determining within the playback signal at least one audio object; and spatially processing the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
  • an apparatus comprising: means for analysing at least two images to determine at least one region common to the at least two images; means for determining at least one parameter associated with a motion of at least one region; means for determining at least one playback signal to be associated with the at least one region; and means for processing the at least one playback signal based on the at least one parameter.
  • the means for determining at least one parameter associated with a motion of at least one region may comprise: means for determining a motion of the at least one region; and means for determining at least one parameter based on the motion of the at least one region.
  • the at least one parameter may comprise at least one of: a motion periodicity; a motion direction; a motion speed; and a motion type.
  • the means for determining at least one playback signal to be associated with the at least one region may comprise means for determining at least one playback signal based on the at least one parameter.
  • the means for determining at least one playback signal based on the at least one parameter may comprise: means for determining at least two playback signals based on the at least one parameter; means for receiving an input to select one of the at least two playback signals; and means for selecting one of the at least two playback signals based on the input.
  • the means for determining at least one playback signal based on the at least one parameter may comprise: means for determining for at least one playback signal at least one motion parameter value; and means for determining the at least one motion parameter value is within a determined distance of the at least one parameter.
  • the means for processing the at least one playback signal based on the at least one parameter may comprise at least one of: means for spatial processing the at least one playback signal based on the at least one parameter; means for combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and means for signal processing the at least one playback signal based on the at least one parameter.
  • the means for spatial processing the at least one playback signal based on the at least one parameter may comprise means for modifying the audio field of the at least one playback signal to move based on the motion of the at least one region.
  • the apparatus may further comprise: means for displaying at least one image of the at least two images; and means for synchronising and outputting the processed at least one playback signal.
  • the at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
  • the means for processing the at least one playback signal based on the at least one parameter comprises at least one of: means for determining within the playback signal at least one audio object; and means for spatially processing the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
  • an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured to, with the at least one processor, cause the apparatus to at least: analyse at least two images to determine at least one region common to the at least two images; determine at least one parameter associated with a motion of at least one region; determine at least one playback signal to be associated with the at least one region; and process the at least one playback signal based on the at least one parameter.
  • Determining at least one parameter associated with a motion of at least one region may cause the apparatus to: determine a motion of the at least one region; and determine at least one parameter based on the motion of the at least one region.
  • Determining at least one playback signal based on the at least one parameter may cause the apparatus to: determine at least two playback signals based on the at least one parameter; receive an input to select one of the at least two playback signals; and select one of the at least two playback signals based on the input.
  • Determining at least one playback signal based on the at least one parameter may cause the apparatus to: determine for at least one playback signal at least one motion parameter value; and determine the at least one motion parameter value is within a determined distance of the at least one parameter.
  • Processing the at least one playback signal based on the at least one parameter may cause the apparatus to perform at least one of: spatial processing the at least one playback signal based on the at least one parameter; combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and signal processing the at least one playback signal based on the at least one parameter.
  • Spatial processing the at least one playback signal based on the at least one parameter may cause the apparatus to modify the audio field of the at least one playback signal to move based on the motion of the at least one region.
  • the apparatus may further be caused to: display at least one image of the at least two images; and synchronise and output the processed at least one playback signal.
  • the at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
  • Processing the at least one playback signal based on the at least one parameter may cause the apparatus to perform at least one of: determine within the playback signal at least one audio object; and spatially process the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
  • an apparatus comprising: an analyser configured to analyse at least two images to determine at least one region common to the at least two images; a motion determiner configured to determine at least one parameter associated with a motion of at least one region; a playback determiner configured to determine at least one playback signal to be associated with the at least one region; and a processor configured to process the at least one playback signal based on the at least one parameter.
  • the motion determiner may be configured to: determine a motion of the at least one region; and determine at least one parameter based on the motion of the at least one region.
  • the at least one parameter may comprise at least one of: a motion periodicity; a motion direction; a motion speed; and a motion type.
  • the playback determiner may be configured to determine at least one playback signal based on the at least one parameter.
  • the playback determiner may be configured to: determine at least two playback signals based on the at least one parameter; receive an input to select one of the at least two playback signals; and select one of the at least two playback signals based on the input.
  • the playback determiner may be configured to: determine for at least one playback signal at least one motion parameter value; and determine the at least one motion parameter value is within a determined distance of the at least one parameter.
  • the processor may comprise at least one of: a spatial processor configured to spatial process the at least one playback signal based on the at least one parameter; a combiner configured to combine the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and a signal processor configured to signal process the at least one playback signal based on the at least one parameter.
  • the spatial processor may be configured to modify the audio field of the at least one playback signal to move based on the motion of the at least one region.
  • the apparatus may further comprise: a display configured to display at least one image of the at least two images; and a synchroniser configured to synchronise and output the processed at least one playback signal.
  • the at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
  • the processor may comprise at least one of: an audio object determiner configured to determine within the playback signal at least one audio object; and a spatial processor configured to spatially process the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
  • a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
  • An electronic device may comprise apparatus as described herein.
  • a chipset may comprise apparatus as described herein.
  • Figure 1 shows schematically an apparatus suitable for employing some embodiments
  • Figure 2 shows schematically an example audio enhanced cinemagraph generator
  • Figure 3 shows a flow diagram of the operation of the audio enhanced cinemagraph generator as shown in Figure 2 according to some embodiments;
  • Figure 4 shows schematically a video analyser as shown in Figure 2 according to some embodiments
  • Figure 5 shows a flow diagram of the operation of the video analyser as shown in Figure 4 according to some embodiments
  • Figure 6 shows schematically an audio-haptic processor as shown in Figure 2 according to some embodiments
  • Figure 7 shows a flow diagram of the operation of the audio-haptic processor as shown in Figure 6 according to some embodiments.
  • the concept of embodiments of the application is to combine audio signals and/or haptic signals to cinemagraphs (animated images) during the generation of cinemagraphs or animated images.
  • This can be implemented in the example shown herein by generating and embedding metadata including audio effect signals or links to the audio effect signal (or haptic effect signals or links to the haptic effect signals) using at least one of intrinsic and synthetic audio (haptic) signals in such a manner that the generated cinemagraph is enhanced by the audio and/or haptic effect.
  • Cinemagraphs or animated images are seen as an extension of a photograph and produced using postproduction techniques.
  • the cinemagraph provides a means to enable motion in an object common or mutual between images or in a region of an otherwise still or static picture.
  • the design or aesthetic element allows subtle motion elements while the rest of the image is still.
  • the motion or animation feature is repeated.
  • object, common object, or region can be considered to refer to any element, object or component which is shared (or mutual) across the images used to create the cinemagraph or animated object.
  • the images used as an input could be a video of a moving toy train against a substantially static background.
  • the object, subject, common object, region, or element can be the toy train which in the animated image provides the dynamic or subtle motion element whilst the rest of the image is still.
  • the common object or subject may not be substantially identical from frame to frame.
  • the object or subject of the toy train can appear to move to and from the observer from frame to frame in such a way that the train appears to get larger/smaller or the toy train appears to turn away from or to the observer by the toy train profile changing.
  • the size, shape and position of the region of the image identified as the subject, object or element can change from image to image; however, within the image there is a selected entity which from frame to frame has a degree of correlation (as compared to the static image components which have substantially perfect correlation from frame to frame), as illustrated by the sketch below.
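As an illustrative aside (not part of the patent text), this frame-to-frame correlation of a candidate region can be sketched as a normalised cross-correlation; the function name, the box format and the use of numpy are assumptions of this sketch, not anything specified by the source.

```python
import numpy as np

def region_correlation(frame_a, frame_b, box):
    """Normalised correlation of the same rectangular region in two
    greyscale frames (2-D numpy arrays).

    box is (row, col, height, width).  Values near 1.0 suggest a static
    region; lower values suggest a moving or changing subject.
    """
    r, c, h, w = box
    a = frame_a[r:r + h, c:c + w].astype(float).ravel()
    b = frame_b[r:r + h, c:c + w].astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 1.0
```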
  • a cinemagraph can in some ways be seen as a potential natural progression of image viewing from greyscale (black and white photography) to colour, colour to high- resolution colour images, fully static to regional motion within the photograph.
  • cinemagraphs at present cannot render audio and/or tactile effects with the images.
  • the problem therefore is how to enable an apparatus to generate a cinemagraph or animated image, and to associate an audio and/or tactile effect with it, easily and without requiring significant skill or experience from the user.
  • a cinemagraph or motion photograph or animated image
  • the recorded audio in the scene as a whole cannot be tied to the motion image; rather, the attached audio should be selected and processed selectively.
  • a cinemagraph can normally be understood to have a repeatable, subtle, motion element (or subject or object).
  • the audio can be attached to a non-repeatable object or motion element within an animated image or photograph, for example adding a lightning/thunder sound to a motion photograph.
  • the audio clip or signal can be a single instance play element within a visual motion element animated scene.
  • additional features such as audio or tactile effect selection or processing can be determined.
  • the additional features use can be made very simple for the user (for example providing a user interface switch to turn the feature on/off) and different options can be easily added to enhance the experience.
  • additional features to enhance the Cinemagraph experience can be at least one of: spatial sound scene processing; haptic (vibra) effect addition; music object modification; movement matching to an audio scene from stored or retrieved audio database (for example a movie catalogue).
  • FIG. 1 shows a schematic block diagram of an example electronic device 10 or apparatus on which embodiments of the application can be implemented.
  • the apparatus 10 is in such embodiments configured to provide improved image experiences.
  • the apparatus 10 is in some embodiments a mobile terminal, mobile phone or user equipment for operation in a wireless communication system.
  • the apparatus is any suitable electronic device configured to process video and audio data.
  • the apparatus is configured to provide an image display, such as for example a digital camera, a portable audio player (mp3 player), a portable video player (mp4 player).
  • the apparatus can be any suitable electronic device with touch interface (which may or may not display information) such as a touch-screen or touch-pad configured to provide feedback when the touch-screen or touch-pad is touched.
  • the touch-pad can be a touch-sensitive keypad which can in some embodiments have no markings on it and in other embodiments have physical markings or designations on the front window.
  • the apparatus 10 comprises a touch input module or user interface 11, which is linked to a processor 15.
  • the processor 15 is further linked to a display 12.
  • the processor 15 is further linked to a transceiver (TX/RX) 13 and to a memory 16.
  • the touch input module 11 and/or the display 12 are separate or separable from the electronic device and the processor receives signals from the touch input module 11 and/or transmits signals to the display 12 via the transceiver 13 or another suitable interface.
  • the touch input module 11 and display 12 are parts of the same component. In such embodiments the touch interface module 11 and display 12 can be referred to as the display part or touch display part.
  • the processor 15 can in some embodiments be configured to execute various program codes.
  • the implemented program codes in some embodiments can comprise such routines as audio signal parsing and decoding of image data, touch processing, input simulation, or tactile effect simulation code where the touch input module inputs are detected and processed, effect feedback signal generation where electrical signals are generated which when passed to a transducer can generate tactile or haptic feedback to the user of the apparatus, or actuator processing configured to generate an actuator signal for driving an actuator.
  • the implemented program codes can in some embodiments be stored for example in the memory 16 and specifically within a program code section 17 of the memory 16 for retrieval by the processor 15 whenever needed.
  • the memory 16 in some embodiments can further provide a section 18 for storing data, for example data that has been processed in accordance with the application, for example pseudo-audio signal data.
  • the touch input module 11 can in some embodiments implement any suitable touch screen interface technology.
  • the touch screen interface can comprise a capacitive sensor configured to be sensitive to the presence of a finger above or on the touch screen interface.
  • the capacitive sensor can comprise an insulator (for example glass or plastic), coated with a transparent conductor (for example indium tin oxide - ITO).
  • Any suitable technology may be used to determine the location of the touch. The location can be passed to the processor which may calculate how the user's touch relates to the device.
  • the insulator protects the conductive layer from dirt, dust or residue from the finger.
  • the touch input module can be a resistive sensor comprising several layers, of which two are thin, metallic, electrically conductive layers separated by a narrow gap.
  • the touch input module can further determine a touch using technologies such as visual detection for example a camera either located below the surface or over the surface detecting the position of the finger or touching object, projected capacitance detection, infra-red detection, surface acoustic wave detection, dispersive signal technology, and acoustic pulse recognition.
  • the touch input module as described here is an example of a user interface input. It would be understood that in some other embodiments any other suitable user interface input can be employed to provide an user interface input, for example to select an object, item or region from a displayed screen. In some embodiments the user interface input can thus be a keyboard, mouse, keypad, joystick or any suitable pointer device.
  • the apparatus 10 can in some embodiments be capable of implementing the processing techniques at least partially in hardware, in other words the processing carried out by the processor 15 may be implemented at least partially in hardware without the need of software or firmware to operate the hardware.
  • the transceiver 13 in some embodiments enables communication with other electronic devices, for example in some embodiments via a wireless communication network.
  • the display 12 may comprise any suitable display technology.
  • the display element can be located below the touch input module and project an image through the touch input module to be viewed by the user.
  • the display 12 can employ any suitable display technology such as liquid crystal display (LCD), light emitting diodes (LED), organic light emitting diodes (OLED), plasma display cells, Field emission display (FED), surface-conduction electron-emitter displays (SED), and Electrophoretic displays (also known as electronic paper, e-paper or electronic ink displays).
  • the display 12 employs one of the display technologies projected using a light guide to the display window.
  • With respect to Figure 2 an example audio enhanced cinemagraph generator is shown.
  • With respect to Figure 3 the operation of the example audio enhanced cinemagraph generator as shown in Figure 2 is further described.
  • the audio enhanced cinemagraph generator comprises a camera 101 or is configured to receive an input from a camera 101.
  • the camera 101 can be any suitable video or image capturing apparatus.
  • the camera 101 can be configured to capture images and pass the image or video data to a video processor 103.
  • the camera block 101 can represent any suitable video or image source.
  • the video or images can be retrieved from a suitable video or image storing memory or database of images.
  • the images can be stored locally, for example within the memory of the audio enhanced cinemagraph apparatus, or in some embodiments can be stored external to the apparatus and received for example via the transceiver.
  • the audio enhanced cinemagraph generator comprises a user interface input 100 or is configured to receive a suitable user interface input 100.
  • the user interface input 100 can be any suitable user interface input.
  • the user interface input is an input from a touch screen sensor from a touch screen display.
  • the user interface input in some embodiments can be at least one of: a mouse or pointer input, a keyboard input, and a keypad input.
  • the user interface input 100 is shown with respect to some embodiments in Figure 8, which shows the user interface display at various stages of the cinemagraph generation process.
  • the example audio enhanced cinemagraph generator can in some embodiments comprise a video processor 103.
  • the video processor 103 can be configured to receive the image or video data from the camera 101 , analyse and process the video images to generate image motion/animation.
  • the video processor 103 can be configured to receive input signals from the user interface input 100.
  • the user interface input 100 can be configured to open or select the video from the camera 101 to be processed.
  • The operation of selecting the video is shown in Figure 3 by step 201.
  • the video processor 103 comprises a video analyser 105.
  • the video analyser 105 can be configured to receive the video or image data selected by the user interface input 100 and perform an analysis of the image to determine any objects or regions of the image which have meaningful motion, in other words whether there are any objects or regions with a periodicity suitable for generating a cinemagraph.
  • an example video analyser 105 is shown according to some embodiments. Furthermore with respect to Figure 5 a flow diagram of the operation of the example video analyser 105 as shown in Figure 4 is described in further detail.
  • the video analyser 105 comprises an image motion determiner 351 .
  • the image motion determiner 351 is configured to receive the video images.
  • The operation of receiving images from the camera or memory is shown in Figure 5 by step 451.
  • the image motion determiner 351 is configured to analyse the video images to determine regions with motion. In some embodiments the image motion determiner 351 can be configured to determine regions with periodicities which lend themselves to creating meaningful cinemagraphs, in other words to determine regions with motion and whether the region would be suitable for use as a subject region for the cinemagraph (a rough sketch of such detection follows below).
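A minimal sketch of what such motion-region detection could look like, assuming greyscale frames as numpy arrays and an arbitrary variation threshold; the patent does not prescribe any particular algorithm.

```python
import numpy as np

def motion_mask(frames, threshold=15.0, min_pixels=50):
    """Rough motion-region detection by temporal frame differencing.

    frames: list of equally sized greyscale images (2-D numpy arrays).
    Returns a boolean mask marking pixels whose brightness range over
    time exceeds `threshold`; a real analyser would additionally group
    the mask into connected regions and keep only sensible ones.
    """
    stack = np.stack([f.astype(float) for f in frames])
    variation = stack.max(axis=0) - stack.min(axis=0)
    mask = variation > threshold
    if mask.sum() < min_pixels:  # too little motion to be meaningful
        return np.zeros_like(mask)
    return mask
```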
  • the determined regions with motion can be displayed to the user to be selected as part of the cinemagraph generation operation.
  • the image motion determiner 351 can further be configured to analyse the images and specifically the determined motion regions to determine parameters or characteristics associated with the regions.
  • the image motion determiner 351 can be configured to output the determined regions to at least one of a motion periodicity determiner 353, a motion direction determiner 355, and a motion speed/type determiner 357.
  • the video analyser 105 comprises a motion periodicity determiner 353.
  • the motion periodicity determiner 353 can be configured to receive the output from the image motion determiner 351 and determine the region motion periodicity. It would be understood that in some embodiments the motion periodicity can be associated with an open loop periodicity, in other words where the region does not return to the same location as the initial region location, as well as a closed loop periodicity, where the region returns substantially to the same location as the initial region location.
  • the periodicity of the motion can be output as a parameter associated with the region and in some embodiments as a time interval where motion occurs within a time period defined by the image capture length.
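One hedged way to estimate such a motion periodicity is to autocorrelate a per-frame activity measure for the region; the signal definition and the minimum-lag guard below are assumptions of this sketch.

```python
import numpy as np

def motion_period(activity, min_lag=2):
    """Estimate the period (in frames) of a repeating motion.

    activity: 1-D array, e.g. the number of changed pixels inside the
    selected region for each frame.  The period is taken as the lag of
    the strongest autocorrelation peak; a fuller analyser would also
    check that the peak is strong enough to call the motion periodic.
    """
    x = np.asarray(activity, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    if len(ac) <= min_lag or ac[0] == 0:
        return None  # too short or completely static
    return int(np.argmax(ac[min_lag:])) + min_lag
```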
  • the video analyser comprises a motion direction determiner 355.
  • the motion direction determiner 355 can be configured to receive the determined motion regions from the image motion determiner 351 and be configured to determine the direction of the motion of the region.
  • the direction of the motion can be output as a parameter associated with the region and in some embodiments associated with a time value or time interval.
  • the motion direction determiner 355 can be configured to determine that the motion direction is a first direction for a first interval and a further direction for a further interval.
  • the operation of determining the region motion directionality is shown in Figure 5 by step 457.
  • the video analyser comprises a motion speed/type determiner 357.
  • the motion speed/type determiner 357 can be configured to receive the region determined by the image motion determiner 351 and be further configured to determine the speed or type of motion associated with the region. The speed/type of the motion can then be output as a parameter associated with the region and furthermore in some embodiments associated with a time value or time interval associated with the region.
  • The operation of outputting the determined (or filtered) regions and characteristics associated with the regions, such as for example the period, direction and speed/type of motion, is shown in Figure 5 by step 461.
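The direction and speed parameters could, for example, be summarised from per-frame region centroids, as in the following sketch; centroid tracking and the frame rate default are assumptions here, not something the patent mandates.

```python
import numpy as np

def direction_and_speed(centroids, fps=30.0):
    """Summarise region motion from per-frame centroids.

    centroids: list of (x, y) positions of the moving region in each
    frame.  Returns a unit direction vector and an average speed in
    pixels per second, both usable as parameters attached to the region.
    """
    pts = np.asarray(centroids, dtype=float)
    if len(pts) < 2:
        return np.zeros(2), 0.0
    steps = np.diff(pts, axis=0)
    mean_step = steps.mean(axis=0)
    norm = np.linalg.norm(mean_step)
    direction = mean_step / norm if norm else np.zeros(2)
    speed = np.linalg.norm(steps, axis=1).mean() * fps
    return direction, speed
```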
  • a meaningful cinemagraph can be considered to refer to image motion regions with suitable motion (and in some embodiments suitable for adding accompaniments such as audio/tactile effects) that do not annoy the observer.
  • the video analyser 105 can in some embodiments output the determined objects or regions to the user interface to display to the user such that the user can select one of the objects or regions. In some other embodiments the video analyser 105 can select one of the determined objects or regions according to any suitable selection criteria.
  • the user interface input 100 can then be configured to provide an input to select one of the objects or regions or region for further processing.
  • the selected (by the user or otherwise) region can then be passed to the region processor 107.
  • The operation of (the user) selecting one of the regions is shown in Figure 3 by step 203.
  • the video processor 103 comprises a region processor 107.
  • the region processor can be configured to receive the selected region and perform region processing on the image data in such a way that the output of the region processor is suitable cinemagraph video or image data.
  • the region processor 107 can perform at least one of the following processes: video stabilisation, frame selection, region segmentation, and overlay of motion segments on a static background.
  • the region processor 107 can perform object detection.
  • For the selected object or region there can be more than one time period, frame group or frame range suitable for providing animation.
  • In other words the region may exhibit temporal periodicities at two or more different times, from which one of the time or frame groups is selected or picked.
  • the picked or selected frames are shown in the time-line below the region.
  • This for example can be illustrated with respect to an image based example where the object or region shows a toy train.
  • the train completes one full circle which is captured in the first 30 frames of the video.
  • the train then is static or does nothing for the next 100 frames.
  • the train reverses for the next 30 frames and completes the circle in the reverse direction. So for a given region there are two 30-frame periods, and each of the 30-frame train motions is a possible candidate.
  • the images or frames to be analysed may be associated with each other in some way. That is the images or frames may come from the same video stream or from the same sequence of images that have been captured, such as multiple snapshots based on the same camera lens view or cinemagraph like application.
  • the operation of region processing on the selected image data is shown in Figure 3 by step 204.
  • the region processor 107 and the video processor 103 can output the processed video or image data to the synchroniser 109.
  • the apparatus comprises an audio signal source 102.
  • the audio signal source 102 can in some embodiments comprise a microphone or microphones.
  • the microphone or microphones output an audio signal to an audio/haptic processor 111. It would be understood that in some embodiments the microphone or microphones are physically separated from the audio/haptic processor 111 and pass the information via a communications link, such as a wired or wireless link.
  • the audio signal source 102 comprises an audio/haptic database.
  • the audio/haptic database can output an audio signal to the audio/haptic processor 111 .
  • the audio/haptic database can be any suitable database or linked audio/haptic signal database.
  • the audio/haptic database can, in some embodiments, be a database of audio/haptic clips or signals stored on the Internet or within 'the cloud'.
  • the audio database can be a database or collection of audio/haptic clips, signals or links to audio/haptic signals stored within the memory of the apparatus.
  • the user interface input 100 can be configured to control the audio/haptic processor 111 to select a suitable audio and/or haptic file or source.
  • the audio/haptic processor 111 can be configured to look up or select the audio/haptic signal or link from the audio/haptic source 102 based on the motion detected by the video analyser 105.
  • the audio enhanced cinemagraph generator comprises an audio/haptic processor 111.
  • the audio/haptic processor 111 can in some embodiments be configured to select or receive the audio/haptic signal which is processed in a suitable manner.
  • the audio/haptic processor 111 could be configured to process the audio/haptic signal based on the motion detected by the video analyser 105.
  • the audio/haptic processor 111 selects the audio/haptic signal to be associated with the video region. Furthermore in some embodiments the audio/haptic processor 111 can be configured to modify spatial audio content based on the motion detected by the video analyser, for example to match the movement on the video region. In some embodiments the audio/haptic processor 111 can be configured to add or select haptic signals to generate suitable haptic effects based on the motion detected by the video analyser, for example to match the movement on the video region with haptic effects on the display. In some embodiments the audio/haptic processor 111 can be configured to modify audio or music objects based on the motion detected by the video analyser.
  • the audio/haptic processor 111 can be configured to modulate the pitch of the audio/haptic signal that is being attached based on the motion detected by the video analyser; for example, the motion of an object could be smoothly periodic rather than jerky, and in such a situation the audio/haptic processor 111 can be configured to modulate the overall periodicity of the audio according to the detected motion.
  • the audio/haptic processor 111 can be configured to perform a beat/tempo/rhythm estimation on the audio/haptic signal and select regions of the audio/haptic signal for looping in the cinemagraph based on the beat calculation values.
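A rough sketch of the kind of beat/tempo estimation mentioned above, using the autocorrelation of short-term frame energies; a real system would use a proper onset detector, and the frame sizes and BPM range are assumptions of this sketch.

```python
import numpy as np

def tempo_estimate(audio, sr, frame=1024, hop=512):
    """Crude tempo estimate (BPM) from autocorrelated frame energies."""
    audio = np.asarray(audio, dtype=float)
    starts = range(0, len(audio) - frame, hop)
    energy = np.array([np.sum(audio[i:i + frame] ** 2) for i in starts])
    energy = energy - energy.mean()
    ac = np.correlate(energy, energy, mode="full")[len(energy) - 1:]
    # only consider lags corresponding to roughly 40-200 BPM
    lag_min = int(60.0 / 200.0 * sr / hop)
    lag_max = min(int(60.0 / 40.0 * sr / hop), len(ac) - 1)
    if lag_max <= lag_min:
        return None  # clip too short for a tempo estimate
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return 60.0 * sr / (lag * hop)
```

The estimated beat length could then be used to pick a loop region whose duration is a whole number of beats close to the detected motion period.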
  • The processing of audio and the selection and outputting of candidate regions for the cinemagraph is shown in Figure 3 by step 206.
  • the user interface input 100 can be used to select, from the candidate regions, a region to be output to the synchroniser 109.
  • the operation of the user selecting one option is shown in Figure 3 in step 207.
  • an example audio/haptic processor 111 is shown according to some embodiments.
  • an example operation of the audio/haptic processor 111 as shown in Figure 6 is shown in further detail.
  • the audio/haptic processor 111 comprises a candidate determiner 301.
  • the candidate determiner 301 can be configured to receive the video analysis input, in other words the motion parameters determined by the video analyser 105.
  • The operation of receiving the video analysis input is shown in Figure 7 by step 401.
  • the candidate determiner 301 can be configured to filter a search space of audio/tactile signals based on the motion parameters received.
  • the candidate determiner comprises a database of available or candidate audio signals and/or tactile signals, wherein the audio/tactile signals have associated parameters (such as beat, duration, energy) which can be used as locating parameters on a search space used by the candidate determiner 301 to locate at least one candidate audio/tactile signal based on the motion parameters.
  • the filtering process can be performed by an entity other than the candidate determiner 301.
  • the candidate determiner 301 can be configured to output the video analysis input motion parameters to the audio/haptic source 102, which then searches the database of audio/haptic signals and which then returns suitably matched audio/tactile signals or links to suitable audio/tactile signals to the candidate determiner.
  • any suitable searching or filtering operation can be performed. For example in some embodiments an 'N'-dimensional space is searched where each axis in the 'N'- dimensional space represents a specific motion parameter (for example direction, speed, periodicity) and each potential candidate audio/tactile signal is located within the 'N'-dimensional space.
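A toy illustration of such an 'N'-dimensional search: each candidate clip is placed at a point in motion-parameter space and the candidates nearest to the analysed region are returned. The clip names, parameter axes and values are invented for illustration only.

```python
import numpy as np

# Hypothetical candidate clips located in a motion-parameter space of
# (speed, periodicity in frames, direction angle in radians).
CANDIDATES = {
    "train_loop.wav": np.array([2.0, 30.0, 0.0]),
    "hammer_hit.wav": np.array([8.0, 0.0, 1.6]),
    "waves.wav": np.array([0.5, 90.0, 0.0]),
}

def nearest_candidates(motion_params, k=2):
    """Return the k clips whose stored parameters lie closest to the
    motion parameters analysed for the selected region."""
    query = np.asarray(motion_params, dtype=float)
    ranked = sorted(CANDIDATES.items(),
                    key=lambda item: np.linalg.norm(item[1] - query))
    return [name for name, _ in ranked[:k]]
```

For example, `nearest_candidates([7.5, 0.0, 1.5])` would rank the transient "hammer" clip first under this made-up data.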
  • a tactile signal to be generated by the apparatus based on the motion analysis can be selected. For example where the video shows in a region a person hitting the wall with a hammer, then the motion analysis shows a sudden jolt or stop when the hammer hits the wall.
  • the sudden directional and speed change of the region can generate parameters which are used by the audio/haptic processor to select an audio/haptic signal with a sudden transient characteristic and in some embodiments a short strong vibration effect is selected to be generated by the apparatus when the hammer hits the wall.
  • for a video showing a region which approaches the camera and then goes past at a steady speed, the motion analysis can indicate constant motion parameters, from which the audio/haptic processor can select an audio/haptic signal with a constant characteristic, for example a car noise or train noise.
  • the audio/haptic processor can be configured to select at least a portion of the audio signals recorded at the time of the video recording.
  • the filtered audio signal/tactile signal candidates are presented to the user, for example via a user interface display.
  • the candidate determiner 301 can be further configured to receive a user interface input and select a candidate or at least one candidate audio/tactile signal based on the user interface input.
  • The operation of selecting the candidate audio/tactile signal based on the user interface input is shown in Figure 7 by step 407. It would be understood that the use of user input to assist in the selection of candidate audio/tactile signals is an optional operation and that as described herein the selection can be determined automatically (for example by selecting the nearest matching audio/tactile signal), semi-automatically (for example by displaying a number of near searched audio/tactile signals based on the motion parameters), or manually (for example by displaying the search space or sub-set of the search space).
  • the audio/haptic processor comprises an audio/tactile signal parameter determiner 303.
  • the audio/tactile parameter determiner 303 can in some embodiments be configured to receive the selected candidate audio/tactile signal or signals from the candidate determiner 301 . Furthermore the audio/tactile parameter determiner 303 can be configured to further analyse the audio/tactile signal to generate at least one associated parameter (such as spatial parameters/music objects or any suitable parameter) to be further processed by the signal or spatial processor 305. For example in some embodiments the audio/tactile signal parameter determiner 303 can be configured to generate a spatial parameterised version of the audio signal. In such embodiments the audio signal can be divided into time frames which are time to frequency domain converted and filtered into sub-band components.
  • Each sub-band component can then be analysed in such a way that at least one directional component is identified, for example by correlating at least two channels of the audio signal, and the directional component is filtered from the background signal.
  • the spatial parameterised version of the audio signal can be that provided by the Directional Audio Coding (DirAC) method, wherein a mid (M) signal component with a direction (α) representing the directional component, and a side (S) component representing the background signal, are generated.
  • any suitable parameterisation of the audio signal can in some embodiments be generated.
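Purely as an illustration of the kind of parameterisation referred to above, the following sketch splits a stereo signal into mid/side components with a single crude direction cue; real DirAC analysis works per time-frequency tile, so this is an assumption-laden simplification.

```python
import numpy as np

def mid_side_direction(left, right):
    """Toy mid/side split with a single level-difference direction cue.

    left, right: equal-length numpy arrays (one stereo block of audio).
    DirAC-style analysis estimates a direction per time-frequency tile;
    here one crude azimuth cue in [-1, 1] (-1 left, +1 right) is used.
    """
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    el, er = float(np.sum(left ** 2)), float(np.sum(right ** 2))
    direction = (er - el) / (er + el) if (er + el) else 0.0
    return mid, side, direction
```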
  • the operation of analysing the audio/tactile signal to generate spatial parameters/music objects is shown in Figure 7 by step 409.
  • the audio/haptic processor 111 comprises a signal or spatial processor (or suitable means for signal processing or spatial processing the audio/tactile signal).
  • the spatial processor 305 in some embodiments is configured to receive the audio/tactile signal in a suitable parameterised form, such as from the audio/tactile parameter determiner 303, or in some embodiments directly from the candidate determiner 301 where the audio/tactile signal is in a pre-parameterised form.
  • the spatial processor 305 can be configured to spatially process the audio/tactile signal based on the video analysis motion parameters.
  • where the motion analysis from the video analyser determines that the motion direction is one where the object moves from one side to the other (for example left to right), the audio signal can be processed in such a way that it is heard moving from one side to the other.
  • the spatial processor 305 can be configured to apply amplitude panning to the audio signal, which comprises a stereo or multichannel audio signal (for example a binaural or 5.1 channel audio signal), such that the audio signal initially has a greater volume or energy at one side (left), moving so that by the end of the motion it is heard with a greater volume at the other side (right).
  • the spatial processing can be any suitable spatial processing such as applying a suitable inter-aural time delay or inter-aural level difference.
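A small sketch of constant-power amplitude panning that sweeps a mono clip across the stereo image over its duration, with the start and end pan positions standing in for the analysed motion direction; the function and parameter names are assumptions of this sketch.

```python
import numpy as np

def pan_with_motion(mono, start_pan=-1.0, end_pan=1.0):
    """Constant-power pan of a mono clip across the stereo image.

    start_pan/end_pan lie in [-1, 1] (left..right) and would be derived
    from the analysed motion direction of the region.  Returns a
    (2, n_samples) stereo array.
    """
    pan = np.linspace(start_pan, end_pan, len(mono))
    theta = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] to [0, pi/2]
    left = np.cos(theta) * mono
    right = np.sin(theta) * mono
    return np.stack([left, right], axis=0)
```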
  • where the region moves towards or away from the apparatus, the spatial processor 305 can be configured to increase or decrease respectively the volume of the audio signal.
  • the speed of the motion towards or away from the apparatus can cause the spatial processor to apply a pitch shift to the audio signal to simulate the Doppler effect.
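An approximate Doppler-style pitch shift can be sketched by resampling the clip according to the analysed approach speed; this simple interpolation also changes the clip length, which a fuller implementation would compensate for. The function name and defaults are assumptions of this sketch.

```python
import numpy as np

def doppler_shift(signal, source_speed, sound_speed=343.0):
    """Approximate a Doppler shift by resampling the clip.

    Positive source_speed (m/s) means the source approaches the
    listener, raising the pitch; negative values lower it.
    """
    factor = sound_speed / (sound_speed - source_speed)
    n_out = max(int(len(signal) / factor), 1)
    idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(idx, np.arange(len(signal)), signal)
```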
  • the operation of performing spatial processing of the audio/tactile spatial parameters based on the motion parameters is shown in Figure 7 by step 411.
  • the audio/tactile parameter determiner 303 can be configured to apply processing to a component or object of the audio signal.
  • where the candidate audio signal is a piece of music, the parameterised version of the audio signal can be considered to have separated instruments due to their differing dominant frequencies; the spatial processor can then be configured to select a specific instrument (or frequency sub-band) on the audio track to be processed to follow the movement pattern of the region.
  • the spatial processor can be configured to apply an echo or other effect to a determined object to model the movement of the region. For example where the analysed video shows waves on water then a tremolo effect can be produced on a selected sub-band of the audio signal to provide an equivalent audio experience.
  • the processed audio/tactile signal can in some embodiments be configured to be output.
  • the spatial processor 305 can be configured to process the audio/tactile signal according to more than one of these embodiments. Furthermore in some embodiments the candidate determiner 301 and spatial processor 305 can be controlled by a user interface input configured to enable the switching on or off of the various audio enhancements as described herein. It will also be understood that some embodiments relate to audio processing of the signal without spatial processing.
  • while the audio enhancements/processing described herein are described with respect to moving images within cinemagraphs, it would be understood that in some embodiments similar approaches can be applied to conventional video or moving images and to a single image.
  • a piece of video can be captured (not directly visible to the user).
  • the audio is captured and stored as well, for example starting a couple of seconds before taking the picture and ending a couple of seconds after.
  • the movement on the video can be analysed with a similar type of movement analysis engine as described herein and based on the analysis the user is provided different alternatives for audio content and processing of the audio content.
  • the audio/tactile processor 111 can then output the selected/processed audio/tactile signals to the synchroniser 109.
  • the apparatus comprises a synchroniser 109 configured to receive video information from the video processor 103, audio/tactile information from the audio/tactile processor 111, and user interface information from the user interface input 100.
  • the synchroniser 109 can be configured to adjust the audio/tactile and/or video frame parameters and further perform synchronisation and enhancement to the audio and video signals prior to outputting the file information.
  • the synchroniser 109 can display on the user interface an expanded selected audio region to permit the user interface input to select frames for synchronising the image to the audio/tactile signal.
  • The operation of selecting an image frame for synchronising with the audio/tactile signal is shown in Figure 3 by step 209.
  • the video or audio data can furthermore be manipulated such that the audio/tactile and/or video images are warped in time to produce a better finished product.
  • The operations of adjusting the audio and frame parameters, synchronisation and enhancement are shown in Figure 3 by step 210.
  • the synchroniser 109 can in some embodiments be configured to save the completed cinemagraph or animate image with audio file according to any suitable format.
  • the synchroniser 109 can then be configured to mix or multiplex the data to form a cinemagraph or animated image metadata file comprising both image or video data and audio/tactile signal data.
  • this mixing or multiplexing of data can generate a file comprising at least some of: video data, audio data, tactile signal data, sub region identification data and time synchronisation data according to any suitable format.
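The patent leaves the container format open ("any suitable format"); the following JSON layout is purely a hypothetical example of how image, audio/tactile, region and synchronisation data might be multiplexed into one metadata file. All field names and values are invented for illustration.

```python
import json

# Hypothetical container layout; the source only requires that image
# data, audio/tactile data, region identification and synchronisation
# data are combined in some suitable format.
cinemagraph_meta = {
    "frames": {"file": "clip.h264", "loop": [12, 42]},       # animated frame range
    "region": {"x": 120, "y": 64, "w": 200, "h": 150},       # animated sub-region
    "audio": {"file": "train_loop.wav", "gain_db": -3.0},
    "haptic": {"pattern": "short_strong", "trigger_frame": 20},
    "sync": {"audio_offset_ms": 40, "fps": 30},
}

with open("cinemagraph.json", "w") as fh:
    json.dump(cinemagraph_meta, fh, indent=2)
```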
  • the mixer and synchroniser 109 can in some embodiments output the metadata or file output data. The operation of saving the file containing the audio and video information is shown in Figure 3 by step 212.
  • the term acoustic sound channels is intended to cover sound outlets, channels and cavities, and such sound channels may be formed integrally with the transducer, or as part of the mechanical integration of the transducer with the device.
  • the design of various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the design of embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
  • the memory used in the design of embodiments of the application may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be designed by various components such as integrated circuit modules.
  • circuitry refers to all of the following:
  • circuits and software and/or firmware
  • combinations of circuits and software such as: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
  • circuits such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • circuitry' applies to all uses of this term in this application, including any claims.
  • the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware.
  • the term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or similar integrated circuit in server, a cellular network device, or other network device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An apparatus comprising: means for analysing at least two images to determine at least one region common to the at least two images; means for determining at least one parameter associated with a motion of at least one region; means for determining at least one playback signal to be associated with the at least one region; and means for processing the at least one playback signal based on the at least one parameter. The apparatus may produce cinemagraphs with associated audio or haptic signals.

Description

AN IMAGE ENHANCEMENT APPARATUS AND METHOD
Field
The present invention relates to providing additional functionality for images. The invention further relates to, but is not limited to, display apparatus providing additional functionality for images displayed in mobile devices. More particularly, the invention relates to providing audio processing functionality for visual animations, and further relates to, but is not limited to, display apparatus providing audio enabled visual data for animating and displaying in mobile devices.
Background
Many portable devices, for example mobile telephones, are equipped with a display such as a glass or plastic display window for providing information to the user. Furthermore, such display windows are now commonly used as touch sensitive inputs. Some devices are further equipped with transducers suitable for generating audible feedback. Images and animated images are known. Animated images or cinemagraph images can provide the illusion that the viewer is watching a video. Cinemagraphs are typically still photographs in which a minor and repeated movement occurs. These are particularly useful as they can be transferred or transmitted between devices using significantly smaller bandwidth than conventional video.
Statement
According to an aspect, there is provided a method comprising: analysing at least two images to determine at least one region common to the at least two images; determining at least one parameter associated with a motion of at least one region; determining at least one playback signal to be associated with the at least one region; and processing the at least one playback signal based on the at least one parameter. Determining at least one parameter associated with a motion of at least one region may comprise: determining a motion of the at least one region; and determining at least one parameter based on the motion of the at least one region.
The at least one parameter may comprise at least one of: a motion periodicity; a motion direction; a motion speed; and a motion type. Determining at least one playback signal to be associated with the at least one region may comprise determining at least one playback signal based on the at least one parameter.
Determining at least one playback signal based on the at least one parameter may comprise: determining at least two playback signals based on the at least one parameter; receiving an input to select one of the at least two playback signals; and selecting one of the at least two playback signals based on the input.
Determining at least one playback signal based on the at least one parameter may comprise: determining for at least one playback signal at least one motion parameter value; and determining the at least one motion parameter value is within a determined distance of the at least one parameter.
Processing the at least one playback signal based on the at least one parameter may comprise at least one of: spatial processing the at least one playback signal based on the at least one parameter; combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and signal processing the at least one playback signal based on the at least one parameter.
Spatial processing the at least one playback signal based on the at least one parameter may comprise modifying the audio field of the at least one playback signal to move based on the motion of the at least one region.
The method may further comprise: displaying at least one image of the at least two images; and synchronising and outputting the processed at least one playback signal. The at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
Processing the at least one playback signal based on the at least one parameter may comprise at least one of: determining within the playback signal at least one audio object; and spatially processing the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
According to a second aspect there is provided an apparatus comprising: means for analysing at least two images to determine at least one region common to the at least two images; means for determining at least one parameter associated with a motion of at least one region; means for determining at least one playback signal to be associated with the at least one region; and means for processing the at least one playback signal based on the at least one parameter.
The means for determining at least one parameter associated with a motion of at least one region may comprise: means for determining a motion of the at least one region; and means for determining at least one parameter based on the motion of the at least one region.
The at least one parameter may comprise at least one of: a motion periodicity; a motion direction; a motion speed; and a motion type.
The means for determining at least one playback signal to be associated with the at least one region may comprise means for determining at least one playback signal based on the at least one parameter.
The means for determining at least one playback signal based on the at least one parameter may comprise: means for determining at least two playback signals based on the at least one parameter; means for receiving an input to select one of the at least two playback signals; and means for selecting one of the at least two playback signals based on the input.
The means for determining at least one playback signal based on the at least one parameter may comprise: means for determining for at least one playback signal at least one motion parameter value; and means for determining the at least one motion parameter value is within a determined distance of the at least one parameter.
The means for processing the at least one playback signal based on the at least one parameter may comprise at least one of: means for spatial processing the at least one playback signal based on the at least one parameter; means for combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and means for signal processing the at least one playback signal based on the at least one parameter.
The means for spatial processing the at least one playback signal based on the at least one parameter may comprise means for modifying the audio field of the at least one playback signal to move based on the motion of the at least one region. The apparatus may further comprise: means for displaying at least one image of the at least two images; and means for synchronising and outputting the processed at least one playback signal. The at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
The means for processing the at least one playback signal based on the at least one parameter comprises at least one of: means for determining within the playback signal at least one audio object; and means for spatially processing the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
According to a third aspect there is provided an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured to with the at least one processor cause the apparatus to at least: analyse at least two images to determine at least one region common to the at least two images; determine at least one parameter associated with a motion of at least one region; determine at least one playback signal to be associated with the at least one region; and process the at least one playback signal based on the at least one parameter.
Determining at least one parameter associated with a motion of at least one region may cause the apparatus to: determine a motion of the at least one region; and determine at least one parameter based on the motion of the at least one region.
The at least one parameter may comprise at least one of: a motion periodicity; a motion direction; a motion speed; and a motion type. Determining at least one playback signal to be associated with the at least one region may cause the apparatus to determine at least one playback signal based on the at least one parameter.
Determining at least one playback signal based on the at least one parameter may cause the apparatus to: determine at least two playback signals based on the at least one parameter; receive an input to select one of the at least two playback signals; and select one of the at least two playback signals based on the input.
Determining at least one playback signal based on the at least one parameter may cause the apparatus to: determine for at least one playback signal at least one motion parameter value; and determine the at least one motion parameter value is within a determined distance of the at least one parameter.
Processing the at least one playback signal based on the at least one parameter may cause the apparatus to perform at least one of: spatial processing the at least one playback signal based on the at least one parameter; combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and signal processing the at least one playback signal based on the at least one parameter.
Spatial processing the at least one playback signal based on the at least one parameter may cause the apparatus to modify the audio field of the at least one playback signal to move based on the motion of the at least one region. The apparatus may further be caused to: display at least one image of the at least two images; and synchronise and output the processed at least one playback signal.
The at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
Processing the at least one playback signal based on the at least one parameter may cause the apparatus to perform at least one of: determine within the playback signal at least one audio object; and spatially process the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
According to a fourth aspect there is provided an apparatus comprising: an analyser configured to analyse at least two images to determine at least one region common to the at least two images; a motion determiner configured to determine at least one parameter associated with a motion of at least one region; a playback determiner configured to determine at least one playback signal to be associated with the at least one region; and a processor configured to process the at least one playback signal based on the at least one parameter. The motion determiner may be configured to: determine a motion of the at least one region; and determine at least one parameter based on the motion of the at least one region.
The at least one parameter may comprise at least one of: a motion periodicity; a motion direction; a motion speed; and a motion type. The playback determiner may be configured to determine at least one playback signal based on the at least one parameter. The playback determiner may be configured to: determine at least two playback signals based on the at least one parameter; receive an input to select one of the at least two playback signals; and select one of the at least two playback signals based on the input. The playback determiner may be configured to: determine for at least one playback signal at least one motion parameter value; and determine the at least one motion parameter value is within a determined distance of the at least one parameter.
The processor may comprise at least one of: a spatial processor configured to spatial process the at least one playback signal based on the at least one parameter; a combiner configured to combine the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and a signal processor configured to signal process the at least one playback signal based on the at least one parameter.
The spatial processor may be configured to modify the audio field of the at least one playback signal to move based on the motion of the at least one region.
The apparatus may further comprise: a display configured to display at least one image of the at least two images; and a synchroniser configured to synchronise and output the processed at least one playback signal.
The at least one playback signal may comprise at least one of: at least one audio signal; and at least one tactile signal.
The processor may comprise at least one of: an audio object determiner configured to determine within the playback signal at least one audio object; and a spatial processor configured to spatially process the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein. A chipset may comprise apparatus as described herein.
Summary of Figures
For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically an apparatus suitable for employing some embodiments;
Figure 2 shows schematically an example audio enhanced cinemagraph generator;
Figure 3 shows a flow diagram of the operation of the audio enhanced cinemagraph generator as shown in Figure 2 according to some embodiments;
Figure 4 shows schematically a video analyser as shown in Figure 2 according to some embodiments;
Figure 5 shows a flow diagram of the operation of the video analyser as shown in Figure 4 according to some embodiments;
Figure 6 shows schematically an audio-haptic processor as shown in Figure 2 according to some embodiments;
Figure 7 shows a flow diagram of the operation of the audio-haptic processor as shown in Figure 6 according to some embodiments.
Description of Example Embodiments
The concept of embodiments of the application is to combine audio signals and/or haptic signals to cinemagraphs (animated images) during the generation of cinemagraphs or animated images. This can be implemented in the example shown herein by generating and embedding metadata including audio effect signals or links to the audio effect signal (or haptic effect signals or links to the haptic effect signals) using at least one of intrinsic and synthetic audio (haptic) signals in such a manner that the generated cinemagraph is enhanced by the audio and/or haptic effect.
High quality photographs and videos are known to provide a great way to relive an experience. Cinemagraphs or animated images are seen as an extension of a photograph and produced using postproduction techniques. The cinemagraph provides a means to enable motion in an object common or mutual between images or in a region of an otherwise still or static picture. For example the design or aesthetic element allows subtle motion elements while the rest of the image is still. In some cinemagraphs the motion or animation feature is repeated. In the following description and claims the term object, common object, or region can be considered to refer to any element, object or component which is shared (or mutual) across the images used to create the cinemagraph or animated object. For example the images used as an input could be a video of a moving toy train against a substantially static background. In such an example the object, subject, common object, region, or element can be the toy train which in the animated image provides the dynamic or subtle motion element whilst the rest of the image is still. It would be understood that the common object or subject may not be substantially identical from frame to frame. However typically there is a large degree of correlation between subsequent image objects as the object moves or appears to move. For example the object or subject of the toy train can appear to move to and from the observer from frame to frame in such a way that the train appears to get larger/smaller or the toy train appears to turn away from or to the observer by the toy train profile changing.
In other words the size, shape and position of the region of the image identified as the subject, object or element can change from image to image, however within the image there is a selected entity which from frame to frame has a degree of correlation (as compared to the static image components which have substantially perfect correlation from frame to frame).
A cinemagraph can in some ways be seen as a potential natural progression of image viewing from greyscale (black and white photography) to colour, colour to high-resolution colour images, fully static to regional motion within the photograph. However, reliving an experience can be seen as being incomplete without audio, and cinemagraphs at present cannot render audio and/or tactile effects with the images.
The problem therefore is how to enable an apparatus, easily and without significant skilled and experienced input from the user, to generate a cinemagraph or animated image such that an audio and/or tactile effect can be associated with it.
Typically a cinemagraph (or motion photograph or animated image) is constructed from a video sequence, with which audio is likely to be available or associated. However, when attempting to tie audio to the motion photograph, the recorded audio of the scene as a whole cannot simply be attached to the motion image; rather, the attached audio should be selected and processed selectively.
It would be understood that a cinemagraph can normally be understood to have a repeatable, subtle, motion element (or subject or object). However, in some situations the audio can be attached to a non-repeatable object or motion element within an animated image or photograph, for example adding a lightning/thunder sound to a motion photograph. Similarly, in some embodiments the audio clip or signal can be a single-instance play element within a visual motion element animated scene. The concept as described in embodiments herein is to analyse the movement occurring in the images which are used to generate the cinemagraph and, based on the characteristics or parameters determined, additional features such as audio or tactile effect selection or processing can be determined. In some embodiments the use of the additional features can be made very simple for the user (for example by providing a user interface switch to turn the feature on/off) and different options can be easily added to enhance the experience.
For example, as described in further detail in embodiments herein, additional features to enhance the cinemagraph experience can be at least one of: spatial sound scene processing; haptic (vibra) effect addition; music object modification; and movement matching to an audio scene from a stored or retrieved audio database (for example a movie catalogue).
With respect to Figure 1, a schematic block diagram is shown of an example electronic device 10 or apparatus on which embodiments of the application can be implemented. The apparatus 10 is in such embodiments configured to provide improved image experiences.
The apparatus 10 is in some embodiments a mobile terminal, mobile phone or user equipment for operation in a wireless communication system. In other embodiments, the apparatus is any suitable electronic device configured to process video and audio data. In some embodiments the apparatus is configured to provide an image display, such as for example a digital camera, a portable audio player (mp3 player), a portable video player (mp4 player). In other embodiments the apparatus can be any suitable electronic device with a touch interface (which may or may not display information) such as a touch-screen or touch-pad configured to provide feedback when the touch-screen or touch-pad is touched. For example in some embodiments the touch-pad can be a touch-sensitive keypad which can in some embodiments have no markings on it and in other embodiments have physical markings or designations on the front window. The user can in such embodiments be notified of where to touch by a physical identifier - such as a raised profile, or a printed layer which can be illuminated by a light guide. The apparatus 10 comprises a touch input module or user interface 11, which is linked to a processor 15. The processor 15 is further linked to a display 12. The processor 15 is further linked to a transceiver (TX/RX) 13 and to a memory 16. In some embodiments, the touch input module 11 and/or the display 12 are separate or separable from the electronic device and the processor receives signals from the touch input module 11 and/or transmits signals to the display 12 via the transceiver 13 or another suitable interface. Furthermore in some embodiments the touch input module 11 and display 12 are parts of the same component. In such embodiments the touch interface module 11 and display 12 can be referred to as the display part or touch display part.
The processor 15 can in some embodiments be configured to execute various program codes. The implemented program codes can in some embodiments comprise such routines as audio signal parsing and decoding of image data, touch processing, input simulation, or tactile effect simulation code where the touch input module inputs are detected and processed, effect feedback signal generation where electrical signals are generated which when passed to a transducer can generate tactile or haptic feedback to the user of the apparatus, or actuator processing configured to generate an actuator signal for driving an actuator. The implemented program codes can in some embodiments be stored for example in the memory 16 and specifically within a program code section 17 of the memory 16 for retrieval by the processor 15 whenever needed. The memory 16 in some embodiments can further provide a section 18 for storing data, for example data that has been processed in accordance with the application, for example pseudo-audio signal data.
The touch input module 11 can in some embodiments implement any suitable touch screen interface technology. For example in some embodiments the touch screen interface can comprise a capacitive sensor configured to be sensitive to the presence of a finger above or on the touch screen interface. The capacitive sensor can comprise an insulator (for example glass or plastic), coated with a transparent conductor (for example indium tin oxide - ITO). As the human body is also a conductor, touching the surface of the screen results in a distortion of the local electrostatic field, measurable as a change in capacitance. Any suitable technology may be used to determine the location of the touch. The location can be passed to the processor which may calculate how the user's touch relates to the device. The insulator protects the conductive layer from dirt, dust or residue from the finger.
In some other embodiments the touch input module can be a resistive sensor comprising several layers of which two are thin, metallic, electrically conductive layers separated by a narrow gap. When an object, such as a finger, presses down on a point on the panel's outer surface the two metallic layers become connected at that point: the panel then behaves as a pair of voltage dividers with connected outputs. This physical change therefore causes a change in the electrical current which is registered as a touch event and sent to the processor for processing.
In some other embodiments the touch input module can further determine a touch using technologies such as visual detection for example a camera either located below the surface or over the surface detecting the position of the finger or touching object, projected capacitance detection, infra-red detection, surface acoustic wave detection, dispersive signal technology, and acoustic pulse recognition. In some embodiments it would be understood that 'touch' can be defined by both physical contact and 'hover touch' where there is no physical contact with the sensor but the object located in close proximity with the sensor has an effect on the sensor.
The touch input module as described here is an example of a user interface input. It would be understood that in some other embodiments any other suitable user interface input can be employed to provide a user interface input, for example to select an object, item or region from a displayed screen. In some embodiments the user interface input can thus be a keyboard, mouse, keypad, joystick or any suitable pointer device.
The apparatus 10 can in some embodiments be capable of implementing the processing techniques at least partially in hardware, in other words the processing carried out by the processor 15 may be implemented at least partially in hardware without the need of software or firmware to operate the hardware.
The transceiver 13 in some embodiments enables communication with other electronic devices, for example in some embodiments via a wireless communication network.
The display 12 may comprise any suitable display technology. For example the display element can be located below the touch input module and project an image through the touch input module to be viewed by the user. The display 12 can employ any suitable display technology such as liquid crystal display (LCD), light emitting diodes (LED), organic light emitting diodes (OLED), plasma display cells, Field emission display (FED), surface-conduction electron-emitter displays (SED), and Electrophoretic displays (also known as electronic paper, e-paper or electronic ink displays). In some embodiments the display 12 employs one of the display technologies projected using a light guide to the display window. With respect to Figure 2 an example audio enhanced cinemagraph generator is shown. Furthermore with respect to Figure 3 the operation of the example audio enhanced cinemagraph generator as shown in Figure 2 is further described.
In some embodiments the audio enhanced cinemagraph generator comprises a camera 101 or is configured to receive an input from a camera 101. The camera 101 can be any suitable video or image capturing apparatus. The camera 101 can be configured to capture images and pass the image or video data to a video processor 103. In some embodiments the camera block 101 can represent any suitable video or image source. For example in some embodiments the video or images can be retrieved from a suitable video or image storing memory or database of images. The images can be stored locally, for example within the memory of the audio enhanced cinemagraph apparatus, or in some embodiments can be stored external to the apparatus and received for example via the transceiver.
In some embodiments the audio enhanced cinemagraph generator comprises a user interface input 100 or is configured to receive a suitable user interface input 100. The user interface input 100 can be any suitable user interface input. In the following examples the user interface input is an input from a touch screen sensor of a touch screen display. However it would be understood that the user interface input in some embodiments can be at least one of: a mouse or pointer input, a keyboard input, and a keypad input. The user interface input 100 is shown with respect to some embodiments in Figure 8, which shows the user interface display at various stages of the cinemagraph generation process.
The example audio enhanced cinemagraph generator can in some embodiments comprise a video processor 103. The video processor 103 can be configured to receive the image or video data from the camera 101 , analyse and process the video images to generate image motion/animation.
Furthermore as shown in Figure 2 the video processor 103 can be configured to receive input signals from the user interface input 100. For example in some embodiments the user interface input 100 can be configured to open or select the video from the camera 101 to be processed.
The operation of selecting the video is shown in Figure 3 by step 201.
In some embodiments the video processor 103 comprises a video analyser 105. The video analyser 105 can be configured to receive the video or image data selected by the user interface input 100 and perform an analysis of the image to determine any objects or regions of the image which have meaningful motion, in other words whether there are any objects or regions with a periodicity suitable for generating a cinemagraph.
With respect to Figure 4 an example video analyser 105 is shown according to some embodiments. Furthermore, with respect to Figure 5, a flow diagram of the operation of the example video analyser 105 as shown in Figure 4 is described in further detail. In some embodiments the video analyser 105 comprises an image motion determiner 351. The image motion determiner 351 is configured to receive the video images.
The operation of receiving images from the camera or memory is shown in Figure 5 by step 451 .
In some embodiments the image motion determiner 351 is configured to analyse the video images to determine regions with motion. In some embodiments the image motion determiner 351 can be configured to determine regions with periodicities which lend themselves to creating meaningful cinemagraphs; in other words, to determine regions with motion and whether the region would be suitable for use as a subject region for the cinemagraph.
In some embodiments the determined regions with motion (and filtered determined regions) can be displayed to the user to be selected as part of the cinemagraph generation operation.
The operation of analysing the images for regions of motion is shown in Figure 5 by step 453. In some embodiments the image motion determiner 351 can further be configured to analyse the images and specifically the determined motion regions to determine parameters or characteristics associated with the regions. For example in some embodiments the image motion determiner 351 can be configured to output the determined regions to at least one of a motion periodicity determiner 353, a motion direction determiner 355, and a motion speed/type determiner 357.
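By way of a non-limiting illustration only, a minimal sketch of the region-of-motion analysis described above might threshold inter-frame differences and group the changed pixels into candidate regions. The function and parameter names below (for example find_motion_regions, diff_threshold) are assumptions made for illustration and are not taken from any specific embodiment; NumPy and SciPy are used purely as a convenience.

```python
import numpy as np
from scipy import ndimage

def find_motion_regions(frames, diff_threshold=25, min_area=200):
    """frames: sequence of equally sized greyscale images (2-D uint8 arrays)."""
    changed = np.zeros(frames[0].shape, dtype=bool)
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        changed |= diff > diff_threshold              # pixels that moved at least once
    labels, count = ndimage.label(changed)            # group changed pixels into regions
    regions = []
    for region_id in range(1, count + 1):
        mask = labels == region_id
        if mask.sum() >= min_area:                    # discard noise-sized regions
            ys, xs = np.nonzero(mask)
            regions.append({"mask": mask, "bbox": (xs.min(), ys.min(), xs.max(), ys.max())})
    return regions
```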
In some embodiments the video analyser 105 comprises a motion periodicity determiner 353. The motion periodicity determiner 353 can be configured to receive the output from the image motion determiner 351 and determine the region motion periodicity. It would be understood that in some embodiments the motion periodicity can be associated with an open loop periodicity, in other words where the region does not return to the same location as the initial region location, as well as a closed loop periodicity, where the region returns substantially to the same location as the initial region location. The periodicity of the motion can be output as a parameter associated with the region and in some embodiments as a time interval where motion occurs within a time period defined by the image capture length.
The operation of determining the region motion periodicity is shown in Figure 5 by step 455.
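A minimal, non-limiting sketch of one way such a periodicity parameter could be estimated is given below, using the autocorrelation of a per-frame motion trace for the region (for example a centroid coordinate over time). The function name and the 0.5 peak threshold are illustrative assumptions rather than part of any specific embodiment.

```python
import numpy as np

def estimate_motion_period(trace, min_lag=2):
    """trace: 1-D per-frame motion values for the region (e.g. horizontal centroid).
    Returns the dominant period in frames, or None if no clear periodicity is found."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    if len(x) <= min_lag or not np.any(x):
        return None
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation for lags >= 0
    acf = acf / acf[0]                                   # normalise so lag 0 equals 1
    lag = min_lag + int(np.argmax(acf[min_lag:]))        # strongest repetition beyond lag 0
    return lag if acf[lag] > 0.5 else None               # require a reasonably strong peak
```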
In some embodiments the video analyser comprises a motion direction determiner 355. The motion direction determiner 355 can be configured to receive the determined motion regions from the image motion determiner 351 and be configured to determine the direction of the motion of the region. The direction of the motion can be output as a parameter associated with the region and in some embodiments associated with a time value or time interval. For example in some embodiments the motion direction determiner 355 can be configured to determine the motion direction as being in a first direction for a first interval and in a further direction for a further interval. The operation of determining the region motion directionality is shown in Figure 5 by step 457.
In some embodiments the video analyser comprises a motion speed/type determiner 357. The motion speed/type determiner 357 can be configured to receive the region determined by the image motion determiner 351 and be further configured to determine the speed or type of motion associated with the region. The speed/type of the motion can then be output as a parameter associated with the region and furthermore in some embodiments associated with a time value or time interval associated with the region.
The operation of determining the motion speed/type for the region is shown in Figure 5 by step 459.
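As a non-limiting sketch, direction and speed parameters of the kind described above could be derived from per-frame centroid displacements of the region; the frame rate default and the degree convention used here are illustrative assumptions only.

```python
import numpy as np

def estimate_direction_and_speed(centroids, fps=30.0):
    """centroids: sequence of (x, y) region centres, one per frame.
    Returns (direction in degrees, speed in pixels per second); direction is
    measured in image coordinates with 0 = rightwards and 90 = downwards."""
    pts = np.asarray(centroids, dtype=float)
    deltas = np.diff(pts, axis=0)                        # per-frame displacement vectors
    mean_delta = deltas.mean(axis=0)
    direction = float(np.degrees(np.arctan2(mean_delta[1], mean_delta[0])) % 360.0)
    speed = float(np.linalg.norm(deltas, axis=1).mean() * fps)
    return direction, speed
```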
The operation of outputting the determined (or filtered) regions and characteristics associated with the regions such as for example period, direction and speed/type of motion is shown in Figure 5 by step 461.
The operation of analysing the images to determine motion (and parameters such as periodicity) and determining objects or regions which have motion (and are able to create meaningful cinemagraphs) is shown in Figure 3 by step 202. A meaningful cinemagraph can be considered to refer to image motion regions with suitable motion (and in some embodiments suitable for adding accompaniments such as audio/tactile effects) that do not annoy the observer.
As discussed herein the video analyser 105 can in some embodiments output the determined objects or regions to the user interface to display to the user such that the user can select one of the objects or regions. In some other embodiments the video analyser 105 can select one of the determined objects or regions according to any suitable selection criteria.
The user interface input 100 can then be configured to provide an input to select one of the objects or regions or region for further processing. The selected (by the user or otherwise) region can then be passed to the region processor 107.
The operation of (the user) selecting one of the regions is shown in Figure 3 by step 203.
In some embodiments the video processor 103 comprises a region processor 107. The region processor can be configured to receive the selected region and perform region processing on the image data in such a way that the output of the region processor is suitable cinemagraph video or image data.
For example in some embodiments the region processor 107 can perform at least one of the following processes, video stabilisation, frame selection, region segmentation, and overlay of motion segments on static background. In some embodiments the region processor 107 can perform object detection.
Furthermore in some embodiments, from the object or region selected, there can be more than one time period or frame group or frame range suitable for providing animation. For example, within a region there can be temporal periodicities at two or more different times, from which one of the time or frame groups is selected or picked. The picked or selected frames are shown in the time-line below the region. This can for example be illustrated with respect to an image based example where the object or region shows a toy train. The train completes one full circle, which is captured in the first 30 frames of the video. The train is then static or does nothing for the next 100 frames. Then the train reverses for the next 30 frames and completes the circle in the reverse direction. So for a given region there are two 30-frame-length periods, each of which is a possible candidate for the animated train motion.
It will therefore be understood that the images or frames to be analysed may be associated with each other in some way. That is the images or frames may come from the same video stream or from the same sequence of images that have been captured, such as multiple snapshots based on the same camera lens view or cinemagraph like application. The operation of region processing on the selected image data is shown in Figure 3 by step 204.
In some embodiments the region processor 107 and the video processor 103 can output the processed video or image data to the synchroniser 109.
In some embodiments the apparatus comprises an audio signal source 102. The audio signal source 102 can in some embodiments comprise a microphone or microphones. In such embodiments the microphone or microphones output an audio signal to an audio/haptic processor 111. It would be understood that in some embodiments the microphone or microphones are physically separated from the audio/haptic processor 111 and pass the information via a communications link, such as a wired or wireless link.
In some embodiments the audio signal source 102 comprises an audio/haptic database. In such embodiments the audio/haptic database can output an audio signal to the audio/haptic processor 111. The audio/haptic database can be any suitable database or linked audio/haptic signal database. For example the audio/haptic database can, in some embodiments, be a database of audio/haptic clips or signals stored on the Internet or within 'the cloud'. Furthermore in some embodiments the audio database can be a database or collection of audio/haptic clips, signals or links to audio/haptic signals stored within the memory of the apparatus.
In some embodiments the user interface input 100 can be configured to control the audio/haptic processor 111 to select a suitable audio and/or haptic file or source.
The operation of the user selecting one of the audio and/or haptic files is shown in Figure 3 by step 205. In some embodiments the audio/haptic processor 111 can be configured to look up or select the audio/haptic signal or link from the audio/haptic source 102 based on the motion detected by the video analyser 105. In some embodiments the audio enhanced cinemagraph generator comprises an audio/haptic processor 111. The audio/haptic processor 111 can in some embodiments be configured to select or receive the audio/haptic signal which is processed in a suitable manner. In some embodiments the audio/haptic processor 111 could be configured to process the audio/haptic signal based on the motion detected by the video analyser 105. For example as described herein in some embodiments the audio/haptic processor 111 selects the audio/haptic signal to be associated with the video region. Furthermore in some embodiments the audio/haptic processor 111 can be configured to modify spatial audio content based on the motion detected by the video analyser, for example to match the movement in the video region. In some embodiments the audio/haptic processor 111 can be configured to add or select haptic signals to generate suitable haptic effects based on the motion detected by the video analyser, for example to match the movement in the video region with haptic effects on the display. In some embodiments the audio/haptic processor 111 can be configured to modify audio or music objects based on the motion detected by the video analyser. In some embodiments the audio/haptic processor 111 can be configured to modulate the pitch of the audio/haptic signal that is being attached based on the motion detected by the video analyser; for example, the motion of an object could be smoothly periodic rather than jerky, and in such a situation the audio/haptic processor 111 can be configured to modulate the overall periodicity of the audio according to the detected motion.
For example in some embodiments the audio/haptic processor 111 can be configured to perform a beat/tempo/rhythm estimation on the audio/haptic signal and select regions of the audio/haptic signal for looping in the cinemagraph based on the beat calculation values.
The processing of audio and the selection and outputting of candidate regions for the cinemagraph is shown in Figure 3 by step 206.
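A non-limiting sketch of such beat-based loop selection is given below. It assumes the third-party librosa library for beat tracking; the function name pick_loop_segment and the exhaustive beat-pair search are illustrative choices rather than the method of any particular embodiment.

```python
import librosa  # assumed third-party dependency providing beat tracking

def pick_loop_segment(audio_path, target_duration):
    """Return (start_time, end_time) of a beat-aligned segment whose length
    best matches target_duration seconds, for looping behind the animation."""
    y, sr = librosa.load(audio_path, mono=True)
    _tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    best = None
    for start in beat_times:
        for end in beat_times:
            if end <= start:
                continue
            error = abs((end - start) - target_duration)
            if best is None or error < best[0]:
                best = (error, float(start), float(end))
    return None if best is None else (best[1], best[2])
```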
In some embodiments the user interface input 100 can be used to select, from the candidate regions, a region to be output to the synchroniser 109. The operation of the user selecting one option is shown in Figure 3 by step 207. With respect to Figure 6 an example audio/haptic processor 111 is shown according to some embodiments. Furthermore with respect to Figure 7 an example operation of the audio/haptic processor 111 as shown in Figure 6 is shown in further detail.
In some embodiments the audio/haptic processor 111 comprises a candidate determiner 301. The candidate determiner 301 can be configured to receive the video analysis input, in other words the motion parameters determined by the video analyser 105.
The operation of receiving the video analysis input is shown in Figure 7 by step 401 .
In some embodiments the candidate determiner 301 can be configured to filter a search space of audio/tactile signals based on the motion parameters received. In some embodiments the candidate determiner comprises a database of available or candidate audio signals and/or tactile signals, wherein the audio/tactile signals have associated parameters (such as beat, duration, energy) which can be used as locating parameters on a search space used by the candidate determiner 301 to locate at least one candidate audio/tactile signal based on the motion parameters.
In some embodiments filtering process can be performed by an entity other than the candidate determiner 301 . For example in some embodiments the candidate determiner 301 can be configured to output the video analysis input motion parameters to the audio/haptic source 102, which then searches the database of audio/haptic signals and which then returns suitably matched audio/tactile signals or links to suitable audio/tactile signals to the candidate determiner.
Any suitable searching or filtering operation can be performed. For example in some embodiments an 'N'-dimensional space is searched where each axis in the 'N'-dimensional space represents a specific motion parameter (for example direction, speed, periodicity) and each potential candidate audio/tactile signal is located within the 'N'-dimensional space.
The operation of filtering the search space based on the motion parameters for the audio signal or tactile signal is shown in Figure 7 by step 403.
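Purely as an illustrative sketch, such a parameter-space search could be realised as a nearest-neighbour ranking over motion parameters. The candidate entries, axis names and Euclidean distance measure below are assumptions for illustration only and do not define the search of any specific embodiment.

```python
import math

# Illustrative candidate signals located in the same parameter space as the
# analysed motion: periodicity in seconds, direction in degrees, speed normalised 0..1.
CANDIDATES = {
    "train_loop.wav": {"periodicity": 4.0, "direction": 90.0, "speed": 0.3},
    "hammer_hit.wav": {"periodicity": 1.0, "direction": 270.0, "speed": 0.9},
    "waves.wav":      {"periodicity": 6.0, "direction": 0.0, "speed": 0.1},
}

def rank_candidates(motion_params, candidates=CANDIDATES, max_distance=None):
    """Order candidates by Euclidean distance to the motion parameters,
    optionally dropping those beyond max_distance."""
    scored = []
    for name, params in candidates.items():
        distance = math.sqrt(sum((motion_params[axis] - params[axis]) ** 2
                                 for axis in motion_params))
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    return [name for _, name in sorted(scored)]
```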
As described herein, in some embodiments a tactile signal to be generated by the apparatus can be selected based on the motion analysis. For example, where the video shows in a region a person hitting a wall with a hammer, the motion analysis shows a sudden jolt or stop when the hammer hits the wall. The sudden directional and speed change of the region can generate parameters which are used by the audio/haptic processor to select an audio/haptic signal with a sudden transient characteristic, and in some embodiments a short strong vibration effect is selected to be generated by the apparatus when the hammer hits the wall. Similarly, for a video showing a region which approaches the camera and then goes past at a steady speed, the motion analysis can indicate constant motion parameters, from which the audio/haptic processor can select an audio/haptic signal with a constant characteristic, for example a car noise or train noise. In some embodiments, where the apparatus has sufficient audio capture or record capacity, it would be understood that the audio/haptic processor can be configured to select at least a portion of the audio signals recorded at the time of the video recording.
In some embodiments the filtered audio signal/tactile signal candidates are presented to the user, for example via a user interface display.
The operation of presenting the candidate signals is shown in Figure 7 by step 405.
In such embodiments the candidate determiner 301 can be further configured to receive a user interface input and select a candidate or at least one candidate audio/tactile signal based on the user interface input.
The operation of selecting the candidate audio/tactile signal based on the user interface input is shown in Figure 7 by step 407. It would be understood that the use of user input to assist in the selection of candidate audio/tactile signals is an optional operation and that as described herein the selection can be determined automatically (for example by selecting the nearest matching audio/tactile signal), semi-automatically (for example by displaying a number of near searched audio/tactile signals based on the motion parameters), or manually (for example by displaying the search space or sub-set of the search space).
In some embodiments the audio/haptic processor comprises an audio/tactile signal parameter determiner 303. The audio/tactile parameter determiner 303 can in some embodiments be configured to receive the selected candidate audio/tactile signal or signals from the candidate determiner 301 . Furthermore the audio/tactile parameter determiner 303 can be configured to further analyse the audio/tactile signal to generate at least one associated parameter (such as spatial parameters/music objects or any suitable parameter) to be further processed by the signal or spatial processor 305. For example in some embodiments the audio/tactile signal parameter determiner 303 can be configured to generate a spatial parameterised version of the audio signal. In such embodiments the audio signal can be divided into time frames which are time to frequency domain converted and filtered into sub-band components. Each sub-band component can then be analysed in such a way that at least one directional component is identified, for example by correlating at least two channels of the audio signal, and the directional component is filtered from the background signal. In some embodiments the spatial parameterised version of the audio signal can be that provided by the Directional Audio Coding (DirAC) method wherein a mid (M) signal component with a direction (a) representing the directional component and side (S) component representing the background signal is generated. However any suitable parameterisation of the audio signal can in some embodiments be generated. The operation of analysing the audio/tactile signal to generate spatial parameters/music objects is shown in Figure 7 by step 409.
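The following non-limiting sketch illustrates a much-simplified parameterisation in the spirit described above; it is not the DirAC method itself. One stereo frame is split into mid and side spectra and a crude per-band direction estimate is derived from the inter-channel level difference. The function name, the eight-band split and the degree convention are illustrative assumptions.

```python
import numpy as np

def parameterise_stereo_frame(left, right, n_bands=8):
    """Return (mid_spectrum, side_spectrum, per-band direction estimates) for one
    stereo frame; directions run from -90 (fully left) to +90 (fully right)."""
    L = np.fft.rfft(np.asarray(left, dtype=float))
    R = np.fft.rfft(np.asarray(right, dtype=float))
    mid, side = 0.5 * (L + R), 0.5 * (L - R)
    edges = np.linspace(0, len(L), n_bands + 1, dtype=int)
    directions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        e_left = np.sum(np.abs(L[lo:hi]) ** 2) + 1e-12
        e_right = np.sum(np.abs(R[lo:hi]) ** 2) + 1e-12
        pan = (e_right - e_left) / (e_right + e_left)   # -1 (left) ... +1 (right)
        directions.append(90.0 * pan)
    return mid, side, directions
```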
In some embodiments the audio/haptic processor 111 comprises a signal or spatial processor (or suitable means for signal processing or spatial processing the audio/tactile signal). The spatial processor 305 in some embodiments is configured to receive the audio/tactile signal in a suitable parameterised form, such as from the audio/tactile parameter determiner 303, or in some embodiments directly from the candidate determiner 301 where the audio/tactile signal is in a pre-parameterised form. In some embodiments the spatial processor 305 can be configured to spatially process the audio/tactile signal based on the video analysis motion parameters. For example, where in some embodiments the motion analysis from the video analyser determines that the motion direction is one where the object moves from one side to the other (for example left to right), the audio signal can be processed in such a way that it is heard moving from one side to the other. For example in some embodiments the spatial processor 305 can be configured to apply amplitude panning to the audio signal, which comprises a stereo or multichannel audio signal (for example a binaural or 5.1 channel audio signal), such that the audio signal has a greater volume or energy from the one side (left) initially which then moves such that by the end of the motion it is heard with a greater volume at the other (right). In some embodiments the spatial processing can be any suitable spatial processing such as applying a suitable inter-aural time delay or inter-aural level difference. Similarly in some embodiments where the motion direction is towards or away from the apparatus the spatial processor 305 can be configured to increase or decrease respectively the volume of the audio signal. In some embodiments the speed of the motion towards or away from the apparatus can cause the spatial processor to apply a pitch shift to the audio signal to simulate the Doppler shift effect. The operation of performing spatial processing of the audio/tactile spatial parameters based on the motion parameters is shown in Figure 7 by step 411.
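A minimal, non-limiting sketch of the amplitude panning described above is shown below: a mono playback signal is rendered to stereo so that its apparent position follows the horizontal position of the region. Constant-power panning is one reasonable choice here; the function and argument names are illustrative assumptions.

```python
import numpy as np

def pan_with_motion(mono, positions):
    """mono: 1-D playback signal; positions: horizontal region positions over the
    clip in the range 0.0 (left) .. 1.0 (right). Returns an (N, 2) stereo array
    whose image moves with the region using constant-power amplitude panning."""
    x = np.asarray(mono, dtype=float)
    # Interpolate the (typically per-frame) positions up to one value per sample.
    p = np.interp(np.linspace(0.0, 1.0, len(x)),
                  np.linspace(0.0, 1.0, len(positions)),
                  np.asarray(positions, dtype=float))
    angle = p * (np.pi / 2.0)                 # 0 = hard left, pi/2 = hard right
    return np.stack([np.cos(angle) * x, np.sin(angle) * x], axis=1)
```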
In some embodiments the audio/tactile parameter determiner 303, can be configured to apply processing to a component or object of the audio signal. For example where the candidate audio signal is a piece of music the parameterised version of the audio signal can be considered to have separated instruments due to their differing dominant frequencies then the spatial processor can be configured to select a specific instrument (or frequency sub-band) on the audio track to be processed to follow the movement pattern from the region. In some embodiments the spatial processor can be configured to apply an echo or other effect to a determined object to model the movement of the region. For example where the analysed video shows waves on water then a tremolo effect can be produced on a selected sub-band of the audio signal to provide an equivalent audio experience.
The processed audio/tactile signal can in some embodiments be configured to be output.
The operation of outputting a processed audio signal is shown in Figure 7 step 413.
It would be understood that in some embodiments that the spatial processor 305 can be configured to process the audio/tactile signal according to more than one of these embodiments. Furthermore in some embodiments the candidate determiner 301 and spatial processor 305 can be controlled by a user interface input configured to enable the switching on or off of the various audio enhancements as described herein. It will also be understood that some embodiments relate to audio processing of the signal without spatial processing.
Although the audio enhancements/processing described herein are described with respect to moving images within cinemagraphs it would be understood that in some embodiments similar approaches can be applied to conventional video or moving images and to a single image. Thus for example while recording a single frame or picture a piece of video can be captured (not directly visible to the user). In some embodiments the audio is captured and stored as well, for example starting a couple of seconds before taking the picture and ending a couple seconds after. The movement on the video can be analysed with a similar type of movement analysis engine as described herein and based on the analysis the user is provided different alternatives for audio content and processing of the audio content.
In some embodiments the audio/tactile processor 111 can then output the selected/processed audio/tactile signals to the synchroniser 109.
In some embodiments the apparatus comprises a synchroniser 109 configured to receive video information from the video processor 103, audio/tactile information from the audio/tactile processor 111 , and user interface information from the user interface input 100. In some embodiments the synchroniser 109 can be configured to adjust the audio/tactile and/or video frame parameters and further perform synchronisation and enhancement to the audio and video signals prior to outputting the file information. For example in some embodiments the synchroniser 109 can display on the user interface an expanded selected audio region to permit the user interface input to select frames for synchronising the image to the audio/tactile signal.
The operation of selecting an image frame for synchronising with the audio/tactile signal is shown in Figure 3 by step 209.
In some embodiments the video or audio data can furthermore be manipulated such that the audio/tactile and/or video images are warped in time to produce a better finished product.
The operation of adjusting the audio and frame parameters, synchronisation and enhancement operations are shown in Figure 3 by step 210.
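As a non-limiting sketch of one part of such an adjustment, the selected audio clip could be warped (here by plain resampling) so that its duration exactly matches the duration of the image loop; a production implementation would more likely use a pitch-preserving time stretch. The function and parameter names are illustrative assumptions.

```python
import numpy as np

def fit_audio_to_loop(audio, sample_rate, n_frames, frame_rate):
    """Resample the audio clip so that it lasts exactly as long as the
    n_frames-long image loop (note: simple resampling also shifts pitch)."""
    target_samples = int(round(sample_rate * n_frames / frame_rate))
    old_t = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, 1.0, num=target_samples, endpoint=False)
    return np.interp(new_t, old_t, np.asarray(audio, dtype=float))
```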
The synchroniser 109 can in some embodiments be configured to save the completed cinemagraph or animate image with audio file according to any suitable format.
In some embodiments the synchroniser 109 can then be configured to mix or multiplex the data to form a cinemagraph or animated image metadata file comprising both image or video data and audio/tactile signal data. In some embodiments this mixing or multiplexing of data can generate a file comprising at least some of: video data, audio data, tactile signal data, sub region identification data and time synchronisation data according to any suitable format. The mixer and synchroniser 109 can in some embodiments output the metadata or file output data. The operation of saving the file containing the audio and video information is shown in Figure 3 by step 212.
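Purely for illustration, a minimal sketch of such a combined file is given below as a JSON container holding the video data, the processed playback signal, the animated sub-region identification and the time synchronisation data. The layout is an assumption made for this sketch and is not a standard or claimed format.

```python
import base64
import json

def save_cinemagraph(path, video_bytes, audio_bytes, region_bbox, frame_to_audio_ms):
    """Write one file combining video data, audio/tactile signal data,
    sub-region identification data and time synchronisation data."""
    payload = {
        "video": base64.b64encode(video_bytes).decode("ascii"),
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "region": {"bbox": list(region_bbox)},              # (x0, y0, x1, y1) of the animated sub-region
        "sync": {"frame_to_audio_ms": list(frame_to_audio_ms)},
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
```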
It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers. Furthermore, it will be understood that the term acoustic sound channels is intended to cover sound outlets, channels and cavities, and that such sound channels may be formed integrally with the transducer, or as part of the mechanical integration of the transducer with the device.
In general, the design of various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The design of embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory used in the design of embodiments of the application may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the invention may be implemented in various components such as integrated circuit modules.
As used in this application, the term 'circuitry' refers to all of the following:
(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and
(b) to combinations of circuits and software (and/or firmware), such as: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
(c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
This definition of 'circuitry' applies to all uses of this term in this application, including any claims. As a further example, as used in this application, the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims

CLAIMS:
1. A method comprising:
analysing at least two images to determine at least one region common to the at least two images;
determining at least one parameter associated with a motion of at least one region;
determining at least one playback signal to be associated with the at least one region; and
processing the at least one playback signal based on the at least one parameter.
2. The method as claimed in claim 1, wherein determining at least one parameter associated with a motion of at least one region comprises:
determining a motion of the at least one region; and
determining at least one parameter based on the motion of the at least one region.
3. The method as claimed in any of claims 1 and 2, wherein the at least one parameter comprises at least one of:
a motion periodicity;
a motion direction;
a motion speed; and
a motion type.
4. The method as claimed in any of claims 1 to 3, wherein determining at least one playback signal to be associated with the at least one region comprises determining at least one playback signal based on the at least one parameter.
5. The method as claimed in claim 4, wherein determining at least one playback signal based on the at least one parameter comprises:
determining at least two playback signals based on the at least one parameter;
receiving an input to select one of the at least two playback signals; and
selecting one of the at least two playback signals based on the input.
6. The method as claimed in any of claims 4 and 5, wherein determining at least one playback signal based on the at least one parameter comprises:
determining for at least one playback signal at least one motion parameter value; and
determining the at least one motion parameter value is within a determined distance of the at least one parameter.
7. The method as claimed in any of claims 1 to 6, wherein processing the at least one playback signal based on the at least one parameter comprises at least one of:
spatial processing the at least one playback signal based on the at least one parameter;
combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and
signal processing the at least one playback signal based on the at least one parameter.
8. The method as claimed in claim 7 wherein spatial processing the at least one playback signal based on the at least one parameter comprises modifying the audio field of the at least one playback signal to move based on the motion of the at least one region.
9. The method as claimed in any of claims 1 to 8, further comprising:
displaying at least one image of the at least two images; and
synchronising and outputting the processed at least one playback signal.
10. The method as claimed in any of claims 1 to 9, wherein the at least one playback signal comprises at least one of:
at least one audio signal; and
at least one tactile signal.
11. The method as claimed in any of claims 1 to 10, wherein processing the at least one playback signal based on the at least one parameter comprises at least one of:
determining within the playback signal at least one audio object; and
spatially processing the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
12. An apparatus comprising:
means for analysing at least two images to determine at least one region common to the at least two images;
means for determining at least one parameter associated with a motion of at least one region;
means for determining at least one playback signal to be associated with the at least one region; and
means for processing the at least one playback signal based on the at least one parameter.
13. The apparatus as claimed in claim 12, wherein the means for determining at least one parameter associated with a motion of at least one region comprises:
means for determining a motion of at least one region; and
means for determining at least one parameter based on the motion of the at least one region.
14. The apparatus as claimed in claim 12 or claim 13, wherein the at least one parameter comprises at least one of:
a motion periodicity;
a motion direction;
a motion speed; and
a motion type.
15. The apparatus as claimed in any of claims 12 to 14, wherein the means for determining at least one playback signal to be associated with the at least one region comprises means for determining at least one playback signal based on the at least one parameter.
16. The apparatus as claimed in claim 15, wherein the means for determining at least one playback signal based on the at least one parameter comprises:
means for determining at least two playback signals based on the at least one parameter;
means for receiving an input to select one of the at least two playback signals; and
means for selecting one of the at least two playback signals based on the input.
17. The apparatus as claimed in claim 15 or claim 16, wherein the means for determining at least one playback signal based on the at least one parameter comprises:
means for determining for at least one playback signal at least one motion parameter value; and
means for determining the at least one motion parameter value is within a determined distance of the at least one parameter.
18. The apparatus as claimed in any of claims 12 to 17, wherein the means for processing the at least one playback signal based on the at least one parameter comprises at least one of:
means for spatial processing the at least one playback signal based on the at least one parameter;
means for combining the at least one playback signal to a recorded at least one audio signal based on the at least one parameter; and
means for signal processing the at least one playback signal based on the at least one parameter.
19. The apparatus as claimed in claim 18, wherein the means for spatial processing the at least one playback signal based on the at least one parameter comprises means for modifying the audio field of the at least one playback signal to move based on the motion of the at least one region.
20. The apparatus as claimed in any of claims 12 to 19, comprising:
means for displaying at least one image of the at least two images;
and means for synchronising and outputting the processed at least one signal.
21 . The apparatus as claimed in any of claims 12 to 20, wherein the at least one playback signal comprises at least one of:
at least one audio signal; and
at least one tactile signal.
22. The apparatus as claimed in any of claims 12 to 21 , wherein the means for processing the at least one playback signal based on the at least one parameter comprises at least one of:
means for determining within the playback signal at least one audio object; and
means for spatially processing the at least one audio object based on the at least one parameter such that the at least one audio object follows the motion of the at least one region.
23. An apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured to, with the at least one processor, cause the apparatus at least to:
analyse at least two images to determine at least one region common to the at least two images;
determine at least one parameter associated with a motion of at least one region;
determine at least one playback signal to be associated with the at least one region; and
process the at least one playback signal based on the at least one parameter.
24. An apparatus comprising:
an analyser configured to analyse at least two images to determine at least one region common to the at least two images;
a motion determiner configured to determine at least one parameter associated with a motion of at least one region;
a playback determiner configured to determine at least one playback signal to be associated with the at least one region; and
a processor configured to process the at least one playback signal based on the at least one parameter.
PCT/FI2014/050650 2013-08-30 2014-08-27 An image enhancement apparatus and method WO2015028713A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1315502.3 2013-08-30
GB1315502.3A GB2518144A (en) 2013-08-30 2013-08-30 An image enhancement apparatus and method

Publications (1)

Publication Number Publication Date
WO2015028713A1 true WO2015028713A1 (en) 2015-03-05

Family

ID=49397087

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2014/050650 WO2015028713A1 (en) 2013-08-30 2014-08-27 An image enhancement apparatus and method

Country Status (2)

Country Link
GB (1) GB2518144A (en)
WO (1) WO2015028713A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5802220A (en) * 1995-12-15 1998-09-01 Xerox Corporation Apparatus and method for tracking facial motion through a sequence of images
US6636220B1 (en) * 2000-01-05 2003-10-21 Microsoft Corporation Video-based rendering
WO2012001587A1 (en) * 2010-06-28 2012-01-05 Koninklijke Philips Electronics N.V. Enhancing content viewing experience
WO2013116937A1 (en) * 2012-02-09 2013-08-15 Flixel Photos Inc. Systems and methods for creation and sharing of selectively animated digital photos
EP2711929A1 (en) * 2012-09-19 2014-03-26 Nokia Corporation An Image Enhancement apparatus and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013175051A1 (en) * 2012-05-25 2013-11-28 Nokia Corporation Method and apparatus for producing a cinemagraph

Also Published As

Publication number Publication date
GB201315502D0 (en) 2013-10-16
GB2518144A (en) 2015-03-18

Similar Documents

Publication Publication Date Title
US20180374252A1 (en) Image point of interest analyser with animation generator
US20140078398A1 (en) Image enhancement apparatus and method
US20200057506A1 (en) Systems and Methods for User Generated Content Authoring
CN106648083B (en) Enhanced playing scene synthesis control method and device
JP6456593B2 (en) Method and apparatus for generating haptic feedback based on analysis of video content
EP2976694B1 (en) A touch display device with tactile feedback
US9792362B2 (en) Video search system and method
TWI606420B (en) Method, apparatus and computer program product for generating animated images
US20190139312A1 (en) An apparatus and associated methods
US10798518B2 (en) Apparatus and associated methods
KR20150028724A (en) Systems and methods for generating haptic effects associated with audio signals
CN109634402A (en) System and method for exporting haptic effect
JP2020010322A (en) System for providing automatic tactile sensation generation for video content and method
US20190041989A1 (en) Automatic haptic generation based on color features and motion analysis
Sexton et al. Automatic CNN-based enhancement of 360° video experience with multisensorial effects
EP2706531A1 (en) An image enhancement apparatus
US20160086633A1 (en) Combine Audio Signals to Animated Images
WO2015028713A1 (en) An image enhancement apparatus and method
Zhang et al. Automatic generation of spatial tactile effects by analyzing cross-modality features of a video
Yan et al. An Adaptive Musical Vibrotactile System (MuViT) for Modern Smartphones
TW201411552A (en) An image enhancement apparatus
CN116506694B (en) Video editing method, device, electronic equipment and storage medium
US20230368461A1 (en) Method and apparatus for processing action of virtual object, and storage medium
CN117830479A (en) Animation production method and touch screen device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14838980

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14838980

Country of ref document: EP

Kind code of ref document: A1