GB2533360A - Method, apparatus and computer program product for processing multi-camera media content

Method, apparatus and computer program product for processing multi-camera media content

Info

Publication number
GB2533360A
Authority
GB
United Kingdom
Prior art keywords
media
target portion
media source
content
media content
Prior art date
Legal status
Withdrawn
Application number
GB1422544.5A
Inventor
Oikkonen Markku
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Priority to GB1422544.5A
Publication of GB2533360A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/62Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking


Abstract

Media contents (e.g. video, directional audio, images) associated with a scene are received from media sources (e.g. cameras), as well as spatial information (e.g. location info, such as 3D geographical coordinates and altitude info, orientation, distances, satnav data) and media source data (e.g. settings, optical specifications) associated with the sources. A target portion, such as a region of interest or object (308), in one of the contents is selected (e.g. based on user inputs, such as enclosing it within a boundary), that content being received from a first one of the sources. The position and geometrical representation (e.g. an enclosing subframe) of a corresponding target portion (i.e. the same object) in at least one other media content is determined, the other media content being captured by a second media source of the plurality. The position and the geometrical representation are determined based on (1) the selection of the target portion in the media content, and (2) the spatial information and the media source data of both the first source and the second source. Interesting features in the scene can therefore be identified and extracted from all the sources and displayed (324, 326, 328), optionally having been cropped (340, 342, 344).

Description

METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROCESSING
MULTI-CAMERA MEDIA CONTENT
TECHNICAL FIELD
[0001] Various implementations relate generally to method, apparatus, and computer program product for processing multi-camera media content.
BACKGROUND
[0002] Various electronic devices are widely used for capturing media content, for example video content, audio content, and the like. In some example scenarios, multiple electronic devices may be located at different locations across a scene for capturing media content associated with the scene. The media content captured/recorded by the multiple electronic devices may be processed in order to improve the user experience when the processed media content is accessed. During processing, the most suitable portions of the media content may be selected and edited to produce a media content in the form of a smoothly progressing story or documentary of high quality. However, the available techniques for generating high quality media content by editing and/or combining the media contents retrieved from multiple electronic devices are time intensive and complex, and may pose challenges in producing high quality media content as output.
SUMMARY OF SOME EMBODIMENTS
[0003] Various example embodiments are set out in the claims.
[0004] In a first embodiment, there is provided a method comprising: receiving, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitating selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determining, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
[0005] In a second embodiment, there is provided an apparatus comprising at least one processor; and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least: receive, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitate selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determine, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
[0006] In a third embodiment, there is provided a computer program product comprising at least one computer-readable storage medium, the computer-readable storage medium comprising a set of instructions, which, when executed by one or more processors, cause an apparatus to perform at least: receive, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitate selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determine, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
[0007] In a fourth embodiment, there is provided an apparatus comprising: means for receiving, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; means for facilitating selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and means for determining, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
[0008] In a fifth embodiment, there is provided a computer program comprising program instructions which when executed by an apparatus, cause the apparatus to: receive, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitate selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determine, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
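Read as a data flow, the method recited in these five embodiments takes per-source content plus per-source metadata in and returns, for every other source, the position and geometry of the corresponding target portion. The Python sketch below is illustrative only; the record fields and the `project` callable are hypothetical names, not terminology from the claims, and the projection itself (the subject of paragraphs [0047] to [0049]) is deliberately left as a parameter:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

@dataclass
class MediaSource:
    """Hypothetical record for one media source and its captured content."""
    content: list                                       # e.g. a list of video frames
    spatial_info: dict = field(default_factory=dict)    # location, orientation, ...
    source_data: dict = field(default_factory=dict)     # capture settings, optics, ...

def find_corresponding_targets(
    sources: List[MediaSource],
    selected: int,                      # index of the source the user selected in
    target_box: Box,                    # user-drawn boundary around the target
    project: Callable[[Box, MediaSource, MediaSource], Box],
) -> Dict[int, Box]:
    """For every other source, determine the position and geometrical
    representation (here: an enclosing sub-frame) of the corresponding
    target portion, using the metadata of both sources."""
    first = sources[selected]
    return {
        i: project(target_box, first, other)
        for i, other in enumerate(sources)
        if i != selected
    }
```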
BRIEF DESCRIPTION OF THE FIGURES
[0009] Various embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
[0010] FIGURE 1 illustrates a device, in accordance with an example embodiment;
[0011] FIGURE 2 illustrates an apparatus for processing multi-camera media content, in accordance with an example embodiment;
[0012] FIGURE 3 illustrates an example representation of processing of multi-camera media content, in accordance with an example embodiment;
[0013] FIGURE 4 is an example representation of a multi-camera system, in accordance with an example embodiment;
[0014] FIGURE 5 is an example representation of capturing media content by utilizing a multi-camera system, in accordance with an example embodiment;
[0015] FIGURE 6 is a flowchart depicting an example method for processing of multi-camera media content, in accordance with an example embodiment; and
[0016] FIGURE 7 is a flowchart depicting an example method for processing of multi-camera media content, in accordance with an example embodiment.
DETAILED DESCRIPTION
[0017] Example embodiments and their potential effects are understood by referring to FIGURES 1 through 7 of the drawings.
[0018] FIGURE 1 illustrates a device 100 in accordance with an example embodiment. It should be understood, however, that the device 100 as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from various embodiments and, therefore, should not be taken to limit the scope of the embodiments. As such, it should be appreciated that at least some of the components described below in connection with the device 100 may be optional, and thus an example embodiment may include more, fewer or different components than those described in connection with the example embodiment of FIGURE 1. The device 100 could be any of a number of types of mobile electronic devices, for example, portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, cellular phones, all types of computers (for example, laptops, mobile computers or desktops), cameras, audio/video players, radios, global positioning system (GPS) devices, media players, mobile digital assistants, or any combination of the aforementioned, and other types of communications devices.
[0019] The device 100 may include an antenna 102 (or multiple antennas) in operable communication with a transmitter 104 and a receiver 106. The device 100 may further include an apparatus, such as a controller 108 or other processing device that provides signals to and receives signals from the transmitter 104 and the receiver 106, respectively. The signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data. In this regard, the device 100 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the device 100 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like. For example, the device 100 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with 3.9G wireless communication protocols such as evolved-universal terrestrial radio access network (E-UTRAN), with fourth-generation (4G) wireless communication protocols, or the like. As an alternative (or additionally), the device 100 may be capable of operating in accordance with non-cellular communication mechanisms. For example, computer networks such as the Internet, local area networks, wide area networks, and the like; short range wireless communication networks such as Bluetooth® networks, Zigbee® networks, Institute of Electrical and Electronics Engineers (IEEE) 802.11x networks, and the like; wireline telecommunication networks such as public switched telephone network (PSTN), asymmetric digital subscriber line (ADSL) networks, optical fiber networks, and so on.
[0020] The controller 108 may include circuitry implementing, among others, audio and logic functions of the device 100. For example, the controller 108 may include, but is not limited to, one or more digital signal processor devices, one or more microprocessor devices, one or more processor(s) with accompanying digital signal processor(s), one or more processor(s) without accompanying digital signal processor(s), one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more controllers, one or more application-specific integrated circuits (ASICs), one or more computer(s), various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the device 100 are allocated between these devices according to their respective capabilities. The controller 108 thus may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission. The controller 108 may additionally include an internal voice coder, and may include an internal data modem. Further, the controller 108 may include functionality to operate one or more software programs, which may be stored in a memory. For example, the controller 108 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the device 100 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like. In an example embodiment, the controller 108 may be embodied as a multi-core processor such as a dual or quad core processor. However, any number of processors may be included in the controller 108.
[0021] The device 100 may also comprise a user interface including an output device such as a ringer 110, an earphone or speaker 112, a microphone 114, a display 116, and a user input interface, which may be coupled to the controller 108. The user input interface, which allows the device 100 to receive data, may include any of a number of devices allowing the device 100 to receive data, such as a keypad 118, a touch display, the microphone 114 or other input device. In embodiments including the keypad 118, the keypad 118 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the device 100. Alternatively or additionally, the keypad 118 may include a conventional QWERTY keypad arrangement. The keypad 118 may also include various soft keys with associated functions. In addition, or alternatively, the device 100 may include an interface device such as a joystick or other user input interface. The device 100 further includes a battery 120, such as a vibrating battery pack, for powering various circuits that are used to operate the device 100, as well as optionally providing mechanical vibration as a detectable output.
[0022] In an example embodiment, the device 100 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 108. The media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission. In an example embodiment in which the media capturing element is a camera module 122, the camera module 122 may include a digital camera capable of forming a digital image file from a captured image. As such, the camera module 122 includes all hardware, such as a lens or other optical component(s), and software for creating a digital image file from a captured image. Alternatively, the camera module 122 may include the hardware needed to view an image, while a memory device of the device 100 stores instructions for execution by the controller 108 in the form of software to create a digital image file from a captured image. In an example embodiment, the camera module 122 may further include a processing element such as a co-processor, which assists the controller 108 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a JPEG standard format or another like format. For video, the encoder and/or decoder may employ any of a plurality of standard formats such as, for example, standards associated with H.261, H.262/ MPEG-2, H.263, H.264, H.264/MPEG-4, MPEG-4, and the like. In some cases, the camera module 122 may provide live image data to the display 116. Moreover, in an example embodiment, the display 116 may be located on one side of the device 100 and the camera module 122 may include at least one lens positioned on the opposite side of the device 100 with respect to the display 116 to enable the camera module 122 to capture images on one side of the device 100 and present a view of such images to the user positioned on the other side of the device 100. In an example embodiment, the device 100 may include multiple lenses for capturing three-dimensional (3D) images of a scene. In an example embodiment, the multiple lenses may also facilitate in capturing depth information associated with the scene. In an example embodiment, the depth information may include depth map associated with the scene. Herein, the depth map may be considered to represent the values related to the distance of the surfaces of the scene objects from a reference location, for example a view-point of an observer. A depth map is an image that may include per-pixel depth information or any similar information. In an example embodiment, the device 100 may include a light-field camera for capturing the 3D image of the scene, and the depth information associated with the scene.
[0023] In an example embodiment, the device 100 may include one or more sensors for identifying target objects and/or target portions associated with a scene. Herein, the terms 'target objects' and/or 'target portions' may refer to those objects and/or portions associated with a scene that may be objects and/or regions of interest in the media content captured by multiple media capturing devices. The target objects and/or portions may be captured along with other portions/regions (such as background regions, background audio music/noise/sounds, etc.) of the scene, and may be extracted from the media contents captured by the multiple capturing devices for further processing. In an example embodiment, the device 100 may further include one or more sensors for measuring distances between various media capturing devices. In an example embodiment, the one or more sensors may further be configured to measure distances between one or more objects or space points on the scene with respect to the multiple media capturing devices. In an example embodiment, the sensors may further facilitate in gathering spatial information associated with the multiple media capturing devices. Examples of such sensors may include radar sensors or sensors utilizing optical triangulation.
[0024] In an example embodiment, the device 100 may include position detection sensors that may facilitate in detecting the location of the device 100. For example, the device 100 may include sensors such as outdoor satellite navigation based global positioning system (GPS) sensors or indoor positioning devices (technology) to determine the location of the device 100. In an example embodiment, the device 100 may further include sensors such as accelerometers, gyroscopes and other such sensors for determining the 3D-orientation of the device 100. In an example embodiment, the device 100 may further be configured to receive information such as position and location information of one or more objects of the scene. In an example embodiment, the one or more objects of the scene may include position detecting and indicating systems.
[0025] The device 100 may further include a user identity module (UIM) 124. The UIM 124 may be a memory device having a processor built in. The UIM 124 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card. The UIM 124 typically stores information elements related to a mobile subscriber. In addition to the UIM 124, the device 100 may be equipped with memory. For example, the device 100 may include volatile memory 126, such as volatile random access memory (RAM) including a cache area for the temporary storage of data. The device 100 may also include other non-volatile memory 128, which may be embedded and/or may be removable. The non-volatile memory 128 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory, hard drive, or the like. The memories may store any number of pieces of information, and data, used by the device 100 to implement the functions of the device 100.
[0026] FIGURE 2 illustrates an apparatus 200 for processing multi-camera media content, in accordance with an example embodiment. The apparatus 200 may be employed, for example, in the device 100 of FIGURE 1. However, it should be noted that the apparatus 200 may also be employed on a variety of other devices both mobile and fixed, and therefore, embodiments should not be limited to application on devices such as the device 100 of FIGURE 1. Alternatively, embodiments may be employed on a combination of devices including, for example, those listed above. Accordingly, various embodiments may be embodied wholly at a single device (for example, the device 100) or in a combination of devices. Furthermore, it should be noted that the devices or elements described below may not be mandatory and thus some may be omitted in certain embodiments.
[0027] The apparatus 200 includes or otherwise is in communication with at least one processor 202 and at least one memory 204. Examples of the at least one memory 204 include, but are not limited to, volatile and/or non-volatile memories. Some examples of the volatile memory include, but are not limited to, random access memory, dynamic random access memory, static random access memory, and the like. Some examples of the non-volatile memory include, but are not limited to, hard disks, magnetic tapes, optical disks, programmable read only memory, erasable programmable read only memory, electrically erasable programmable read only memory, flash memory, and the like. The memory 204 may be configured to store information, data, applications, instructions or the like for enabling the apparatus 200 to carry out various functions in accordance with various example embodiments. For example, the memory 204 may be configured to buffer input data comprising media content for processing by the processor 202. Additionally or alternatively, the memory 204 may be configured to store instructions for execution by the processor 202.
[0028] An example of the processor 202 may include the controller 108. The processor 202 may be embodied in a number of different ways. The processor 202 may be embodied as a multi-core processor, a single core processor, or a combination of multi-core processors and single core processors. For example, the processor 202 may be embodied as one or more of various processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. In an example embodiment, the multi-core processor may be configured to execute instructions stored in the memory 204 or otherwise accessible to the processor 202. Alternatively or additionally, the processor 202 may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 202 may represent an entity, for example, physically embodied in circuitry, capable of performing operations according to various embodiments while configured accordingly. For example, if the processor 202 is embodied as two or more of an ASIC, FPGA or the like, the processor 202 may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, if the processor 202 is embodied as an executor of software instructions, the instructions may specifically configure the processor 202 to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor 202 may be a processor of a specific device, for example, a mobile terminal or network device adapted for employing embodiments by further configuration of the processor 202 by instructions for performing the algorithms and/or operations described herein. The processor 202 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 202.
[0029] A user interface 206 may be in communication with the processor 202. Examples of the user interface 206 include, but are not limited to, input interface and/or output user interface. The input interface is configured to receive an indication of a user input. The output user interface provides an audible, visual, mechanical or other output and/or feedback to the user. Examples of the input interface may include, but are not limited to, a keyboard, a mouse, a joystick, a keypad, a touch screen, soft keys, and the like. Examples of the output interface may include, but are not limited to, a display such as light emitting diode display, thin-film transistor (TFT) display, liquid crystal displays, active-matrix organic light-emitting diode (AMOLED) display, a microphone, a speaker, ringers, vibrators, and the like. In an example embodiment, the user interface 206 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard, touch screen, or the like. In this regard, for example, the processor 202 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface 206, such as, for example, a speaker, ringer, microphone, display, and/or the like. The processor 202 and/or user interface circuitry comprising the processor 202 may be configured to control one or more functions of one or more elements of the user interface 206 through computer program instructions, for example, software and/or firmware, stored on a memory, for example, the at least one memory 204, and/or the like, accessible to the processor 202.
[0030] In an example embodiment, the apparatus 200 may include an electronic device. Some examples of the electronic device include communication device, media capturing device with communication capabilities, computing devices, and the like. Some examples of the communication device may include a mobile phone, a personal digital assistant (PDA), and the like. Some examples of computing device may include a laptop, a personal computer, and the like. In an example embodiment, the electronic device may include a user interface, for example, the user interface 206, having user interface circuitry and user interface software configured to facilitate a user to control at least one function of the electronic device through use of a display and further configured to respond to user inputs. In an example embodiment, the electronic device may include a display circuitry configured to display at least a portion of the user interface 206 of the electronic device. The display and display circuitry may be configured to facilitate the user to control at least one function of the electronic device.
[0031] In an example embodiment, the electronic device may be embodied as to include a transceiver. The transceiver may be any device or circuitry operating in accordance with software, or otherwise embodied in hardware or a combination of hardware and software. For example, the processor 202 operating under software control, or the processor 202 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof, thereby configures the apparatus 200 or circuitry to perform the functions of the transceiver. The transceiver may be configured to receive media content from a plurality of media sources. Examples of media content may include audio content, video content, data, and a combination thereof.
[0032] In an example embodiment, the electronic device may be embodied so as to include a plurality of sensors 208 such as one or more image sensors, one or more object detection sensors, one or more distance sensors, one or more location sensors, and one or more position sensors. In an example embodiment, the plurality of sensors 208 may be in communication with the processor 202 and/or other components of the apparatus 200. The plurality of sensors 208 may be in communication with other imaging circuitries and/or software, and are configured to perform various functions. For example, the one or more image sensors may be configured to capture digital images or to make a video or other graphic media files. The one or more image sensors and other circuitries, in combination, may be an example of the camera module 122 of the device 100. In an example embodiment, the one or more object detection sensors may be configured to identify target objects and/or target portions associated with a scene. In an example embodiment, the one or more distance sensors may be configured to measure distances between various media capturing devices. In an example embodiment, the one or more location sensors may further be configured to measure distances between one or more objects or space points on the scene with respect to the multiple media capturing devices. In an example embodiment, the one or more distance sensors may further facilitate in gathering spatial information associated with the multiple media capturing devices. Examples of the one or more location sensors may include radar sensors or sensors utilizing optical triangulation. In an example embodiment, the one or more position detection sensors may facilitate in detecting the location of the electronic device. Examples of the one or more position detection sensors may include outdoor satellite navigation based systems like global positioning system (GPS) sensors and indoor positioning systems to determine the location of the electronic device. In an example embodiment, the plurality of sensors may further include sensors such as accelerometers, gyroscopes and other such sensors for determining the 3D-orientation of the electronic device. In an example embodiment, the apparatus 200 may be configured to receive spatial information such as position and location information of one or more objects of the scene. In an example embodiment, the one or more objects of the scene may include position detecting and indicating systems, for example, location sensors, for capturing and sending the position and the location information thereof to the apparatus 200.
[0033] The components 202, 204, 206, and 208 may hereinafter be referred to as components 202-208. These components 202-208 may communicate with each other via a centralized circuit system 210 for editing of media content. The centralized circuit system 210 may be various devices configured to, among other things, provide or enable communication between the components (202-208) of the apparatus 200. In certain embodiments, the centralized circuit system 210 may be a central printed circuit board (PCB) such as a motherboard, main board, system board, or logic board. The centralized circuit system 210 may also, or alternatively, include other printed circuit assemblies (PCAs) or communication channel media.
[0034] In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to receive a plurality of media contents associated with a scene from the plurality of media sources. Herein, the term 'scene' may refer to an arrangement (natural, manmade, sorted or assorted) of one or more objects of which images and/or videos can be captured by media capturing devices. In various example embodiments, the media content of the plurality of media contents may include, but are not limited to, video with audio, directional audio, animation, collection of still images with background song/audio/music, light-field camera content, time synchronization information such as time code/time stamp information of the captured content and/or combinations of these.
[0035] In an example embodiment, the media content may include any content that can be captured by the plurality of media sources. In an example embodiment, the plurality of media sources may include media capturing devices, for example, cameras, camcorders and the like. The media content may also be stored on multimedia storage devices, for example, hard disks, flash drives, RAM, ROM, disks and any computer-readable media, and may be capable of being played on multimedia systems and/or of being displayed on the user interface 206 (for example, the display 116). In various example embodiments, the apparatus 200 may be caused to access or receive the plurality of media contents from the plurality of media sources external to the apparatus 200 and/or stored and/or otherwise accessible to the apparatus 200. Accessing or receiving the media content from external sources includes retrieving the media content from web servers, local servers, and any other location from where the media content may be downloaded or streamed through wired or wireless connections. Examples of wired connections may include PSTN networks, ADSL networks, optical fibers, cable lines, and the like. Examples of wireless connections may include Bluetooth® networks, Zigbee® networks, IEEE 802.11x networks, and the like. Additionally or alternatively, the access of the media content includes a playback of the stored or streamed media content at the apparatus 200. In an example embodiment, a processing means may be configured to facilitate receipt of the plurality of media contents from the plurality of media sources at the apparatus 200. An example of the processing means may include the processor 202, which may be an example of the controller 108.
[0036] In an example embodiment, the plurality of media contents may include media contents that may be recorded and stored, and later retrieved for the purpose of processing. For example, during a live performance by a music band, the plurality of media contents may include video clips/footages of live performance, where the plurality of media sources such as multiple cameras may be located at different positions at the event location. The video clips/footages may be stored, for example, at a local memory device or in a central memory location (such as a server, cloud memory, and so on). The video clips/footages may be retrieved from the central location and processed further. For example, in a crowd-sourced event, the plurality of media sources may upload a media content at a central location, for example, a cloud memory, and the apparatus 200 may download the plurality of media contents for further processing. In an example embodiment, the processing (or editing) of the plurality of media contents at the apparatus 200 may facilitate in generating an entire set-up of the scene.
[0037] In another example embodiment, the plurality of media contents may include multiple media contents that may be live media contents, being retrieved in real-time. For example, in case of a surveillance system, a plurality of surveillance cameras may be installed for capturing live media content (for example, videos) of a location, for example a shopping complex. The plurality of media contents may be simultaneously displayed at display devices, for example at a display system (or display screens) installed for monitoring the location. In yet another example embodiment, the plurality of media contents may additionally include depth information, for example depth maps received from the plurality of media sources. In an example embodiment, the plurality of media sources may facilitate in provisioning of the depth information from a stereo camera. In an example embodiment, the depth information may be determined by a light field camera. Herein, the stereo camera and the light field camera are provided as examples of the plurality of media sources for capturing the depth information. It will, however, be noted that any device capable of determining the image depth information associated with various objects of the scene may be utilized as a media source of the plurality of media sources for the purpose of various embodiments. Herein, a depth map may be considered to represent the values related to the distance of the surfaces of the scene objects from a reference location, for example a view-point of an observer. A depth map is an image that may include per-pixel depth information or any similar information. For example, each sample in a depth map represents the distance of a respective texture sample or samples from a lens plane of the camera. In other words, if the z-axis is along the shooting axis of the cameras (and hence orthogonal to the lens planes), a sample in a depth map represents the value on the z-axis. Since depth maps are generated containing a depth value for each pixel in the image, they may be depicted as gray-level images or images containing only the luma component.
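As an illustration of the depth-map description above, the short sketch below (synthetic data; array shapes and names are assumptions) treats a depth map as a per-pixel array of z-axis distances and renders it as the kind of gray-level, luma-only image just mentioned:

```python
import numpy as np

# Synthetic depth map: one z-axis distance (metres from the lens plane) per pixel.
depth = np.random.uniform(0.5, 10.0, size=(480, 640))

# Normalising to 8-bit luma yields the gray-level depiction described above.
gray = ((depth - depth.min()) / (depth.max() - depth.min()) * 255).astype(np.uint8)

# Each sample is the distance of the corresponding texture sample from the
# lens plane of the camera, measured along the shooting (z) axis.
row, col = 240, 320
print(f"z-distance at pixel ({row}, {col}): {depth[row, col]:.2f} m")
```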
[0038] In an example embodiment, the media content may include time synchronization information associated with the media content. In an example embodiment, the time synchronization information may include timestamps or time codes associated with the media content. In an example embodiment, the time synchronization information may facilitate in synchronizing the plurality of media contents, for example the video contents and/or the audio contents, being received from the plurality of media sources. For instance, time synchronization may be performed between the plurality of media contents being received from the plurality of media sources in a multi-camera shooting of an activity such as a sports activity, a concert, and the like, where the plurality of media contents are required to be in exact synchronization. In some embodiments, the media content information may preclude time synchronization information associated with the media content. For instance, in scenarios where a target object may be selected/retrieved from different media contents irrespective of their time of appearance/occurrence in the said media contents, the time synchronization information may be precluded.
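One minimal way to use such timestamps, sketched below under the assumption that each stream is a list of `(timestamp, frame)` pairs sorted by time, is to pair every frame of one stream with the nearest-in-time frame of another. This is an illustration of timestamp-based alignment generally, not the specific synchronization claimed here:

```python
import bisect

def synchronize(stream_a, stream_b):
    """Pair each (timestamp, frame) of stream_a with the nearest-in-time
    frame of stream_b. Both streams must be sorted by timestamp."""
    times_b = [t for t, _ in stream_b]
    pairs = []
    for t, frame_a in stream_a:
        i = bisect.bisect_left(times_b, t)
        # consider the neighbours on either side of the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times_b)]
        nearest = min(candidates, key=lambda j: abs(times_b[j] - t))
        pairs.append((frame_a, stream_b[nearest][1]))
    return pairs

# Example: two streams whose timestamps are offset by 20 ms.
a = [(0.00, "a0"), (0.04, "a1"), (0.08, "a2")]
b = [(0.02, "b0"), (0.06, "b1"), (0.10, "b2")]
print(synchronize(a, b))  # [('a0', 'b0'), ('a1', 'b0'), ('a2', 'b1')]
```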
[0039] In an example embodiment, the apparatus 200 may be caused to facilitate receipt of depth information associated with the scene from the plurality of media sources. In an example embodiment, the plurality of media sources may include light field cameras that may be capable of determining depth information associated with the scene. The plurality of light field cameras may send respective depth information associated with the scene to the apparatus 200. In an example embodiment, the apparatus 200 may be caused to receive the depth information from one or more of the plurality of media sources. In an example embodiment, a processing means may be configured to facilitate receipt of the depth information from one or more of the plurality of media sources at the apparatus 200. An example of the processing means may include the processor 202, which may be an example of the controller 108. It will be noted here that the depth information may be received along with the media content. In one example embodiment, the media content may include the depth information associated with the scene. In another example embodiment, along with the media content that may be received, the depth information associated with the scene may also be received at the apparatus 200.
[0040] In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to receive spatial information and media source data associated with the plurality of media sources. In an example embodiment, the spatial information associated with the plurality of media sources may include one or more of a location information, a physical dimension information associated with the media source, a media source orientation information, a shooting direction information associated with the media source, and a distance information associated with the media source. Herein, the location information may refer to information associated with the 3D location, such as 3D geographical coordinates (horizontal coordinates and altitude), of the media sources of the plurality of media sources. Herein, the media source orientation information of the media source may include details regarding the orientation of the media source. For example, the orientation information of the media source may be measured in terms of angular displacement of the media source with respect to a reference position. In an example embodiment, the 'shooting direction information' associated with the media source may refer to a direction in which a straight line perpendicular to the lens plane of the media source is pointing. In an example embodiment, the spatial information may also include the data associated with the physical form and dimensions of the media sources, so as to facilitate in determining exactly (for example, to a millimeter precision) the location of the media sources, the location of the lenses of the media sources, and the orientation of the media sources. Herein, the information on the physical form and dimensions of the media source may include measures of the device chassis and position, and measures of the lens system and imaging sensor. In an example embodiment, the spatial information associated with the plurality of media sources may include physical dimension information associated with the media sources. In an example embodiment, the physical dimension information of a media source may include information associated with physical dimensions of the media source as well as those of the optical components of the media source. In an example embodiment, the location information, the orientation information and the shooting direction information of the plurality of media sources may be received at the apparatus 200 via sensors such as global positioning system (GPS), barometer, accelerometer and gyroscopic sensors.
[0041] In an example embodiment, the spatial information may further include distance information associated with respective distances of the one or more objects of the scene from the plurality of media sources. In an example embodiment, the spatial information may further include distance information associated with the relative distance between the media source and the at least one another media source. Further, the distance information may include information associated with a relative distance between the plurality of objects of the scene. In an example embodiment, the distance information may be determined from the depth information associated with the plurality of media contents. In an example embodiment, the distance information may be determined by one or more location detection sensors and/or distance detection sensors, such as radar sensors embodied in the media source. In another example embodiment, the one or more location detection sensors and/or distance detection sensors may be integrated within a device communicably coupled with the plurality of media sources and configured to retrieve the location and/or distances of the device with respect to the scene objects. Additionally or alternatively, the one or more objects associated with the scene may monitor their respective locations, and may send respective location information to the apparatus 200. In an example embodiment, the one or more objects may include one or more position sensors for monitoring their respective locations. In an example embodiment, a processing means may be configured to facilitate receipt of the spatial information associated with the plurality of media sources at the apparatus 200. An example of the processing means may include the processor 202, which may be an example of the controller 108.
[0042] In an example embodiment, the media source data may include media capture settings, and the electrical and optical specifications associated with the media source during capture of a media content. In an example embodiment, the media capture settings may include settings such as shooting mode, scene mode, exposure value, focusing distance, focal length, resolution, image/video quality, and various other such media source settings associated with a media content being captured by the media source. In an example embodiment, the electrical and optical specifications associated with the media source may include refractive indexes of the lenses, image sensor specifications, and so on. In an example embodiment, the media source data of the plurality of media sources may be received at the apparatus 200 via a data network. In an example embodiment, a processing means may be configured to facilitate receipt of the media source data associated with the plurality of media sources at the apparatus 200. An example of the processing means may include the processor 202, which may be an example of the controller 108.
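Taken together, paragraphs [0040] to [0042] describe two metadata records per media source. A compact sketch of how they might be carried alongside the content follows; every field name here is a hypothetical illustration, not terminology from the application:

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class SpatialInfo:
    """Spatial information of one media source (cf. paragraphs [0040]-[0041])."""
    location: Tuple[float, float, float]             # 3D coordinates: lat, lon, altitude
    orientation_deg: Tuple[float, float, float]      # angular displacement vs. a reference
    shooting_direction: Tuple[float, float, float]   # normal to the lens plane
    lens_offset_mm: Tuple[float, float, float]       # lens position within the chassis
    distances_m: Dict[str, float]                    # e.g. {"source_2": 12.4, "object_308": 7.1}

@dataclass
class MediaSourceData:
    """Capture settings and optical specifications (cf. paragraph [0042])."""
    shooting_mode: str
    exposure_value: float
    focusing_distance_m: float
    focal_length_mm: float
    resolution: Tuple[int, int]      # (width, height) in pixels
    sensor_width_mm: float           # imaging sensor specification
```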
[0043] In an example embodiment, the apparatus 200 may be caused to retrieve and display the plurality of media contents received from the plurality of media sources. In an example embodiment, the apparatus 200 may be caused to display the plurality of media contents on a user interface, for example the UI 206 of the apparatus 200. In an example embodiment, the apparatus 200 may further facilitate selection of at least one target portion, for example, the image of a guitarist of a band performing on stage, in one media content of the plurality of media contents. For instance, the apparatus 200 may facilitate in displaying the media contents (footages of the event). An example illustrating and describing the retrieval and display of the plurality of media contents is explained further with reference to FIGURE 3.
[0044] In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to facilitate selection of at least one target portion associated with the scene in a media content of the plurality of media contents. In an example embodiment, the selection of the at least one target portion in one of the media contents may be performed manually or, at least in part or under certain circumstances, automatically. In an example embodiment, a user may utilize a user interface, for example, the UI 206, for selecting the at least one target portion in the media content. An example of displaying the selection of the at least one target portion in the media content is illustrated and described in further detail with reference to FIGURE 3.
[0045] In an example embodiment, the selection of the at least one target portion in the media content may include enclosing the target portion within a boundary in the media content. For example, a boundary may be drawn by means of a pointing device around the target portion in the media content. In an example embodiment, the selection of the at least one target portion may also be performed by other means, for example, by pointing a pointing device towards a target portion, by tapping on the target portion, by swiping on the target portion, and various other such gestures. It will be noted that the selection of the at least one target portion in the media content may be performed by any of the above stated means and/or by other means not stated herein, without limiting the scope of various embodiments.
[0046] In an example embodiment, the selection of the at least one target portion in the media content may include cropping of the at least one target portion. For instance, on facilitating display of the plurality of media contents at the apparatus 200, the apparatus 200 may prompt the user to select the target portion in one of the media contents being displayed. The user may select the target portion, for example by enclosing the target portion in one of the media contents within a boundary, for example a rectangular boundary. In an example embodiment, in response to the selection of the target portion within the boundary, a sub-frame comprising the target portion may be cropped from a frame in which the target portion is displayed. In an example embodiment, on cropping the target portion, said media content may display only the selected target portion, and may remove the remaining portions of the media content. It will be noted herein that the selection of the at least one target portion in the media content may include selection of visual content, audio content, image depth information and time stamp information associated with the at least one target portion.
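A minimal sketch of the sub-frame cropping step, assuming the frame is a NumPy `H x W x C` array and the user-drawn boundary is an axis-aligned `(x, y, w, h)` rectangle (both assumptions, chosen purely for illustration):

```python
import numpy as np

def crop_target(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Return the sub-frame enclosed by the user-drawn rectangular boundary,
    clamped to the frame so a boundary drawn near an edge stays valid."""
    x, y, w, h = box
    height, width = frame.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return frame[y0:y1, x0:x1]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
sub = crop_target(frame, (400, 150, 320, 240))
print(sub.shape)  # (240, 320, 3)
```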
[0047] In an example embodiment, the apparatus 200 may be caused to determine, in at least one another media content of the plurality of media contents, the position and geometrical representation of a corresponding target portion. In an example embodiment, the geometrical representation of the corresponding target portion may include size, form or any other dimensional attribute of the corresponding target portion. In an example embodiment, the at least one another media content may be captured by at least one another media source of the plurality of media sources. In an example embodiment, the 'corresponding target portion' may refer to a target portion in the at least one another media content that corresponds to the target portion in the media content, so that, for example, a same object or physical location in the scene is present in both the target portion and the corresponding target portion. For instance, in an event of a live performance by a guitarist that is recorded by a plurality of media sources, the target portion may be the guitarist. The guitarist in one of the media contents may be selected by, for example, a user. Based on the selection of the guitarist in one of the recorded media contents (such as a video footage) of the event, the position of the guitarist in at least one another video footage of the event may be determined. In an example embodiment, a processing means may be configured to determine the position of a corresponding target portion in at least one another media content of the plurality of media contents. An example of the processing means may include the processor 202, which may be an example of the controller 108.
[0048] In an example embodiment, the position of the corresponding target portion may be determined based at least on the spatial information and media source data of the media source and the at least one another media source. In an example embodiment, the apparatus 200 may be caused to determine the position of the corresponding target portion by determining an angle of view of the media source with respect to the at least one target portion. Herein, the angle of view may be referred to as the angular extent of a given scene that is imaged by a media source, with the apex of the angle in the optical center of the lens system. The angle of view may consist of the vertical and horizontal angles that bound the given scene. In an example embodiment, the apparatus 200 may be caused to determine the angle of view around the target portion of the media source based at least on the spatial information and media source data associated with the media source and the selection of the target portion (for example, a user defined boundary around the target portion on the media content). In an example embodiment, the apparatus 200 may further be caused to determine a corresponding angle of view of the at least one another media source with respect to the at least one target portion based on the angle of view of the media source with respect to the at least one target portion. In an example embodiment, the position of the corresponding target portion in the at least one another media content may be determined based on the corresponding angle of view.
[0049] In an example embodiment, a user of the apparatus 200 may define a reference space point of the scene on the screen of the apparatus 200 by, for example, touching that point on the screen. In response, the apparatus 200 may determine the xyz coordinates of the reference space point. In addition, the user may also define the horizontal angle of view of the media source so as to cover the target portion. An example of defining the horizontal angle of view on the media source is explained in detail with reference to FIGURE 4. In an example embodiment, the corresponding angles of view may be defined for the other plurality of media sources. In an example embodiment, based on the corresponding angles of view, the apparatus 200 may be caused to determine the position of the reference space point on another media source screen. In an example embodiment, the apparatus 200 may further be caused to determine the horizontal angle of view of the media source, and then a corresponding horizontal angle of view is determined for the at least one another media source. In a similar manner, the apparatus 200 may be caused to define the vertical angle of view of the media source, and then the corresponding vertical angle of view is determined for the at least one another media source. The horizontal and vertical angles of view on the at least one another media source may define the corresponding target portion.
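The angle-of-view reasoning in paragraphs [0048] and [0049] can be sketched with a simplified pinhole model. The sketch below handles the horizontal component only; function and parameter names are assumptions, and a real device would additionally need lens distortion and full 3D orientation handled properly. It computes a source's horizontal angle of view from its optics and maps a known reference space point into another source's frame:

```python
import numpy as np

def horizontal_angle_of_view(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Angular extent imaged horizontally, with its apex at the optical
    centre of the lens system (radians)."""
    return 2.0 * np.arctan(sensor_width_mm / (2.0 * focal_length_mm))

def project_to_column(point_world, cam_position, cam_rotation,
                      focal_length_mm, sensor_width_mm, width_px):
    """Pixel column at which a 3D reference space point appears in a camera.
    cam_rotation is a 3x3 world-to-camera rotation matrix derived from the
    source's orientation / shooting-direction information."""
    p = cam_rotation @ (np.asarray(point_world, float) - np.asarray(cam_position, float))
    if p[2] <= 0:
        raise ValueError("reference point is behind this camera")
    x_mm = focal_length_mm * p[0] / p[2]              # pinhole projection
    return (x_mm / sensor_width_mm + 0.5) * width_px  # mm on sensor -> pixel column

# Two co-oriented cameras 2 m apart, both looking along +z at a point 10 m away.
point = [0.0, 0.0, 10.0]
identity = np.eye(3)
col_a = project_to_column(point, [0, 0, 0], identity, 28.0, 36.0, 1920)
col_b = project_to_column(point, [2, 0, 0], identity, 28.0, 36.0, 1920)
print(round(col_a), round(col_b))  # the same space point lands on different columns
```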
[0050] In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, cause the apparatus 200 to crop the corresponding target portion from the at least one another media content based on the determination of the position and geometrical representation of the corresponding target portion. In an example embodiment, on determining the position and geometrical representation of the corresponding target portion in the at least one another media content, the apparatus 200 may crop the corresponding target portion in the at least one another media content. In an example embodiment, the cropping of the corresponding target portion from the at least one another media content may include cropping a sub-frame having the corresponding target portion from the at least one another media content. In an example embodiment, the apparatus 200 may further be caused to perform image stabilization when the sub-frames are cropped from original frames of the at least one another media content. Various cropped-image-based stabilization methods may be utilized for performing image stabilization of the cropped target portions, for example digital image stabilization, which uses pixels outside the target portion sub-frame to provide a buffer for the motion. In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, cause the apparatus 200 to display the corresponding target portion being cropped from the at least one another media content. In an example embodiment, the cropped at least one media content may be displayed on the UI, for example, the UI 206 of the apparatus 200. In an example embodiment, the cropped at least one media content may be stored in the memory 204. In an example embodiment, the at least one media content from the plurality of media contents may be edited to generate an edited media content, for example a video content, such that the edited media content may include only the cropped at least one media content from the plurality of media contents, and preclude other portions of the media content.
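The cropping-with-stabilization idea can be sketched as follows; the helper crop_subframe, the 16 pixel margin and the frame sizes are assumptions for illustration only. The margin keeps pixels outside the nominal sub-frame available, which is the buffer a digital stabilization step would shift within from frame to frame.

import numpy as np

def crop_subframe(frame, center_xy, size_wh, margin=16):
    # Crop the sub-frame having the target portion, plus a margin of
    # surrounding pixels usable as a buffer by a stabilization step.
    h, w = frame.shape[:2]
    cw, ch = size_wh
    x0 = max(int(center_xy[0]) - cw // 2 - margin, 0)
    y0 = max(int(center_xy[1]) - ch // 2 - margin, 0)
    x1 = min(x0 + cw + 2 * margin, w)
    y1 = min(y0 + ch + 2 * margin, h)
    return frame[y0:y1, x0:x1]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # one decoded video frame
sub = crop_subframe(frame, center_xy=(640, 360), size_wh=(320, 180))
print(sub.shape)  # (212, 352, 3): 180 + 2*16 rows by 320 + 2*16 columns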
[0051] In an example embodiment, the at least one target portion and the corresponding target portion may facilitate in processing of the media content associated with the scene. The processing of the plurality of media contents being captured by the plurality of media sources may be performed in various scenarios. In an example scenario, such as during a live performance by a music band, the live performance may be captured by the plurality of media sources (for example, multiple cameras) located at different positions at the event location. In an example scenario, the director of a live TV show may use the media sources online to select media sources and target portions for the line out video feed that may either be live-broadcasted or further processed in post-production. In another example scenario, all the media content (for example, non-edited raw media content), without any target selection or cropping, from the media sources may be stored. In an example embodiment, the stored media content may later be edited, by selecting target portions and cropping frames, in a post-production phase. The media content from multiple media sources may be processed so as to generate a media content associated with the event that may be of high quality.
[0052] In still another example embodiment, the media content captured by multiple media capturing devices may include audio content. The audio content captured by the multiple media capturing devices may be utilized for enhancing an audio content associated with the media content. For example, the multiple media capturing devices may perform a video recording of an event. The recording may include video content as well as audio content associated with the event. In an example embodiment, the audio content from the multiple media capturing devices may be retrieved, and processed for enhancing the audio content in the video sequence. In one embodiment, the media capturing devices may capture directional audio content of the scene. Herein, the term 'directional audio content' may refer to audio sounds from various sources on the scene, the locations of which may be identified.
[0053] For example, the video content recorded by the plurality of media sources during the event may include an audio content (or audio track) which may not be clearly audible in the recording received from one or more of the media contents of the plurality of media contents. However, during the editing of the recorded video, the quality of the audio track may be enhanced by, for example, retrieving directional audio content from the plurality of media contents. For instance, in a live concert, a performer may play guitar and there may be a lot of disturbing background/ambient noise from the audience, and the performance may be recorded by the plurality of media sources. However, the recording of the event may not include good quality audio content of the guitar playing. In such a scenario, the apparatus 200 may be caused to receive audio content, such as directional audio content, from one or more media sources of the plurality of media sources. In an example embodiment, the directional audio content received from the one or more media sources may be processed, for example, during an editing stage, so as to improve the audio quality in the edited media content. For example, the directional audio content pertaining to the music played on the guitar may be received from the video recordings of a live performance by the guitarist, and may be utilized for improving the audio quality of the processed media content. The directional audio content received from the plurality of media sources may include a part of the overall audio content received from the plurality of media sources. In particular, the directional audio content may be associated with a particular object, for instance an instrument such as a guitar, or speech audio, and so on. In an example embodiment, the directional audio content may be retrieved by selecting the audio content being originated/generated by a particular object, and cropping the selected audio content from the plurality of audio contents. In an example embodiment, along with the video content, the plurality of media sources may record audio content associated with the scene. In an example embodiment, the plurality of media sources may record the audio content being received at the media source from all directions, and a corresponding angle of reception of the media source with respect to the objects generating the audio content. For example, in a live event where a guitarist is singing along with playing the guitar, a media source of the plurality of media sources may receive sounds of the guitarist singing, the guitar sound, audience sound, any other instrument such as a drum being played, and so on. Here, the media source may receive/record the directional sound based on the angle of reception of the media source with respect to the sound source. In an example embodiment, the media sources may include directional microphones and/or other microphones for capturing the audio content being received from all directions even in the video capturing phase. In an example embodiment, during editing, once the corresponding target portion or angle of view is determined in the other media sources of the plurality of media sources, the apparatus 200 may retrieve only the directional audio content in the direction and angle of view of the corresponding target portion.
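A rough sketch of that last step, keeping only the directional audio that falls inside the angle of view of the corresponding target portion, might look like the following; it assumes the media source has already separated its recording into per-direction tracks (bearing plus samples), which simplifies what directional microphones would actually provide, and the name select_directional_audio is hypothetical.

import numpy as np

def select_directional_audio(tracks, target_bearing, angle_of_view):
    # tracks: list of (bearing_rad, samples) pairs, one per identified sound
    # source; samples are assumed time-aligned and of equal length.
    half = angle_of_view / 2.0
    kept = []
    for bearing, samples in tracks:
        # Smallest signed angular difference, wrapped to [-pi, pi].
        diff = (bearing - target_bearing + np.pi) % (2.0 * np.pi) - np.pi
        if abs(diff) <= half:  # inside the target's angle of view
            kept.append(np.asarray(samples, float))
    # Mix the retained directional components into one enhanced track.
    return np.mean(kept, axis=0) if kept else None

guitar = (0.00, np.random.randn(48000))   # roughly towards the guitarist
crowd = (1.60, np.random.randn(48000))    # well outside the angle of view
enhanced = select_directional_audio([guitar, crowd], 0.0, np.radians(30.0))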
[0054] In another example scenario, the plurality of media sources may include a plurality of surveillance cameras that may be installed for capturing live videos of a location, for example a shopping complex, and the video content may be presented in real-time on surveillance monitor screens. In another example embodiment, the media content being recorded on the plurality of surveillance cameras may be retrieved and processed for further analysis. In an example embodiment, the media content captured at each of the surveillance cameras may be retrieved, and analyzed further to combine analysis results of various objects of the scene into a single video sequence. In another example scenario, a user may select a target in the live video in real time, or in the video recording being retrieved from one of the plurality of surveillance cameras, and may track the same target on at least one another surveillance camera of the plurality of surveillance cameras. In yet another example scenario, the plurality of media sources may be utilized for directing live multi-camera shooting of a video or TV show. In an example scenario, a director directing the film may define targets to be filmed in a scene, and may deploy a plurality of media sources (for example, remote controlled cameras, or cameras with cameramen) for capturing the defined targets. In another example scenario, a director may select media sources and define targets for the line out video feed to be broadcasted in real-time. In another scenario, the plurality of media sources may be utilized for editing the recorded footage in the video post-production phase. In an example scenario, the director or editor may receive a plurality of recorded media contents of the scene from the individual cameras at an electronic device (such as a high resolution, wide angle of view camera), may define a target or target portion on one media source, see the corresponding targets or target portions on the other media sources, and select the most suitable target or target portion views from the captured material to generate an edited media content for the film. It will be understood that the example scenarios disclosed herein are for illustrative purposes, and should not be construed as limiting to the disclosed embodiments.
[0055] In an example embodiment, the apparatus 200 may further facilitate in processing of media content such as images and videos. In an example embodiment, in case of videos, the at least one target portion may be mobile (or in motion) and the distance from the apparatus 200 may change with time. In this example embodiment, the apparatus 200 may be caused to facilitate receipt of depth information associated with the scene from the plurality of media sources. In an example embodiment, the plurality of media sources may include light field cameras for recording the depth information associated with the scene. In an example embodiment, a light field camera may capture a 2D image, called a light-field image. In an example embodiment, the light-field image may include a plurality of small images associated with the scene that may include depth information associated with the scene. In this embodiment, the plurality of media sources that are configured to capture the light-field image may include light-field cameras. The light-field cameras may include an array of micro lenses that enables the light-field camera to record not only image intensity, but also the distribution of intensity in different directions at each point. In the present example embodiment, on selection of the at least one target portion on the media content, the apparatus 200 may be caused to display the at least one target portion as sharp and in focus. In an example embodiment, the other portions of the media content (i.e. the portions excluding the at least one target portion) may be displayed as blurred and out of focus. In some example embodiments, the corresponding target portions may also be displayed as sharp and in focus in the at least one another media content, while the other portions of the at least one another media content (i.e. the portions excluding the corresponding target portion) may be displayed as blurred and out of focus.
[0056] In an example embodiment, the apparatus 200 may be caused to define a 3D (three dimensional) trajectory of the at least one target portion based on the depth information being received from a media source of the plurality of media sources. In an example embodiment, a target portion may be selected in a media source from among the plurality of media sources, and based on the target portion a space point may be selected. In an example embodiment, the target portion, such as an object of the scene, may be in motion. In an embodiment, the user may follow the moving object, for example a bird flying towards the apparatus 200, on the play-back video by touching the screen with a finger on the bird. Alternatively, pattern recognition software may follow the object to define the 3D trajectory, and the apparatus 200 may determine the corresponding pixels on the screen and the depth data of the corresponding pixels, and based on the pixels and the depth data may determine the xyz coordinates. Herein, the xyz coordinates may change with time due to the object being in motion, for example, the flying bird. In an example embodiment, the corresponding target portion and the corresponding space point may be a function of time, and may change with time. In an example embodiment, the apparatus 200 may determine the corresponding pixels based on the distance and direction from the apparatus 200 and the location of the apparatus 200. In an example embodiment, the 3D trajectory may define the target object/portion or the part of the scene (for example, the bird) that is in focus as a function of time. In this embodiment, the xyz coordinates of the moving object obtained from one media source may define the trajectory being followed, and thereby facilitate the apparatus 200 in determining the corresponding trajectory of the target portion. In an example embodiment, using light-field cameras, the moving object/target portion along the trajectory may be set to be sharp and in focus in the media content. Further, the apparatus 200 may determine the corresponding target portion along the corresponding trajectory in another media content. The corresponding target portion may also appear to be in focus and sharp, while the other parts of the image (i.e. the parts of the another media content excluding the corresponding target portion) may remain out of focus and blurred.
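One way to picture how the corresponding pixels and their depth data yield xyz coordinates over time is the standard pinhole back-projection below; the intrinsics and the tracked samples are made-up values, and mapping the resulting camera-space points to world coordinates would additionally use the location and orientation of the apparatus, as noted above.

import numpy as np

def backproject(px, py, depth, fx, fy, cx, cy):
    # Pinhole back-projection: a screen pixel plus its depth value gives
    # a point in camera-space xyz coordinates.
    x = (px - cx) * depth / fx
    y = (py - cy) * depth / fy
    return np.array([x, y, depth])

def trajectory_3d(tracked, intrinsics):
    # tracked: per-frame ((px, py), depth) samples of the followed object,
    # e.g. the flying bird; one xyz point per frame defines the trajectory.
    return np.stack([backproject(px, py, d, *intrinsics)
                     for (px, py), d in tracked])

intr = (1000.0, 1000.0, 640.0, 360.0)    # fx, fy, cx, cy (assumed)
samples = [((700, 300), 12.0), ((690, 310), 10.5), ((675, 325), 9.0)]
print(trajectory_3d(samples, intr))      # the bird approaching the camera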
[0057] Some example embodiments of processing of multi-camera images are further described with reference to FIGURES 3 to 7. It shall be noted that FIGURES 3 to 7 represent one or more example embodiments only, and should not be considered limiting to the scope of the various example embodiments.
[0058] FIGURE 3 illustrates an example representation 300 of processing of multi-camera media content in accordance with an example embodiment. In an example scenario, the representation pertains to video recording of a scene by a plurality of media sources 302, 304, and 306. In an example scenario, the scene includes a person 308 walking across a room. In an example embodiment, the person 308 is filmed by three media sources, such as the media sources 302, 304, and 306. Examples of the media sources may include media capturing devices. Examples of media capturing devices may include handheld camera devices, light field cameras, mobile phones, PDAs, laptops, and any other device that may be capable of capturing a recording of the scene.
[0059] In an example embodiment, the plurality of media sources may be configured to record the media content from a plurality of distinct locations. For example, as illustrated in FIGURE 3, the plurality of media sources 302, 304, and 306 are located at distinct locations around the person 308 so as to record media content, for example media contents 310, 312 and 314, respectively. In an example embodiment, the plurality of media sources 302, 304, and 306 may be configured to record video content associated with the scene. Additionally or alternatively, the plurality of media sources 302, 304, and 306 may be configured to record directional audio content associated with the scene. Additionally or alternatively, the plurality of media sources 302, 304, and 306 may be configured to record light field information and depth information associated with the media content. In an example embodiment, the depth information may facilitate in focusing the recorded media content to desired distances from the media source. In an example embodiment, the plurality of media sources 302, 304, and 306 may include light field cameras, for recording the depth information associated with the scene.
[0060] In an example embodiment, the plurality of media sources 302, 304, and 306 may be operated by a plurality of videographers. For example, there may be a distinct videographer that may be assigned to operate a media source of the plurality of media sources 302, 304, and 306. In another example embodiment, instead of having multiple videographers that may hold the media sources 302, 304, and 306 for recording the respective media content, the plurality of media sources 302, 304, and 306 may be un-manned (not held by any videographer). Instead, the plurality of media sources 302, 304, and 306 may be positioned at a plurality of distinct locations, and may be controlled by programmed media source settings thereof. In another example embodiment, the plurality of media sources 302, 304, and 306 (unmanned media sources) may be controlled by a director/operator at a centralized location. The plurality of media sources 302, 304, and 306 may be adjusted to respective settings for capturing the recording of the event. In yet another example embodiment, the plurality of media contents may be recorded by a plurality of users (for example, in a crowd sourced event), where each of the plurality of users may possess a respective media source.
[0061] In some example embodiments, the plurality of videographers may be instructed on techniques to shoot the scene. Alternatively, the plurality of media sources 302, 304, and 306 may be adjusted so as to point in an approximate direction of the event, and shoot with a wide angle so as to capture as much as possible of the filmed target or action and the surroundings of the scene. In an example embodiment, one or more of the plurality of media sources 302, 304, and 306 may be adjusted so as to change any setting thereof. For example, one of the media sources may be adjusted to change a zoom setting thereof. In another example scenario, one of the media sources may be adjusted to change an orientation setting thereof.
[0062] In an example embodiment, the media sources may also record source information. Herein, the source information associated with a media source may include spatial information and media source data. In an example embodiment, the spatial information may include location information, media source dimensions information, orientation information and shooting direction information associated with the media source. Herein, the location information may refer to information associated with horizontal location and vertical altitude of the media sources of the plurality of media sources 302, 304, and 306. The orientation information may include the 3D orientation of the media sources 302, 304, and 306 as well as the shooting direction of the media sources 302, 304, and 306. The source information may also include settings information associated with the media sources 302, 304, and 306, so that the entire multi-camera shooting set-up 3D geometry may be reconstructed afterwards, for example during editing/processing of the media content. In an example embodiment, on changing one or more settings of the media source, the changed setting may also be transferred/sent to an apparatus, for example the apparatus 320, for processing. In an example embodiment, the apparatus 320 may be an example of the apparatus 200 (FIGURE 2).
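A plausible shape for such a source information record, sketched here only as an assumption about how it might be stored or transmitted, could be the following; the SourceInformation type and its field names are illustrative, not a format defined by the embodiment.

from dataclasses import dataclass, field

@dataclass
class SourceInformation:
    # Spatial information of the media source
    location: tuple            # horizontal location and vertical altitude
    orientation: tuple         # 3D orientation, e.g. (yaw, pitch, roll) in degrees
    shooting_direction: float  # bearing of the optical axis, in degrees
    # Media source data
    max_angle_of_view: float   # in degrees
    settings: dict = field(default_factory=dict)  # zoom, focus, exposure, ...

src_302 = SourceInformation(location=(60.17, 24.94, 12.0),
                            orientation=(90.0, 0.0, 0.0),
                            shooting_direction=90.0,
                            max_angle_of_view=70.0,
                            settings={"zoom": 1.0})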
[0063] In an example scenario of a crowd sourced event, where a plurality of users/videographers in a crowd may shoot/record media content from a respective media source, the respective media source may be equipped with a capability to record the media content as well as the spatial information and media source data thereof. Additionally, the plurality of media sources 302, 304, and 306 may be equipped with the capability to send/transmit the media content as well as the spatial information and media source data thereof to, for instance, an apparatus such as the apparatus 200. For example, the plurality of media sources 302, 304, 306 may send the media contents 310, 312 and 314, respectively, as well as the spatial information and media source data thereof, through, for example, a data network.
The plurality of media contents being recorded by the plurality of media sources 302, 304, and 306 may be uploaded/sent to an apparatus, for example the apparatus 320. Additionally, the spatial information and media source data may be uploaded/sent to the apparatus 320.
[0065] As discussed with reference to FIGURE 2, the apparatus 200 may be configured to process the plurality of media contents to generate a processed media content associated with the scene. In an example embodiment, processing of the plurality of media contents being received from the plurality of media sources 302, 304, and 306 may include editing of the plurality of media contents so as to generate a high quality processed media content. For instance, in an event, a video may be recorded by two or more media sources, and a directional audio of the event may be recorded by another one or more media sources. The video and the directional audio recorded by the media sources may be received at the apparatus 320. In an example embodiment, the apparatus 320 may process the video content received from the two or more media sources to enhance the video content associated with the event, and may process the directional audio content from the other one or more media sources to enhance the audio content associated with the event.
[0066] In an example embodiment, the plurality of media contents being received from the plurality of media sources may be displayed at a user interface of the apparatus 320. In an example embodiment, the user interface may include a display screen such as a display screen 322 that may include display windows, each corresponding to a media source of the plurality of media sources 302, 304 and 306. For example, the display screen 322 may display windows 324, 326 and 328, corresponding to the plurality of media sources 302, 304 and 306, respectively. In an example embodiment, the display windows 324, 326 and 328 further include corresponding sub-screens 330, 332, and 334 for displaying levels of audio content and/or audio information being received from the plurality of media sources 302, 304, and 306. In an example embodiment, the display screen 322 may include a plurality of sub-screens for displaying light field data and depth maps associated with the plurality of media contents.
[0067] In an example embodiment, the apparatus 320 may further receive the source information such as the spatial information and media source data associated with the plurality of media sources 302, 304, and 306. For example, the apparatus 320 may receive the coordinates of the location and altitude associated with each of the media sources 302, 304, and 306. In an example embodiment, the plurality of media sources 302, 304, and 306 may be GPS-enabled or indoor-positioning-enabled media sources, and may send the location co-ordinates of the respective media source to the apparatus 320. In addition, the plurality of media sources 302, 304, and 306 may send the media source settings associated with the respective media source.
[0068] In an example embodiment, the display screen 322 may enable a selection of at least one target portion in at least one of the media contents being displayed at the display screen 322. For instance, a user may select a target portion in the display window 328. For example, the user may select the person displayed in the window 328. In an example embodiment, the user may select the target portion by means of the user interface. In an example embodiment, the user may provide a user input on the user interface to select the target portion. In an example embodiment, on one window, for example the window 328, the user may select the target portion by selecting a key point/position or a key area (for example, a rectangular sub-frame). For instance, the user may select the target portion by drawing a rectangle, such as a rectangle 336, around the target portion. On touching the screen, a dashed-line boundary, for example a rectangular shaped boundary, may appear around the target portion. In an example embodiment, the user may adjust the selection boundaries to a desired size and position. In an example embodiment, the media content being accessed by the user for selecting the target portion may appear in a window, for example, a window 338. In an example embodiment, on selecting the boundary of the target portion by the user, the apparatus 320 may determine a reference point related to the target portion. For example, the apparatus 320 may select a point at the center of the boundary as the reference point. In an example scenario, a reference space point may be selected on the media content. In an example embodiment, the reference space point may be selected at least in parts or under certain circumstances by the apparatus 320. An example of the reference point determination for a target portion is explained in detail with reference to FIGURE 4.
[0069] In another example scenario, the key point may be selected by, for example, the user based on a user-input. For example, the user may select a point on a user interface of the apparatus 320. In an example embodiment, upon user selection, the apparatus 320 may be configured to determine the location of the point. For example, the apparatus 320 may be configured to determine the 3D geographical coordinates (such as xyz coordinates) of the point. In yet another example embodiment, the user may select an object as a target portion. For example, the user may select a person as the target portion. In an example embodiment, the object may be determined by the apparatus 320, based on, for example, an object detection technique. In an example embodiment, the apparatus 320 may identify one or more objects, and the UI may display the identified objects. In an example embodiment, the identified objects may be associated with a reference point, for example a reference point xyz. An example of the reference point determination for a target portion is explained in further detail with reference to FIGURE 4.
[0070] In an example embodiment, when the user selects the target portion on one of the media contents, for example the media content displayed in the window 328, the corresponding target portions may also be determined. In an example embodiment, the corresponding target portions may be determined at least in part or under certain circumstances automatically by the apparatus 320. In an example embodiment, the corresponding target portions may be determined automatically and displayed in the other media contents, such as the media contents displayed in the windows 324, 326.
[0071] In an example embodiment, on selection of the target portions in the plurality of media contents, the target portions may be cropped and only the cropped target portion may be shown in windows such as windows 340, 342 and 344, and the remaining portion of the media content may be removed from the display screen 322. In another example embodiment, the cropped target portion may be shown in a separate window on the display screen 322. In an example embodiment, the cropping of the corresponding target portions in the windows 324, 326 may be performed based on the spatial information and media source data associated with the corresponding media sources that have been recorded during the multi-camera shooting.
[0072] In an example embodiment, the apparatus 320 may facilitate in selection of a plurality of target portions in the media content. For example, once the user selects a target portion by enclosing the target portion within a boundary, the apparatus 320 may provide an option of selecting one or more further target portions in the media content. In an example scenario, the apparatus 320 may facilitate selection of an audio content associated with the target portion of the media content. In an example embodiment, the audio content may be selected in the media content by touching a sound source area or areas on one screen; the apparatus 320 may then determine the corresponding sound source locations on the other screens and filter in only the sound coming from those sources to be used.
[0073] In an example embodiment, for selecting a mobile target portion such as a person walking in a room, the person may be selected in multiple frames showing movement of the person, and in response to said selection, the corresponding windows/sub-frames in the rest of the media sources may be selected by the apparatus 320. In an example embodiment, the selected sub-frames in the rest of the media sources may be cropped. In an example embodiment, the cropped target portions from the plurality of media contents may be displayed on the display screen 322 of the apparatus 320. After the cropping of the target media content, the display screen 322 may display the cropped target portion from the different media contents, with the unwanted visual content cropped away. In an example embodiment, an editor, for example a user or a processor associated with the apparatus 320 (or the apparatus 200), may utilize the sub-frames having the cropped media content, and ensure better visual/audio content and continuity from one clip to another in the edited media content.
[0074] In an example embodiment, the processing of the plurality of media contents, as described above, may be performed once the event/recording of the scene is over. For example, the plurality of media sources may record the footages of the complete event, and after the event is over, the footages (the plurality of media contents) may be sent to the apparatus 320. The apparatus 320 may generate an edited media content from the plurality of media contents. In an example embodiment, the edited media content may be generated based on manual intervention by a user. In an example embodiment, the edited media content may be generated at least in parts or under certain circumstances manually. In another example embodiment, the plurality of media contents may be simultaneously recorded and streamed to the apparatus 320 during the event, and may be edited to generate an edited media content. For example, during an event, the multiple cameras may record the event from different directions and locations, and may send the content to the apparatus 320 in real-time. The apparatus 320 may generate an edited media content from the plurality of media contents in real-time using predefined editing algorithms. For example, in case of real-time streaming of the media content, the apparatus 320 may generate an edited media content that may keep in focus the target portion, such as a performer in the event, while the other parts of the image may remain more or less blurred. In an example scenario, the multi-camera system may be incorporated in a surveillance system installed at a location such as a shopping mall, where the plurality of media sources (surveillance cameras) may perform real-time recording in the mall. The footages of the plurality of the surveillance cameras may be transferred/sent to the apparatus 320. The apparatus 320 may process the plurality of footages of the mall being received from the plurality of surveillance cameras to generate an edited media content. For instance, the plurality of footages being received from the plurality of surveillance cameras may be processed/edited by the apparatus 320 for generating an edited/processed media content that may keep in focus one target portion in the media content. In an example scenario, the target portion may be a suspected person, a performer performing in an event in the mall, a product being displayed for sale in the mall, and so on.
[0075] FIGURE 4 is an example representation of a multi-camera system 400 in accordance with an example embodiment. As illustrated herein, the multi-camera system 400 is shown to include a plurality of media sources, such as media sources 402, 404, 406, for individually capturing a plurality of media contents associated with a scene. For example, the media sources 402, 404, 406 may capture images of a target in the scene, for example a target 408. In an example embodiment, the multi-camera system 400 may facilitate in capturing a plurality of media contents associated with the target, for example from the plurality of media sources 402, 404, 406, and in processing the captured media content. In an example embodiment, during processing, the position and the geometrical representation of the chosen target portion of the target 408 may be determined in the plurality of media sources 402, 404, 406 for effectively processing the media content. In an example embodiment, for determining the position and the geometrical representation of the chosen target portion of the target 408 in the plurality of media sources 402, 404, 406, the angles of view of the media sources 402, 404, 406 with respect to the target 408 may be determined.
[0076] In an example embodiment, the media sources 402, 404, 406 may be associated with a maximum angle of view. In an example embodiment, a maximum horizontal angle of view of a media source may refer to the angle with an apex in the optical center of the camera lens system, the sides of which define a maximum horizontal angular extent of the scene that may be captured by the media source. For example, the maximum horizontal angles of view of the media sources 402, 404, 406 may include angles 410, 412, 414, respectively, as illustrated in FIGURE 4. In an example embodiment, the horizontal angle of view of the media source with respect to the target portion, for example target portion/point 422 of the target 408, may be defined to extend to a distance (for example, a distance r) to a right side 424 and a left side 428 of the target point 422 as seen from the media source 402. The lines 424 and 428 are perpendicular to the straight line 426 from the target point 422 to the optical center of the lens system of the media source 402. The horizontal angle of view of the media source 402 with respect to the target portion/point 422 of the target 408 is represented as angle 416.
[0077] In an example embodiment, the angle of view of the media source 402 with respect to the target portion or a point 422 (reference space point) located on the target 408 may be determined based at least on the spatial information, the media source data associated with the media source 402, and the depth information associated with the media content captured by the media source 402. In an example embodiment, first a reference space point in the media content may be selected. For example, a reference space point 422 may be selected on the target 408. In an example embodiment, the reference space point 422 may be selected on a viewfinder screen of the media source 402 (which may be in a shooting phase). In another example scenario, the reference space point 422 may be selected by the user on a display screen of an editing device (for example, in a post-capture editing phase). In an example embodiment, based on the selection, for example upon the user touching the reference space point 422 on the screen, the horizontal x, y co-ordinates of the reference space point 422 may be indicated. In an example embodiment, the coordinates of the reference space point 422 may be determined by the device based on the spatial information and the media source data of the media source 402, and the direction and the distance of the point x, y from the media source 402. In an example embodiment, the direction and the distance of the point x, y from the media source 402 may be determined based on the media source data or from the media source's radar data. Additionally, the horizontal x, y co-ordinates of the reference space point 422 may be determined based on the depth information associated with the target 408. In an example embodiment, the depth information associated with the target 408 may be determined based on the light field camera depth information, a depth map associated with the scene, or any such means that may be capable of providing the depth information associated with the scene.
[0078] In an example embodiment, upon determination of the reference space point 422, a horizontal distance, for example a distance 'r', may be defined, for example, by dragging the line 424 through the reference space point 422. In an example embodiment, the line 424 may be proposed by the UI. In an example embodiment, the line 424 may be perpendicular to the straight line 426 passing through the reference space point 422 and the center of the lens of the media source 402. In an example embodiment, the UI may draw the corresponding line 428 to the left of the reference space point 422. In an example embodiment, the length of the line 424 may be defined by the user, so that the horizontal width of the target object is approximately within the distance 2r. Herein, the length of the lines 424, 428 (i.e. approximately 2r in total) and additionally the distance 426 (i.e. the distance of the reference space point 422 from the optical center of the lens system of the media source 402) may define the horizontal angle of view 416 and the direction of the reference space point 422 from the media source 402.
[0079] In an example embodiment, a corresponding reference space point and a corresponding angle of view of the at least one another media source with respect to the at least one target portion may be determined. In an example embodiment, the corresponding reference space point and the corresponding angles of view 418, 420 for the media sources 404, 406, respectively, may be determined based on the angle of view 416 of the media source 402 with respect to the target portion 422 and the spatial information, media content and media source data associated with the at least one another media source. In an example embodiment, the horizontal angles of view of the media sources 404 and 406 may be determined to be within the distance of 'r' to the left side and to the right side of the reference space point 422 in the views of the media sources 404 and 406, the lines of length r corresponding to the media sources 404 and 406 being perpendicular to the straight lines from the reference space point 422 to the media sources 404 and 406. It will be noted that the angles of view of the media sources 402, 404, 406 with respect to the target 408 may be different due to, for example, the distance from the target portion 422, camera settings, and various similar factors.
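Numerically, these definitions reduce the horizontal angle of view to 2 * atan(r / d), where d is the length of the straight line from the reference space point to the optical center of the media source. The sketch below applies that formula to three sources in the spirit of FIGURE 4; the positions and the half-width r are invented for the example.

import numpy as np

def horizontal_angle_of_view(source_pos, ref_point, r):
    # d: distance from the source's optical center to the reference space
    # point (line 426 for media source 402 in FIGURE 4).
    d = np.linalg.norm(np.asarray(ref_point, float) - np.asarray(source_pos, float))
    # Lines of length r extend to each side of the point, perpendicular to
    # the source-point line, so the full angle is 2 * atan(r / d).
    return 2.0 * np.arctan2(r, d)

ref_422 = np.array([0.0, 8.0])   # reference space point, 2D horizontal plane
r = 1.2                          # half-width chosen by the user
for name, pos in [("402", (0.0, 0.0)), ("404", (-6.0, 2.0)), ("406", (6.0, 3.0))]:
    deg = np.degrees(horizontal_angle_of_view(pos, ref_422, r))
    print(f"source {name}: angle of view ~ {deg:.1f} deg")  # 17.1, 16.1, 17.5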
[0080] Herein, the representation of the multi-camera system 400 is shown to include a 2D horizontal plane, for the sake of brevity of description and ease of understanding. It will however be understood that the various embodiments explained herein may be applied similarly to include the vertical z-axis. In an example embodiment, the angles of view for both the horizontal direction and the vertical direction are determined, and the system 400 may take into account the aspect ratio limitation. For example, in case the output media content has an aspect ratio of 16:9, then in the example described above, the vertical distance defining the vertical angle of view may be (9/16) × 2r.
[0081] In an example embodiment, defining the 3D target portion may be done by adding a third co-ordinate, for example z-coordinate data, to the 2D data associated with the reference space point 422. For example, a vertical distance 'h' may be defined from the 2D point (x, y) in the direction of the z-axis in the positive and negative directions. In an example embodiment, the aspect ratio of the target portion 422 and the corresponding target portion may be the same, i.e. 2r/2h = r/h. In an example embodiment, the apparatus 200 may facilitate in adjusting the lengths r and h in the media source 402, so that the aspect ratio of the images is of some standard or often-used value, for example 16:9.
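For instance, under the 16:9 assumption above, the vertical half-extent h follows directly from the chosen horizontal half-extent r; the value of r below is invented for the example.

r = 1.2                  # horizontal half-extent chosen on media source 402
h = (9.0 / 16.0) * r     # vertical half-extent for a 16:9 target portion
assert abs((2 * r) / (2 * h) - 16.0 / 9.0) < 1e-12
print(h)                 # 0.675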
[0082] In one embodiment, the set-up of the media sources/cameras on the scene, with the spatial information, media source locations, shooting directions, maximum angles of view, chosen angles of view, etc. (as shown in FIGURE 4), can be shown as a view on a controlling device (for example, a display of the UI of the apparatus 200/320), to facilitate the shooting process and, later in the editing process, to show the reconstructed set-up of the shooting event and facilitate editing. The UI may further facilitate in defining the angles of view and target portions on this "map" view of the display screen, and the corresponding angles of view may then be formed.
[0083] FIGURE 5 is an example representation of capturing of media content by a multi-camera system, in accordance with an example embodiment. In an example embodiment, the multi-camera system 400 (FIGURE 4) may be utilized for capturing the media content. In an example embodiment, the multi-camera system may be utilized for directing a multi-camera shooting of a scene 500 with camera locations 502, 504 and 506. For example, a plurality of media sources associated with the multi-camera system 400 may be utilized for directing multi-camera shooting of a film.
[0084] In an example scenario, a director directing the film may utilize one of the media source devices at the locations 502, 504 or 506 as a master device. On the master device screen, the director may have a wide angle, high resolution view of the whole scene 500; on the master screen, the director may define the image portions that the individual cameras may capture. For example, as illustrated in FIGURE 5, the director may, using the master view of the scene 500, deploy a plurality of cameras at locations such as 502, 504, and 506. In an example embodiment, the director may define targets by defining a plurality of frames for the plurality of cameras at the locations 502, 504, and 506. In FIGURE 5, the director has defined three different targets on the scene 500. The cameras at the locations 502, 504 and 506 may capture those targets from the three camera locations. It will be noted that the actual camera views of the targets of the three cameras are not shown here, except for the one camera the screen of which is used as the master screen. In general, the multiple cameras may have different shooting directions and may show different contents of the targets. In another example embodiment, the director may also define the target for all the cameras to be the same, the shooting directions being different, so that the identical frames of 502, 504, and 506 would be on the same spot on the scene 500. In one embodiment, the cameras may be unmanned and the director may remotely control the cameras. In one embodiment, all the remotely controlled cameras may operate based on the definitions of the director and a preprogrammed software setting algorithm for the cameras. In one embodiment, the director may define camera specific and scene specific adjustments in the remotely controlled camera set-up. Examples of the camera settings may include, but are not limited to, focal length, distance from the target, zoom setting, and so on. In one embodiment, the cameras may be operated by individual cameramen/videographers, and the shooting/target choices and definitions by the director may be taken as advice and instructions to operate the camera. The cameramen/videographers may perform the final camera adjustments themselves/manually. In one embodiment, the cameras may shoot the scene 500 independently, without continuous instructions and control adjustments from the director, and the captured footage is afterwards, in the editing phase, processed with the help of the spatial information and media source data and the editor choices. In the editing phase, the editor may choose a target portion in one media source, the apparatus may determine the corresponding target portions in the other camera views, and the final editing may be done with the cropped target portions.
[0085] FIGURE 6 is a flowchart depicting an example method 600 for processing of multi-camera media content, in accordance with an example embodiment. In an example embodiment, the method 600 includes facilitating receipt of a plurality of media contents from a multi-camera system, such as the multi-camera system 400 (FIGURE 4), at an apparatus, for example the apparatus 200 (FIGURE 2), and performing editing of the plurality of media contents based on source information associated with a plurality of media sources associated with the multi-camera system. The method 600 depicted in the flow chart may be executed by, for example, the apparatus 200 of FIGURE 2.
[0086] At 602, a plurality of media contents associated with a scene may be received from a plurality of media sources. In an example embodiment, the plurality of media contents may include one or more of video content, an audio content, light field camera data, and a depth map associated with the scene. In an example embodiment, the scene may be associated with an event, for example, a live performance by a guitar artist. In such an example scenario, each of the plurality of media contents may include a video recording of the event being recorded from a plurality of distinct locations. In an example embodiment, the plurality of media contents may be received from the plurality of media sources. For instance, the event may be a live concert and a plurality of users (from the audience) may be capturing the live concert (one of the plurality of media contents) on a respective media capturing device thereof. An example of capturing of the plurality of media contents by the plurality of media sources, and receipt of the plurality of media contents by the apparatus 200, is explained in detail with reference to FIGURE 3.
[0087] Also, spatial information and media source data associated with the plurality of media sources may be received at the apparatus. In an example embodiment, the spatial information associated with the media source of the plurality of media sources includes one or more of location information (for example, geographical co-ordinates and altitude), information on the physical dimensions of the media source, media source orientation information and shooting direction information. In an example embodiment, the spatial information may further include distance information associated with relative distances between one or more objects of the scene, and the relative distance between the media source and the at least one another media source. In an example embodiment, the media source data associated with the media source of the plurality of media sources comprises media capture settings of the media source. In an example embodiment, the media source data may include electrical and optical specifications associated with the media source. For example, the media source data may include the refractive index of the lenses, image sensor specifications, etc. associated with the media source. Various examples illustrating the receipt of the spatial information and media source data at the apparatus are explained with reference to FIGURE 2.
[0088] At 604, selection of at least one target portion in a media content of the plurality of media contents is facilitated. In an example embodiment, the media content may be received from a media source of the plurality of media sources. In an example embodiment, the selection of the at least one target portion may be performed based on a user input at, for example, a user interface of the apparatus. For example, the user may provide an input such as selection of the at least one target portion on a UI by movement of a pointing device around the at least one target portion. Various examples of selection of the at least one target portion in the media content are explained in detail with reference to FIGURE 3.
[0089] At 606, a position and geometrical representation of a corresponding target portion in at least one another media content of the plurality of media contents may be determined. In an example embodiment, the at least one another media content may be captured by at least one another media source. In an example embodiment, the position and the geometrical representation of the corresponding target portion may be determined based on the target portion of the (first) media source and the spatial information, media content and media source data associated with the at least one another media source. In an example embodiment, for determining the position and geometrical representation of the corresponding target portion, an angle of view of the media source with respect to the at least one target portion may be determined. In an example embodiment, the angle of view of the media source may be determined based on the target portion, and the spatial information, media content and media source data associated with the media source. Further, a corresponding angle of view of the at least one another media source with respect to the at least one target portion may be determined. In an example embodiment, the corresponding angle of view of the at least one another media source may be determined based on the angle of view of the media source with respect to the at least one target portion.
[0090] In an example embodiment, the corresponding target portion may be cropped from the at least one another media content based on the position and geometrical representation of the corresponding target portion. In an example embodiment, the cropped at least one media content may be displayed at the UI, for example, the UI 206 of the apparatus 200. In an example embodiment, the at least one target portion and the corresponding target portion may facilitate in generating an edited media content associated with the scene. In an example embodiment, the processing of the plurality of media contents being captured by a plurality of media sources may be performed in various scenarios. Various example scenarios that may facilitate in editing/processing of media content in a multi-camera setup are described with reference to FIGURES 2-5.
[0091] FIGURE 7 is a flowchart depicting an example method 700 for processing of multi-camera media content, in accordance with an example embodiment. In an example embodiment, the method 700 includes facilitating access of a plurality of media contents at an apparatus, for example the apparatus 200 (FIGURE 2), and performing editing of the plurality of media contents for improving the quality of the edited media content. The method 700 depicted in the flow chart may be executed by, for example, the apparatus 200 of FIGURE 2.
[0092] At 702, the method 700 includes facilitating receipt of a plurality of media contents associated with the scene from the plurality of media sources. In an example embodiment, the plurality of media contents associated with a scene may be received from a plurality of media sources. In an example embodiment, the plurality of media contents may include at least one of video content, audio content, directional audio content, animation, one or more images with background audio content, depth information and time synchronization information associated with the scene. An example of capturing of the plurality of media contents by the plurality of media sources, and receipt of the plurality of media contents by the apparatus 200 is explained in detail with reference to FIGURE 3.
[0093] At 704, spatial information and media source data associated with the plurality of media sources may be received. In an example embodiment, the spatial information and the media source data may be received at an apparatus, for example the apparatus 200. In an example embodiment, the spatial information associated with the media source of the plurality of media sources includes one or more of location and altitude information, information on the physical dimensions of the media source, media source orientation information, shooting direction information, and distance information associated with relative distances between one or more objects of the scene and the relative distance between the media source and the at least one another media source. In an example embodiment, the media source data associated with the media source of the plurality of media sources comprises media capture settings of the media source, and electrical and optical specifications associated with the media source. Various examples illustrating the receipt of the spatial information and media source data at the apparatus are explained with reference to FIGURE 2.
[0094] At 706, selection of at least one target portion in a media content of the plurality of media contents is facilitated. In an example embodiment, the media content may be received from a media source of the plurality of media sources. In an example embodiment, the plurality of media contents may be received, and selection of the target portion in one of the plurality of media contents may be facilitated. In some example embodiments, the selection of the target portion may be facilitated in multiple media contents from the plurality of media contents. In an example embodiment, the selection of the at least one target portion may be performed based on a user input at, for example, a user interface of the apparatus. For example, the user may provide an input such as selection of the at least one target portion on a UI by movement of a pointing device around the at least one target portion. Various examples of selection of the at least one target portion in the media content are explained in detail with reference to FIGURE 3.
[0095] At 708, the position and the geometrical representation of a corresponding target portion in at least one another media content of the plurality of media contents may be determined. In an example embodiment, the at least one another media content may be captured by at least one another media source. In an example embodiment, the position and geometrical representation of the corresponding target portion may be determined based on the spatial information, media content and media source data associated with the media source and the at least one another media source. A method for determining the position and the geometrical representation of a corresponding target portion in at least one another media content is described below with reference to operations 710-714.
[0096] At 710, an angle of view of the media source with respect to the at least one target portion may be determined. In an example embodiment, the angle of view may be determined based on the selected target portion, and the spatial information and media source data associated with the media source. In addition, the angle of view may be determined based on the depth information associated with the selected target portion. Further, a corresponding angle of view of the at least one another media source with respect to the at least one target portion may be determined at 712. In an example embodiment, the corresponding angle of view of the at least one another media source may be determined based on the angle of view of the media source with respect to the at least one target portion, and the spatial information, the media content and the media source data associated with the media source and the at least one another media source. At 714, the position and geometrical representation of the corresponding target portion may be determined based on the corresponding angle of view. An example describing the angle of view of the media source with respect to the at least one target portion is described further with reference to FIGURE 4.
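Operations 710 to 714 can be pictured together in one short sketch, reusing the pinhole-geometry assumptions of the earlier examples; the function names, the sample poses and the intrinsics are hypothetical, and the sketch assumes the reference point lies in front of the other media source.

import numpy as np

def angle_of_view(pos, ref, r):  # operations 710 and 712
    d = np.linalg.norm(np.asarray(ref, float) - np.asarray(pos, float))
    return 2.0 * np.arctan2(r, d)

def corresponding_portion(ref, r, other_pos, other_rot, intr):  # operation 714
    fx, fy, cx, cy = intr
    # Reference point in the other source's camera coordinates (depth p[2] > 0).
    p = np.asarray(other_rot, float) @ (np.asarray(ref, float) - np.asarray(other_pos, float))
    u = fx * p[0] / p[2] + cx            # screen position of the point
    v = fy * p[1] / p[2] + cy
    half_w_px = fx * r / p[2]            # the half-width r mapped to pixels
    return (u, v), half_w_px, angle_of_view(other_pos, ref, r)

ref = np.array([0.0, 0.0, 5.0])
other_pos = np.array([3.0, 0.0, 5.0])
other_rot = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
print(corresponding_portion(ref, 1.0, other_pos, other_rot, (1000.0, 1000.0, 640.0, 360.0)))
# -> ((640.0, 360.0), 333.33..., 0.6435...): center pixel, half-width, angle (rad)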
[0097] At 716, the corresponding target portion from the at least one another media content may be cropped based on the determination of the position and geometrical representation of the corresponding target portion in the at least one another media content. In an example embodiment, cropping the corresponding target portion from the at least one another media content may include cropping a sub-frame having the corresponding target portion from the at least one another media content. In an example embodiment, the apparatus 200 may further be caused to perform image stabilization when the sub-frames are cropped from original frames of the at least one another media content. In an example embodiment, the corresponding target portion being cropped from the at least one another media content may be displayed.
[0098] In an example embodiment, the at least one target portion and the corresponding target portion may facilitate processing of the media content associated with the scene. The processing of the plurality of media contents captured by a plurality of media sources may be performed in various scenarios. In an example scenario, such as a live performance by a music band, the performance may be recorded by a plurality of media sources (for example, multiple cameras) located at different positions at the event location. The media content (for example, video footage) from the multiple cameras may be processed so as to generate a processed/edited media content associated with the event, where the edited media content may be of high quality.
[0099] It should be noted that to facilitate discussions of the flowcharts of FIGURES 6-7, certain operations are described herein as constituting distinct steps performed in a certain order. Such implementations are examples only and non-limiting in scope. Certain operations may be grouped together and performed in a single operation, and certain operations can be performed in an order that differs from the order employed in the examples set forth herein. Moreover, certain operations of the methods 600 to 700 are performed in an automated fashion. These operations involve substantially no interaction with the user. Other operations of the methods 600 to 700 may be performed in a manual or semi-automatic fashion. These operations involve interaction with the user via one or more user interface presentations.
[00100] The methods 600 to 700 depicted in these flowcharts may be executed by, for example, the apparatus 200 of FIGURE 2. Operations of the flowcharts, and combinations of operations in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described in various embodiments may be embodied by computer program instructions. In an example embodiment, the computer program instructions, which embody the procedures described in various embodiments, may be stored by at least one memory device of an apparatus and executed by at least one processor in the apparatus. Any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus embodies means for implementing the operations specified in the flowchart. These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the operations specified in the flowchart. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions, which execute on the computer or other programmable apparatus, provide operations for implementing the operations specified in the flowchart. The operations of the methods are described with the help of the apparatus 200. However, the operations of the methods can be described and/or practiced by using any other apparatus.
[00101] Without in any way limiting the scope, interpretation, or application of the claims appearing below, a technical effect of one or more of the example embodiments disclosed herein is to process multiple-capture media content at an apparatus, for example, the apparatus 200. Various embodiments provide methods for processing multiple-capture media content such that the processed media content may be of a better quality as compared to the quality of the individual media contents of the plurality of media contents. The method precludes the need for detailed guidance to be provided to cameramen operating the plurality of media sources regarding the capture of media content. A cameraman may film the scene with a wide-angle capture device, so that the filmed media content may later be processed to generate good quality media content. In an example embodiment, the target portions/objects in a scene may be cropped, and the unwanted visual content may be removed from the footage of the multi-camera shooting, thereby ensuring balanced image compositions and fluent, continuous transitions from one video shot to another in editing. In various example embodiments, the boundaries of the target portion in one of the media contents may be selected, and the apparatus facilitates selection of the boundaries of the corresponding target portions in other media contents received from other media sources. The boundaries of the target portions in the other media contents may be selected based on exact coordinates of the target objects in the scene and an exact definition of the boundaries of the target portions in the media content; the detection of the target portion in the other media contents therefore does not depend on object or face recognition techniques, which may provide inaccurate results. Various embodiments provide for receiving directional audio content associated with the media content, and using the directional audio content to enhance the audio content of the processed media content. The embodiments also provide for the use of light field camera data and depth maps associated with the plurality of media contents, and for thereafter refocusing the target portions so as to enhance the quality of the processed media content.
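As a loose illustration of the depth-map-assisted refocusing mentioned here: a genuine light field refocus resynthesizes the image from captured ray data, whereas the following depth-mask approximation, with hypothetical names, only shows how a depth map can pick the focal plane of the selected target portion:

    # Hypothetical sketch: use a per-pixel depth map to keep the selected
    # target portion sharp and soften everything else. This is a crude
    # stand-in for a true light field refocus, shown only to illustrate
    # choosing the focal plane from the target's depth.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def refocus_on_target(frame, depth_map, box, tolerance=0.5):
        """frame: HxWx3 array; depth_map: HxW in metres; box: (x0, y0, x1, y1)."""
        x0, y0, x1, y1 = box
        focal_depth = float(np.median(depth_map[y0:y1, x0:x1]))
        blurred = gaussian_filter(frame.astype(float), sigma=(3, 3, 0))
        in_focus = np.abs(depth_map - focal_depth) < tolerance
        out = np.where(in_focus[..., None], frame.astype(float), blurred)
        return out.astype(frame.dtype), focal_depth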
[00102] Various embodiments described above may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on at least one memory, at least one processor, an apparatus, or a computer program product. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer, with one example of an apparatus described and depicted in FIGURES 1 and/or 2. A computer-readable medium may comprise a computer-readable storage medium that may be any media or means that can contain or store the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
[00103] If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined.
[00104] Although various aspects of the embodiments are set out in the independent claims, other aspects comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
[00105] It is also noted herein that while the above describes example embodiments of the invention, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present disclosure as defined in the appended claims.

Claims (66)

CLAIMS
We Claim:
1. A method comprising: receiving, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitating selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determining, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
2. The method as claimed in claim 1, wherein the spatial information associated with the media source of the plurality of media sources comprises one or more of a location information comprising geographical co-ordinates and altitude information associated with the media source, a physical dimension information associated with the media source, a media source orientation information, a shooting direction information associated with the media source, and a distance information associated with the media source, the distance information comprising relative distances between one or more objects of the scene, relative distances between the media source and the at least one another media source, and relative distances of the one or more objects of the scene with the plurality of media sources.
3. The method as claimed in claims 1 or 2, wherein the spatial information of the media source from among the plurality of media sources being determined based on at least one of satellite navigation system and indoor positioning system.
4. The method as claimed in claims 1 or 2, wherein the media source data associated with the media source of the plurality of media sources comprises media capture settings of the media source and electrical and optical specifications associated with the media source.
5. The method as claimed in any of claims 1 to 4, wherein the media content of the plurality of media contents comprises at least one of video content, audio content, directional audio content, one or more images with or without background audio content, image depth information and time synchronization information associated with the scene.
6. The method as claimed in any of claims 1 to 5, wherein the selection of the at least one target portion being performed based on a user input, the user input comprising selecting a boundary enclosing the at least one target portion or a point touching the at least one target portion.
7. The method as claimed in any of claims 1 to 6, further comprising: determining a reference space point in the media content based on the selection; determining the position and geometrical representation of the at least one target portion of the media source with respect to the reference space point based at least on the spatial information of the media source; and determining an angle of view of the media source with respect to the position and geometrical representation of the at least one target portion based at least on the spatial information and the media source data associated with the media source.
8. The method as claimed in claim 7, wherein determining the position and the geometrical representation of the corresponding target portion comprises: determining a corresponding reference space point and a corresponding angle of view of the at least one another media source with respect to the at least one target portion based on the angle of view of the media source with respect to the at least one target portion, the spatial information, the media content and the media source data associated with the media source and the at least one another media source; and determining the position and the geometrical representation of the corresponding target portion in the at least one another media content based on the corresponding angle of view of the at least one another media source with respect to the at least one target portion.
9. The method as claimed in claim 8, wherein determining the position and the geometrical representation of the corresponding target portion comprises determining a sub-frame enclosing the corresponding target portion.
10. The method as claimed in any of claims 1 to 9, further comprising cropping the corresponding target portion from the at least one another media content based on the determination of the position and the geometrical representation of the corresponding target portion in the at least one another media content.
11. The method as claimed in claim 10, wherein the at least one target portion and the corresponding target portion facilitates in generating an edited media content associated with the scene.
12. The method as claimed in claim 1, further comprising: defining a 3D (3-dimensional) trajectory of the at least one target portion in the media content, the at least one target portion being in motion in the scene; and determining a corresponding 3D trajectory of the at least one corresponding target portion in the at least one another media content based on the position and the geometrical representation of the corresponding target portion in the at least one another media content.
13. The method as claimed in any of claims 1 to 12, wherein the media source comprises a light field camera, and the media content being captured by the light field camera.
14. The method as claimed in claim 13, further comprising displaying, on the selection of the at least one target portion in the media content, the at least one target portion and the at least one corresponding target portion to be in focus.
15. An apparatus comprising: at least one processor; and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least perform: receive, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitate selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determine, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
16. The apparatus as claimed in claim 15, wherein the spatial information associated with the media source of the plurality of media sources comprises one or more of a location information, comprising geographical co-ordinates and altitude information associated with the media source, a physical dimension information associated with the media source, a media source orientation information, a shooting direction information associated with the media source, and a distance information associated with the media source, the distance information comprising relative distances between one or more objects of the scene, relative distances between the media source and the at least one another media source, and relative distances of the one or more objects of the scene with the plurality of media sources.
17. The apparatus as claimed in claims 15 or 16, wherein the spatial information of the media source from among the plurality of media sources being determined based on at least one of satellite navigation system and indoor positioning system.
18. The apparatus as claimed in claims 15 or 16, wherein the media source data associated with the media source of the plurality of media sources comprises media capture settings of the media source and electrical and optical specifications associated with the media source.
19. The apparatus as claimed in any of claims 15 to 18, wherein the media content of the plurality of media contents comprises at least one of video content, audio content, directional audio content, one or more images with or without background audio content, image depth information and time synchronization information associated with the scene.
20. The apparatus as claimed in any of claims 15 to 19, wherein the selection of the at least one target portion being performed based on a user input, the user input comprising selecting a boundary enclosing the at least one target portion or a point touching the at least one target portion.
21. The apparatus as claimed in any of claims 15 to 20, wherein the apparatus is further caused, at least in part to: determine a reference space point in the media content based on the selection; determine the position and geometrical representation of the at least one target portion of the media source with respect to the reference space point based at least on the spatial information of the media source; and determine an angle of view of the media source with respect to the position and geometrical representation of the at least one target portion based at least on the spatial information and the media source data associated with the media source.
22. The apparatus as claimed in claim 21, wherein for determining the position and the geometrical representation of the corresponding target portion, the apparatus is further caused, at least in part to: determine a corresponding reference space point and a corresponding angle of view of the at least one another media source with respect to the at least one target portion based on the angle of view of the media source with respect to the at least one target portion, the spatial information, the media content and the media source data associated with the media source and the at least one another media source; and determine the position and the geometrical representation of the corresponding target portion in the at least one another media content based on the corresponding angle of view of the at least one another media source with respect to the at least one target portion.
23. The apparatus as claimed in claim 22, wherein for determining the position and the geometrical representation of the corresponding target portion, the apparatus is further caused, at least in part to determine a sub-frame enclosing the corresponding target portion.
24. The apparatus as claimed in any of claims 15 to 23, wherein the apparatus is further caused, at least in part to crop the corresponding target portion from the at least one another media content based on the determination of the position and the geometrical representation of the corresponding target portion in the at least one another media content.
25. The apparatus as claimed in claim 24, wherein the at least one target portion and the corresponding target portion facilitates in generating an edited media content associated with the scene.
26. The apparatus as claimed in claim 15, wherein the apparatus is further caused, at least in part to: define a 3D (3-dimensional) trajectory of the at least one target portion in the media content, the at least one target portion being in motion in the scene; and determine a corresponding 3D trajectory of the at least one corresponding target portion in the at least one another media content based on the position and the geometrical representation of the corresponding target portion in the at least one another media content.
27. The apparatus as claimed in any of claims 15 to 26, wherein the media source comprises a light field camera, and the media content being captured by the light field camera.
28. The apparatus as claimed in any of claims 15 to 27, wherein the apparatus comprises an electronic device comprising: a user interface circuitry and user interface software configured to facilitate a user to control at least one function of the electronic device through use of a display and further configured to respond to user inputs; and a display circuitry configured to display at least a portion of a user interface of the electronic device, the display and display circuitry configured to facilitate the user to control at least one function of the electronic device.
29. The apparatus as claimed in claim 28, wherein the user interface and the user interface software are configured to facilitate the selection of the at least one target portion based on the user input.
30. The apparatus as claimed in claim 28, wherein the display and the display circuit are configured to display one or more of the spatial information, media source data associated with the plurality of media sources, the at least one target portion, the at least one corresponding target portion, the angle of view of the media source, and the corresponding angle of view of the at least one another media source.
31. The apparatus as claimed in claims 28, 29 or 30, wherein the display and the display circuit are configured to display a maximum angle of view associated with the media source, the maximum angle of view being a maximum angular extent of the scene being captured by the media source.
32. The apparatus as claimed in claims 27, 28, 29, or 30, wherein on the selection of the at least one target portion, the display and the display circuit are configured to display the angle of view of the media source and the corresponding angle of view of the at least one another media source with respect to the at least one target portion.
33. The apparatus as claimed in claims 28, 29, 30, 31 or 32, wherein the display and the display circuit are configured to display the corresponding target portion being cropped from the at least one another media content.
34. The apparatus as claimed in any of claims 28 to 33, wherein the display and the display circuit are configured to display, on the selection of the at least one target portion in the media content, the at least one target portion and the at least one corresponding target portion to be in focus.
35. The apparatus as claimed in any of claims 28 to 34, wherein the electronic device comprises a mobile phone.
36. A computer program product comprising at least one computer-readable storage medium, the computer-readable storage medium comprising a set of instructions, which, when executed by one or more processors, cause an apparatus to at least perform: receive, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitate selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determine, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information and the media source data of the media source and the at least one another media source.
37. The computer program product as claimed in claim 36, wherein the spatial information associated with the media source of the plurality of media sources comprises one or more of a location information, comprising geographical co-ordinates and altitude information associated with the media source, a physical dimension information associated with the media source, a media source orientation information, a shooting direction information associated with the media source, and a distance information associated with the media source, the distance information comprising relative distances between one or more objects of the scene, relative distances between the media source and the at least one another media source, and relative distances of the one or more objects of the scene with the plurality of media sources.
38. The computer program product as claimed in claims 36 or 37, wherein the spatial information of the media source from among the plurality of media sources being determined based on at least one of satellite navigation system and indoor positioning system.
39. The computer program product as claimed in claims 36 or 37, wherein the media source data associated with the media source of the plurality of media sources comprises media capture settings of the media source and electrical and optical specifications associated with the media source.
40. The computer program product as claimed in any of claims 36 to 39, wherein the media content of the plurality of media contents comprises at least one of video content, audio content, directional audio content, one or more images with or without background audio content, image depth information and time synchronization information associated with the scene.
41. The computer program product as claimed in any of claims 36 to 40, wherein the selection of the at least one target portion being performed based on a user input, the user input comprising selecting a boundary enclosing the at least one target portion or a point touching the at least one target portion.
42. The computer program product as claimed in any of claims 36 to 41, wherein the apparatus is further caused, at least in part to: determine a reference space point in the media content based on the selection; determine the position and geometrical representation of the at least one target portion of the media source with respect to the reference space point based at least on the spatial information of the media source; and determine an angle of view of the media source with respect to the position and geometrical representation of the at least one target portion based at least on the spatial information and the media source data associated with the media source.
43. The computer program product as claimed in claim 42, wherein for determining the position and the geometrical representation of the corresponding target portion, the apparatus is further caused, at least in part to: determine a corresponding reference space point and a corresponding angle of view of the at least one another media source with respect to the at least one target portion based on the angle of view of the media source with respect to the at least one target portion, the spatial information, the media content and the media source data associated with the media source and the at least one another media source; and determine the position and the geometrical representation of the corresponding target portion in the at least one another media content based on the corresponding angle of view of the at least one another media source with respect to the at least one target portion.
44. The computer program product as claimed in claim 43, wherein determining the position and the geometrical representation of the corresponding target portion comprises determining a sub-frame enclosing the corresponding target portion.
45. The computer program product as claimed in any of claims 36 to 44, wherein the apparatus is further caused, at least in part to crop the corresponding target portion from the at least one another media content based on the determination of the position and the geometrical representation of the corresponding target portion in the at least one another media content.
46. The computer program product as claimed in claim 45, wherein the at least one target portion and the corresponding target portion facilitates in generating an edited media content associated with the scene.
47. The computer program product as claimed in claim 36, wherein the apparatus is further caused, at least in part to: define a 3D (3-dimensional) trajectory of the at least one target portion in the media content, the at least one target portion being in motion in the scene; and determine a corresponding 3D trajectory of the at least one corresponding target portion in the at least one another media content based on the position and the geometrical representation of the corresponding target portion in the at least one another media content.
48. The computer program product as claimed in any of claims 36 to 47, wherein the media source comprises a light field camera, and the media content being captured by the light field camera.
49. The computer program product as claimed in claim 48, wherein the apparatus is further caused, at least in part to display, on the selection of the at least one target portion in the media content, the at least one target portion and the at least one corresponding target portion to be in focus.
50. An apparatus comprising: means for receiving, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; means for facilitating selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and means for determining, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information, and the media source data of the media source and the at least one another media source.
51. The apparatus as claimed in claim 50, wherein the spatial information associated with the media source of the plurality of media sources comprises one or more of a location information, comprising geographical co-ordinates and altitude information associated with the media source, a physical dimension information associated with the media source, a media source orientation information, a shooting direction information associated with the media source, and a distance information associated with the media source, the distance information comprising relative distances between one or more objects of the scene, relative distances between the media source and the at least one another media source, and relative distances of the one or more objects of the scene with the plurality of media sources.
52. The apparatus as claimed in claims 50 or 51, wherein the spatial information of the media source from among the plurality of media sources being determined based on at least one of satellite navigation system and indoor positioning system.
53. The apparatus as claimed in claims 50 or 51, wherein the media source data associated with the media source of the plurality of media sources comprises media capture settings of the media source and electrical and optical specifications associated with the media source.
54. The apparatus as claimed in any of claims 50 to 53, wherein the media content of the plurality of media contents comprises at least one of video content, audio content, directional audio content, one or more images with or without background audio content, image depth information and time synchronization information associated with the scene.
55. The apparatus as claimed in any of claims 50 to 54, wherein the selection of the at least one target portion being performed based on a user input, the user input comprising selecting a boundary enclosing the at least one target portion or a point touching the at least one target portion.
56. The apparatus as claimed in any of claims 50 to 55, further comprising: means for determining a reference space point in the media content based on the selection; means for determining the position and geometrical representation of the at least one target portion of the media source with respect to the reference space point based at least on the spatial information of the media source; and means for determining an angle of view of the media source with respect to the position and geometrical representation of the at least one target portion based at least on the spatial information and the media source data associated with the media source.
57. The apparatus as claimed in claim 56, wherein means for determining the position and the geometrical representation of the corresponding target portion comprises: means for determining a corresponding reference space point and a corresponding angle of view of the at least one another media source with respect to the at least one target portion based on the angle of view of the media source with respect to the at least one target portion, the spatial information, the media content and the media source data associated with the media source and the at least one another media source; and means for determining the position and the geometrical representation of the corresponding target portion in the at least one another media content based on the corresponding angle of view of the at least one another media source with respect to the at least one target portion.
58. The apparatus as claimed in claim 57, wherein means for determining the position and the geometrical representation of the corresponding target portion comprises means for determining a sub-frame enclosing the corresponding target portion.
59. The apparatus as claimed in any of claims 50 to 58, further comprising means for cropping the corresponding target portion from the at least one another media content based on the determination of the position and the geometrical representation of the corresponding target portion in the at least one another media content.
60. The apparatus as claimed in claim 59, wherein the at least one target portion and the corresponding target portion facilitates in generating an edited media content associated with the scene.
61. The apparatus as claimed in claim 50, further comprising: means for defining a 3D (3-dimensional) trajectory of the at least one target portion in the media content, the at least one target portion being in motion in the scene; and means for determining a corresponding 3D trajectory of the at least one corresponding target portion in the at least one another media content based on the position and the geometrical representation of the corresponding target portion in the at least one another media content.
62. The apparatus as claimed in any of claims 50 to 61, wherein the media source comprises a light field camera, and the media content being captured by the light field camera.
63. The apparatus as claimed in claim 62, further comprising means for displaying, on the selection of the at least one target portion in the media content, the at least one target portion and the at least one corresponding target portion to be in focus.
64. A computer program comprising program instructions which when executed by an apparatus, cause the apparatus to: receive, from a plurality of media sources, a plurality of media contents associated with a scene, spatial information and media source data associated with the plurality of media sources; facilitate selection of at least one target portion in a media content of the plurality of media contents, the media content being received from a media source of the plurality of media sources; and determine, in at least one another media content of the plurality of media contents, position and geometrical representation of a corresponding target portion, the at least one another media content being captured by at least one another media source of the plurality of media sources, the position and the geometrical representation of the corresponding target portion being determined based on the selection of the at least one target portion in the media content, the spatial information and the media source data of the media source and the at least one another media source.
65. An apparatus substantially as hereinbefore described with reference to the accompanying drawings.
66. A method substantially as hereinbefore described with reference to the accompanying drawings.
GB1422544.5A 2014-12-18 2014-12-18 Method, apparatus and computer program product for processing multi-camera media content Withdrawn GB2533360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1422544.5A GB2533360A (en) 2014-12-18 2014-12-18 Method, apparatus and computer program product for processing multi-camera media content

Publications (1)

Publication Number Publication Date
GB2533360A true GB2533360A (en) 2016-06-22

Family

ID=56072134

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1422544.5A Withdrawn GB2533360A (en) 2014-12-18 2014-12-18 Method, apparatus and computer program product for processing multi-camera media content

Country Status (1)

Country Link
GB (1) GB2533360A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1835439A1 (en) * 2006-03-14 2007-09-19 MobilEye Technologies, Ltd. Systems and methods for detecting pedestrians in the vicinity of a powered industrial vehicle
EP1862969A1 (en) * 2006-06-02 2007-12-05 Eidgenössische Technische Hochschule Zürich Method and system for generating a representation of a dynamically changing 3D scene
GB2496428A (en) * 2011-11-11 2013-05-15 Sony Corp Apparatus for detecting the position of a sports projectile in a scene

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066990A (en) * 2017-05-04 2017-08-18 厦门美图之家科技有限公司 A kind of method for tracking target and mobile device
CN107066990B (en) * 2017-05-04 2019-10-11 厦门美图之家科技有限公司 A kind of method for tracking target and mobile device
EP3790266A4 (en) * 2018-04-28 2022-01-05 Neshumov, Armen Grigoryevich Method and system for providing video intercom transmission of images

Legal Events

Date Code Title Description
COOA Change in applicant's name or ownership of the application

Owner name: NOKIA TECHNOLOGIES OY

Free format text: FORMER OWNER: NOKIA CORPORATION

WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)