WO2019043288A1 - A method, device and a system for enhanced field of view


Publication number: WO2019043288A1
Application number: PCT/FI2018/050586
Authority: WO (WIPO (PCT))
Prior art keywords: picture, scene, pixel values, fringe, area
Other languages: French (fr)
Inventor: Payman Aflaki Beni
Original Assignee: Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2019043288A1

Classifications

    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • G02B27/017 Head-up displays, head mounted
    • H04N13/344 Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
    • G02B2027/0134 Head-up displays characterised by optical features comprising binocular systems of stereoscopic type
    • G02B2027/0138 Head-up displays characterised by optical features comprising image capture systems, e.g. camera
    • G02B2027/0147 Head-up displays characterised by optical features comprising a device modifying the resolution of the displayed image

Abstract

The invention relates to enhanced viewing of a scene. Central pixel values of pixels in a center area of a picture are formed corresponding to a middle area of a scene in a current viewing direction. The middle area of the scene is mapped to the pixels of the center area of the picture using a first magnification. Fringe pixel values of pixels in a fringe area of a picture are formed corresponding to an edge of the scene in the viewing direction. The edge of the scene is mapped to the fringe area of said picture using a second magnification, wherein the second magnification is smaller than said first magnification.

Description

A method, device and a system for enhanced field of view
Background
Head-mounted displays (HMDs) make it possible to view new kinds of content. For example, three-dimensional (stereoscopic) viewing may be achieved by presenting images to the left and right eye with an appropriate disparity. As another example, the user may be shown a part of a 360-degree image, corresponding to the current viewing direction. However, the field of view (FOV) of current head-mounted displays is narrower than the actual field of view of a human.
A user wearing a head-mounted display has reduced awareness of the scene, and a reduced ability to react to events in it, compared with a user viewing the scene with bare eyes. Consequently, the capability of users to experience a real-life, smooth perception of their surroundings is diminished. This is clearly noticeable in current head-mounted displays, in which only a part of the normally visible area is visible and the rest of the full field of view is black. There is, therefore, a need for a solution for improving the viewing capability of head-mounted displays.
Summary
Now there has been invented an improved method and technical equipment implementing the method, by which the above problems are alleviated. Various aspects of the invention include a method, an apparatus, a system and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments of the invention are disclosed in the dependent claims.
The physical field of view of the head-mounted device may be limited, e.g. to be smaller than the physically possible field of view of the human visual system. For example, the field of view of the head-mounted device may be limited to approximately 150 degrees. The values of the extreme pixels at the left, right, bottom and/or top, that is, the fringe pixels or the fringe area of the picture being shown on the head-mounted display, may be assigned to correspond to the scene being presented so that they present more content (a larger FOV) than usual. In other words, the magnification in the fringe area may be smaller, e.g. a fraction of the magnification at the center area of the picture, and thereby a fringe pixel may represent a larger area of the scene in the horizontal and/or vertical direction than a center pixel. The physical size of the fringe pixels may remain the same, that is, they may be part of a contiguous display with homogeneous pixel size. The color value of a fringe pixel may be formed by downsampling from image data that is outside the normally viewable area, or by otherwise mapping such an edge of the wider field-of-view content to the pixels in the fringe area of the picture being shown to the user.
The pixel values (i.e. image data) of the fringe area of the display may be formed e.g. by downsampling the content in the horizontal or vertical direction or both. In addition, or in the process of forming the pixel values, emphasis may be added to objects in the fringe area, e.g. moving objects, potential events of interest, areas with more details, objects closer to the camera and/or objects or areas with higher audio coming from that direction. Emphasis, in other words bolding, may be achieved by increasing the visibility of an area or an object, e.g. by altering the color, enlarging, or adding temporal dynamics.

According to a first aspect there is provided a method, comprising forming central pixel values of pixels in a center area of a picture, the central pixel values corresponding to a middle area of a scene in a current viewing direction, the middle area of the scene being mapped to the pixels of the center area of the picture using a first magnification, and forming fringe pixel values of pixels in a fringe area of a picture, the fringe pixel values corresponding to an edge of the scene in the viewing direction, the edge of the scene being mapped to the fringe area of the picture using a second magnification, wherein the second magnification is smaller than the first magnification.

According to a second aspect there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to form central pixel values of pixels in a center area of a picture, the central pixel values corresponding to a middle area of a scene in a current viewing direction, the middle area of the scene being mapped to the pixels of the center area of the picture using a first magnification, and to form fringe pixel values of pixels in a fringe area of a picture, the fringe pixel values corresponding to an edge of the scene in the viewing direction, the edge of the scene being mapped to the fringe area of the picture using a second magnification, wherein the second magnification is smaller than the first magnification.

According to a third aspect there is provided a system comprising a camera, the camera being one of the group of a panoramic camera, a wide-angle stereo camera and an omnidirectional camera, and a viewer comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the system to capture a scene picture of a scene using the camera, receive the scene picture at the viewer, form central pixel values of pixels in a center area of a picture using the received scene picture, the central pixel values corresponding to a middle area of the scene in a current viewing direction, the middle area of the scene being mapped to the pixels of the center area of the picture using a first magnification, form fringe pixel values of pixels in a fringe area of a picture using the received scene picture, the fringe pixel values corresponding to an edge of the scene in the viewing direction, the edge of the scene being mapped to the fringe area of the picture using a second magnification, wherein the second magnification is smaller than the first magnification, and display the picture using the viewer.
According to a fourth aspect there is provided a computer program product embodied on a non-transitory computer readable medium, comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to form central pixel values of pixels in a center area of a picture, the central pixel values corresponding to a middle area of a scene in a current viewing direction, the middle area of the scene being mapped to the pixels of the center area of the picture using a first magnification, and to form fringe pixel values of pixels in a fringe area of a picture, the fringe pixel values corresponding to an edge of the scene in the viewing direction, the edge of the scene being mapped to the fringe area of the picture using a second magnification, wherein the second magnification is smaller than the first magnification.
The second magnification may be smaller than the first magnification in a horizontal direction, in a vertical direction, or both. The second magnification may gradually diminish across the fringe area of the picture. The fringe pixel values may be formed of pixels in a sub-area of the fringe area of the picture by mapping the edge of the scene to the sub-area using a third magnification, wherein the third magnification is smaller than the second magnification. In the forming of the fringe pixel values, an object may be visually emphasized in the edge area of the scene, wherein the emphasizing is done based on one or more of the group of movement of the object, a predefined event related to the object, spatially high-frequency texture of the object, and closeness of the object to the camera. A scene picture may be received from a panoramic camera, a wide-angle stereo camera or an omnidirectional camera, such as an omnidirectional stereo camera, and the central pixel values and the fringe pixel values may be formed using the received scene picture. In the forming of the fringe pixel values, an area of a scene picture may be downsampled to obtain the fringe pixel values, wherein the downsampling decreases the magnification of the area of the scene picture. The central pixel values and the fringe pixel values may be computed using a three-dimensional computer model of a scene. A system or an apparatus may comprise a head-mounted display, and the picture may be displayed using the head-mounted display. The fringe pixel values may be formed for one of left and right pictures of a stereo picture pair, and different fringe pixel values may be formed for the other of left and right pictures of the stereo picture pair. The picture may be a left or right picture of a stereo picture pair, and the stereo picture pair may be displayed on a head-mounted display using the central pixel values and the fringe pixel values.
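For illustration only (the following sketch is not part of the claimed subject matter), the first-aspect method may be expressed for a single pixel row in Python with NumPy. The function name form_picture_row, the region widths and the nearest-sample picking used in place of any particular downsampling filter are assumptions of the example.

    import numpy as np

    def form_picture_row(scene_row, pic_width, fringe_width):
        # scene_row: one row of the scene, assumed wider than the picture row.
        # The middle area of the scene fills the picture center at the first
        # magnification (1:1); each scene edge is squeezed into fringe_width
        # pixels, i.e. mapped with a smaller second magnification.
        center_width = pic_width - 2 * fringe_width
        scene_width = scene_row.shape[0]
        edge_width = (scene_width - center_width) // 2   # scene area per edge

        picture_row = np.empty(pic_width, dtype=scene_row.dtype)

        # Central pixel values: middle area of the scene, first magnification.
        picture_row[fringe_width:fringe_width + center_width] = \
            scene_row[edge_width:edge_width + center_width]

        # Fringe pixel values: scene edges mapped with the second magnification
        # fringe_width / edge_width, which is smaller than the first one.
        left_idx = np.linspace(0, edge_width - 1, fringe_width).astype(int)
        right_idx = np.linspace(scene_width - edge_width, scene_width - 1,
                                fringe_width).astype(int)
        picture_row[:fringe_width] = scene_row[left_idx]
        picture_row[pic_width - fringe_width:] = scene_row[right_idx]
        return picture_row

In this sketch the second magnification equals fringe_width divided by edge_width, which is smaller than the first magnification of one used for the center area of the picture.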
Description of the Drawings
In the following, various embodiments of the invention will be described in more detail with reference to the appended drawings, in which
Figs. 1a, 1b, 1c and 1d
show a setup for forming a stereo image to a user;
Figs. 2a, 2b, 2c, 2d, 2e and 2f
show a system and apparatuses (e.g. stereo camera and head-mounted display) for stereo viewing and illustrate the viewing process;
Figs. 3a, 3b and 3c
illustrate the principle of enhanced field of view for head-mounted displays;
Figs. 4a, 4b, 4c, 4d and 4e illustrate various ways and details in forming the enhanced field of view by using the fringe area of a picture; and
Figs. 5a and 5b
show flow charts for rendering images with enhanced field of view.
Description of Example Embodiments
In the following, several embodiments of the invention will be described in the context of an example 3D video viewing implementation. It is to be noted, however, that the invention is not limited to any specific example. In fact, the different embodiments have applications in any environment where fast rendering of video with an enhanced region is required.
Figs. 1a, 1b, 1c and 1d show a setup for forming a stereo image to a user, for example a video frame of a 3D video. In Fig. 1a, a situation is shown where a human being is viewing two spheres A1 and A2 using both eyes E1 and E2. The sphere A1 is closer to the viewer than the sphere A2, the respective distances from the first eye E1 being LE1,A1 and LE1,A2. The different objects reside in space at their respective (x,y,z) coordinates, defined by the coordinate system SX, SY and SZ. The distance d12 between the eyes of a human being may be approximately 62-64 mm on average, varying from person to person between 55 and 74 mm. This distance is referred to as the parallax, on which the stereoscopic view of human vision is based. The viewing directions (optical axes) DIR1 and DIR2 are typically essentially parallel, possibly having a small deviation from being parallel, and define the field of view for the eyes. The head of the user has an orientation (head orientation) in relation to the surroundings, most easily defined by the common direction of the eyes when the eyes are looking straight ahead. That is, the head orientation tells the yaw, pitch and roll of the head in respect of a coordinate system of the scene where the user is.
When the viewer's body (thorax) is not moving, the viewer's head orientation is restricted by the normal anatomical ranges of movement of the cervical spine.
In the setup of Fig. 1a, the spheres A1 and A2 are in the field of view of both eyes. The center-point O12 between the eyes and the spheres are on the same line. That is, from the center-point, the sphere A2 is behind the sphere A1. However, each eye sees part of sphere A2 from behind A1, because the spheres are not on the same line of view from either of the eyes. In Fig. 1b, a setup is shown where the eyes have been replaced by cameras C1 and C2, positioned at the location where the eyes were in Fig. 1a. The distances and directions of the setup are otherwise the same. Naturally, the purpose of the setup of Fig. 1b is to be able to take a stereo image of the spheres A1 and A2. The two images resulting from the image capture are FC1 and FC2. The "left eye" image FC1 shows the image SA2 of the sphere A2 partly visible on the left side of the image SA1 of the sphere A1. The "right eye" image FC2 shows the image SA2 of the sphere A2 partly visible on the right side of the image SA1 of the sphere A1. This difference between the right and left images is called disparity, and this disparity, being the basic mechanism with which the human visual system (HVS) determines depth information and creates a 3D view of the scene, can be used to create an illusion of a 3D image.
In this setup of Fig. 1b, where the inter-eye distances correspond to those of the eyes in Fig. 1a, the camera pair C1 and C2 has a natural parallax, that is, it has the property of creating natural disparity in the two images of the cameras. Natural disparity may be understood to be created even though the distance between the two cameras forming the stereo camera pair is somewhat smaller or larger than the normal distance (parallax) between the human eyes, e.g. essentially between 40 mm and 100 mm or even 30 mm and 120 mm.

In Fig. 1c, the creating of this 3D illusion is shown. The images FC1 and FC2 captured by the cameras C1 and C2 are displayed to the eyes E1 and E2, using displays D1 and D2, respectively. The disparity between the images is processed by the HVS so that an understanding of depth is created. That is, when the left eye sees the image SA2 of the sphere A2 on the left side of the image SA1 of sphere A1, and respectively the right eye sees the image of A2 on the right side, the HVS creates an understanding that there is a sphere V2 behind the sphere V1 in a three-dimensional world. Here, it needs to be understood that the images FC1 and FC2 can also be synthetic, that is, created by a computer. If they carry the disparity information, synthetic images will also be seen as three-dimensional by the HVS. That is, a pair of computer-generated images can be formed so that they can be used as a stereo image.

Fig. 1d illustrates how the principle of displaying stereo images to the eyes can be used to create 3D movies or virtual reality scenes having an illusion of being three-dimensional. The images FX1 and FX2 are either captured with a stereo camera or computed from a model so that the images have the appropriate disparity. By displaying a large number (e.g. 30) of frames per second to both eyes using displays D1 and D2 so that the images between the left and the right eye have disparity, the HVS will create a cognition of a moving, three-dimensional image. When the camera is turned, or the direction of view with which the synthetic images are computed is changed, the change in the images creates an illusion that the direction of view is changing, that is, the viewer's head is rotating. This direction of view, that is, the head orientation, may be determined as a real orientation of the head e.g. by an orientation detector mounted on the head, or as a virtual orientation determined by a control device such as a joystick or mouse that can be used to manipulate the direction of view without the user actually moving his head. That is, the term "head orientation" may be used to refer to the actual, physical orientation of the user's head and changes in the same, or it may be used to refer to the virtual direction of the user's view that is determined by a computer program or a computer input device.

It has been noticed here that displaying high-resolution stereoscopic (i.e. 3D) 360-degree video content may be problematic. There may be a number of reasons for this. For example, many devices only support so-called 4K video decoding (4360 pixels in the horizontal direction). However, displaying stereoscopic 360-degree video requires two 4K streams to be decoded. This 4K content may not be sufficient to render 360-degree video with perceivably good quality, because only 1K (1080 pixels in the horizontal direction) would be displayed per eye on virtual reality head-mounted displays (VR HMDs), assuming a 90-degree field of view (FOV).
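As a rough, purely illustrative calculation (the 3840-pixel frame width is an assumption of this example, not a value taken from the description), the horizontal resolution available per eye from a 4K 360-degree frame and a 90-degree viewport can be estimated as follows:

    # Illustrative only: horizontal pixels that fall within a 90-degree viewport
    # of a 4K equirectangular 360-degree frame.
    frame_width_px = 3840                          # assumed 4K horizontal resolution
    fov_deg = 90                                   # assumed per-eye field of view
    viewport_px = frame_width_px * fov_deg / 360
    print(viewport_px)                             # 960.0, i.e. roughly "1K" per eye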
The current generation of HMDs already supports more than 1K in the visible field of view, and future HMDs will support even higher resolutions.
A commonly available video codec such as the H.264 codec may be used on many relevant devices. However, many commonly available video codecs do not provide sufficiently high quality for 3D video in 360 degrees. This new problem may be alleviated as presented in this description.
A video signal can be divided into sub-areas, for example tiles, so that each sub-area is a separate video stream. These sub-areas may be called enhancement regions, and their purpose may be understood to be improving the image quality in the viewing direction the user is looking at. That is, the orientation of a head-mounted display may determine the view direction, and this may be used to render the appropriate field-of-view area to the display. In addition to the enhancement video signal and sub-areas, a base layer video signal (base region) may be used. A purpose of the base layer may be understood to be guaranteeing that regardless of the current viewing direction, some video content can be rendered, albeit possibly at a lower quality. The rendering of such view-dependent delivery may operate so that a full base layer is first rendered to the viewport (the current field of view) and each enhancement tile is then rendered on top of the base layer. The rendering order may be back-to-front (painter algorithm), because enhancement tiles can be partially transparent, one reason being fading in enhancements gradually. For transparent rendering, the underlying content is therefore usually rendered first.
One problem identified here with the current view-dependent rendering is that in some cases the screen (or screen buffer / rendering buffer) is filled with the base layer video data and then the screen is also fully or partially filled with the enhancement layer video data. Therefore, the screen or buffer may be filled twice or nearly twice even though the base layer is not visible, or only a part of the base layer is visible. This has been detected here to cause unnecessary memory fetches and unnecessary processing by the graphics processing unit (GPU). As an example, such rendering may be used with 3D video content captured with the Nokia OZO camera. The 3D video content may contain two or more video streams, for example coded with the H.264 video codec. There may be a base layer video that covers the full 360-degree 3D video (e.g. with the resolution of 1080p). There may be a number of enhancement layers (tiles) (e.g. with the resolution of 4K). The active, displayed tiles (regions) are changed based on the view direction. The current view-dependent rendering may render the full base layer to the eye buffer, and then render each tile on top of the base layer. This may lead to high memory bandwidth and processing load as the eye buffers are filled twice (once with the base layer, once with the enhancement tiles).
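A minimal sketch of such view-dependent rendering (illustrative only; the GPU draw calls are reduced to NumPy alpha blending and the tile representation is an assumption of the example) shows how the eye buffer ends up being filled twice:

    import numpy as np

    def render_viewport(base, tiles):
        # base: base-layer image already projected to the viewport (H x W x 3, values in [0, 1]).
        # tiles: enhancement tiles as RGBA images of the same size, alpha 0 outside the tile
        #        region, assumed to be ordered back-to-front (painter algorithm).
        eye_buffer = base.copy()                   # buffer is first filled with the base layer
        for rgba in tiles:
            rgb, alpha = rgba[..., :3], rgba[..., 3:4]
            # Each (possibly partially transparent) tile is rendered on top of the base layer.
            eye_buffer = alpha * rgb + (1.0 - alpha) * eye_buffer
        return eye_buffer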
In this description, an improved way of rendering such content is described.

Figs. 2a, 2b, 2c, 2d, 2e and 2f show a system and apparatuses (e.g. stereo camera and head-mounted display) for stereo viewing and illustrate the viewing process. Fig. 2a shows a system and apparatuses for stereo viewing, that is, for 3D video and 3D audio digital capture and playback. The task of the capturing system is that of capturing sufficient visual and auditory information from a specific location such that a convincing reproduction of the experience, or presence, of being in that location can be achieved by one or more viewers physically located in different locations and optionally at a time later in the future. Such reproduction requires more information than can be captured by a single camera or microphone, in order that a viewer can determine the distance and location of objects within the scene using their eyes and their ears. As explained in the context of Figs. 1a to 1d, to create a pair of images with disparity, two camera sources are used. In a similar manner, for the human auditory system to be able to sense the direction of sound, at least two microphones may be used (the commonly known stereo sound is created by recording two audio channels). The human auditory system can detect the cues, e.g. the timing difference of the audio signals, to detect the direction of sound.
The system of Fig. 2a may consist of three main parts: image sources, a server and a rendering device. A video capture device SRC1 comprises multiple (for example, 8) cameras CAM1, CAM2, ..., CAMN with overlapping fields of view so that regions of the view around the video capture device are captured from at least two cameras. The device SRC1 may comprise multiple microphones to capture the timing and phase differences of audio originating from different directions. The device may comprise a high-resolution orientation sensor so that the orientation (direction of view) of the plurality of cameras can be detected and recorded. The device SRC1 comprises or is functionally connected to a computer processor PROC1 and memory MEM1, the memory comprising computer program PROGR1 code for controlling the capture device. The image stream captured by the device may be stored on a memory device MEM2 for use in another device, e.g. a viewer, and/or transmitted to a server using a communication interface COMM1.
Alternatively or in addition to the video capture device SRC1 creating an image stream, or a plurality of such, one or more sources SRC2 of synthetic images may be present in the system. Such sources of synthetic images may use a computer model of a virtual world to compute the various image streams it transmits. For example, the source SRC2 may compute N video streams corresponding to N virtual cameras located at a virtual viewing position. When such a synthetic set of video streams is used for viewing, the viewer may see a three-dimensional virtual world, as explained earlier for Fig. 1d. The device SRC2 comprises or is functionally connected to a computer processor PROC2 and memory MEM2, the memory comprising computer program PROGR2 code for controlling the synthetic source device SRC2. The image stream captured by the device may be stored on a memory device MEM5 (e.g. memory card CARD1) for use in another device, e.g. a viewer, or transmitted to a server or the viewer using a communication interface COMM2.
There may be a storage, processing and data stream serving network in addition to the capture device SRC1. For example, there may be a server SERV or a plurality of servers storing the output from the capture device SRC1 or the computation device SRC2. The device comprises or is functionally connected to a computer processor PROC3 and memory MEM3, the memory comprising computer program PROGR3 code for controlling the server. The server may be connected by a wired or wireless network connection, or both, to the sources SRC1 and/or SRC2, as well as to the viewer devices VIEWER1 and VIEWER2, over the communication interface COMM3.
For viewing the captured or created video content, there may be one or more viewer devices VIEWER1 and VIEWER2. These devices may have a rendering module and a display module, or these functionalities may be combined in a single device. The devices may comprise or be functionally connected to a computer processor PROC4 and memory MEM4, the memory comprising computer program PROGR4 code for controlling the viewing devices. The viewer (playback) devices may consist of a data stream receiver for receiving a video data stream from a server and for decoding the video data stream. The data stream may be received over a network connection through the communications interface COMM4, or from a memory device MEM6 such as a memory card CARD2. The viewer devices may have a graphics processing unit for processing the data to a suitable format for viewing, as described with Figs. 1c and 1d. The viewer VIEWER1 comprises a high-resolution stereo-image HMD for viewing the rendered stereo video sequence. The head-mounted device may have an orientation sensor DET1 and stereo audio headphones. The viewer VIEWER2 comprises a display enabled with 3D technology (for displaying stereo video), and the rendering device may have a head-orientation detector DET2 connected to it. Any of the devices (SRC1, SRC2, SERVER, RENDERER, VIEWER1, VIEWER2) may be a computer or a portable computing device, or be connected to such. Such rendering devices may have computer program code for carrying out methods according to various examples described in this text.
Fig. 2b shows a camera device for stereo viewing. The camera comprises three or more cameras that are configured into camera pairs for creating the left and right eye images, or that can be arranged into such pairs. The distance between cameras may correspond to the usual distance between the human eyes. The cameras may be arranged so that they have significant overlap in their fields of view. For example, wide-angle lenses of 180 degrees or more may be used, and there may be 3, 4, 5, 6, 7, 8, 9, 10, 12, 16 or 20 cameras. The cameras may be regularly or irregularly spaced across the whole sphere of view, or they may cover only part of the whole sphere. For example, there may be three cameras arranged in a triangle and having different directions of view towards one side of the triangle such that all three cameras cover an overlap area in the middle of the directions of view. As another example, there may be 8 cameras having wide-angle lenses, arranged regularly at the corners of a virtual cube and covering the whole sphere such that the whole or essentially the whole sphere is covered in all directions by at least 3 or 4 cameras. In Fig. 2b, three stereo camera pairs are shown. Camera devices with other types of camera layouts may be used. A multi-camera device or system may generally be understood as any system/device including more than two cameras for capturing the surrounding area at the same time. For example, a camera device with all the cameras in one hemisphere may be used. The number of cameras may be e.g. 3, 4, 6, 8, 12, or more. The cameras may be placed to create a central field of view where stereo images can be formed from image data of two or more cameras, and a peripheral (extreme) field of view where one camera covers the scene and only a normal non-stereo image can be formed.
Fig. 2c shows an HMD for stereo viewing. The HMD contains two screen sections or two screens DISP1 and DISP2 for displaying the left and right eye images. The displays are close to the eyes, and therefore lenses are used to make the images easily viewable and for spreading the images to cover as much as possible of the eyes' field of view. The device is attached to the head of the user so that it stays in place even when the user turns his head. The device may have an orientation detecting module ORDET1 for determining the head movements and the direction of the head. It is to be noted here that in this type of a device, tracking the head movement may be done, but since the displays cover a large area of the field of view, eye movement detection may not be necessary. Alternatively, both head movement and eye movement may be tracked. The head orientation may be related to the real, physical orientation of the user's head, and it may be tracked by a sensor for determining the real orientation of the user's head. Alternatively or in addition, head orientation may be related to the virtual orientation of the user's view direction, controlled by a computer program or by a computer input device such as a joystick. That is, the user may be able to change the determined head orientation with an input device, or a computer program may change the view direction (e.g. in gaming, the game program may control the determined head orientation instead of or in addition to the real head orientation). In other words, with head-mounted displays (HMDs), the user can select the viewing angle by moving their head rather than being limited to one viewing angle, as is the case with conventional 3D display arrangements.
Fig. 2d illustrates a camera CAM1. The camera has a camera detector CAMDET1, comprising a plurality of sensor elements for sensing the intensity of the light hitting the sensor element. The camera has a lens OBJ1 (or a lens arrangement of a plurality of lenses), the lens being positioned so that the light hitting the sensor elements travels through the lens to the sensor elements. The camera detector CAMDET1 has a nominal center point CP1 that is a middle point of the plurality of sensor elements, for example for a rectangular sensor the crossing point of the diagonals. The lens has a nominal center point PP1 as well, lying for example on the axis of symmetry of the lens. The direction of orientation of the camera is defined by the line passing through the center point CP1 of the camera sensor and the center point PP1 of the lens. The direction of the camera is a vector along this line pointing in the direction from the camera sensor to the lens. The optical axis of the camera is understood to be this line CP1-PP1.
Fig. 2e illustrates the field of view of the human visual system. The left and right eye have a natural capability of forming an image of a certain field of view RFOVL, RFOVR of the real world. Due to anatomical constraints, the field of view of the left eye may extend further to the left than the field of view of the right eye. Similarly, the field of view of the right eye may extend further to the right than the field of view of the left eye. Due to this difference, in the combined field of view RFOV there may be an area where an object OBJ1 is seen by both eyes. Furthermore, there may be an area on the left where an object OBJ2 is seen only by the left eye, and an area on the right where an object OBJ3 is seen only by the right eye. The common viewing direction VIEW_DIR may be understood to be the orientation of the nose, or more accurately the central normal of the line connecting the two eyes. In a normal real-world situation, the viewing direction VIEW_DIR is in the middle of the field of view RFOV. In the vertical direction, the field of view may be determined by the anatomy of the eyes and restricted by the forehead and cheeks. As is understood from the above, the real-world field of view RFOV may be defined as the range of the scene that can be seen without moving the head. That is, the field of view may be understood as the extent of the observable world that is seen at any given moment. In the case of optical instruments or sensors, the field of view may be understood as a solid angle through which a detector is sensitive to electromagnetic radiation.
Fig. 2f illustrates the field of view of a head-mounted display for stereoscopic viewing. As described earlier, the HMD comprises a display for the left eye DISP1 and a display for the right eye DISP2. The field of view, that is, the extent of the viewable scene, is determined by the size, the distance from the eye and the optical arrangements of the displays DISP1, DISP2. The field of view HFOVL of the left eye with the head-mounted display may be smaller than the real-world field of view RFOVL, indicated by the dashed line on the left. Similarly, the field of view HFOVR of the right eye with the head-mounted display may be smaller than the real-world field of view RFOVR, indicated by the dashed line on the right.
Consequently, an object OBJ1 in the center area of the scene may be seen by using a head-mounted display, but some objects may fall out of the field of view although they would be viewable in the real world. For example, a tree OBJ2 on the left may not be seen by the user of the head-mounted display, who is thereby not able to orient the viewing direction to the object OBJ2 for better viewing of the details. Furthermore, a scary wolf OBJ3 attacking from the right may be on the right side of the field of view of the head-mounted display. Consequently, the user is not able to see the movement of this object. As can be understood, such limitations may make it difficult for the user to interact with the head-mounted display and the underlying viewing system. As an example, a head-mounted display may be capable of providing a 110° field of view HFOV, while the real human field of view RFOV may be 230°. There may be a significant difference between the user's awareness of their surroundings with an HMD and their natural daily experience. In the following, improved methods are explained for making users aware of what is happening around them and for informing them to consider turning towards that direction from the current viewing direction.
In the following a method may be described with reference to a picture, wherein the picture is a left or right picture of a stereo picture pair, and the method comprises displaying the stereo picture pair on a head-mounted display using central pixel values and fringe pixel values for enhanced field of view. It needs to be understood, though, that the picture may be formed for later viewing, for example by preprocessing for later viewing, and the picture may be transmitted or stored for such purpose.
As described earlier, 3D perception can be achieved by providing each eye with a slightly different view. These two views can be reference views, i.e. the views which have been transmitted, or can be the output of some rendering algorithm applied to the reference views. The views which are presented as left/right views differ slightly as they present the same content/scene from slightly different points of view. Hence, there are some parts of the scene which are visible in one view but not visible in the other view. There are also some parts that are not visible in either view.
A multi camera system or device can be understood as a system or device including more than two cameras capturing the surrounding area simultaneously. Such a camera may have a wide field of view, or it may be omnidirectional, that is, capable of producing a view covering all directions. Some following examples may be given with the help of such a system, but the invention is not limited to any specific image source or camera.
Figs. 3a, 3b and 3c illustrate the principle of enhanced field of view for head-mounted displays. In Fig. 3a, the left stereo-pair image (left and right eye images) illustrates a conventional stereoscopic image to be shown to the user on the displays DISP1, DISP2. In Fig. 3a, the right stereo-pair image shows a stereoscopic presentation more similar to the human eye, that is, with a wider field of view as illustrated in Fig. 2e. As explained earlier, the edge areas SEA1, SEA2 of the scene are visible only to one eye and do not contribute to the stereoscopic perception of the scene. Furthermore, these parts SEA1, SEA2 are visible in the normal daily perception experience but are not visible in conventional HMDs (that is, the display does not extend to the areas SEA1, SEA2). This results in less awareness about the surroundings for users wearing the HMD compared to users who are actually present in the scene. This may result in a lower quality of experience achieved by the users. This may also make it more difficult for the user to interact with the viewer, e.g. to orient the viewing direction appropriately.
In Fig. 3b, a stereo pair before and after the process for enhancing the field of view is shown. In such a method, central pixel values of pixels in a center area CA1 of a picture may be formed. Such central pixel values correspond to a middle area of a scene in a current viewing direction, and the pixel values are obtained by mapping this middle area of the scene to the pixels of the center area of the picture using a first magnification. This first magnification may be the usual magnification with which a scene is shown to the user. Furthermore, fringe pixel values of pixels in a fringe area FA1, FA2 of a picture may be formed. Some or all of the fringe pixel values may correspond to an edge SEA1, SEA2 of the scene in the viewing direction. This edge of the scene may be mapped to the fringe area FA1, FA2 (correspondingly) of the picture using a second magnification. This second magnification may be smaller than the first magnification. That is, the fringe area FA1, FA2 may come to represent a wider (and/or taller) field of view of the scene than it would with the first magnification.
In the present description a magnification related to a pixel or an area of a picture may be understood as relating to the width and/or height of a part of a scene that is mapped to the pixel or the area of the picture. When the magnification is large, the part of the scene mapped to the pixel or the area is small, and consequently, the part of the scene appears large in the picture. When the magnification is small, the part of the scene mapped to the pixel or the area is large, and consequently, the part of the scene appears small in the picture. An area of a picture having a smaller magnification than another area having the same size is able to capture a larger part of the scene. In this manner, the magnification in the present description can be seen as analogous to zooming in or out using a lens of a camera, the lens having an adjustable focal length (and magnification).
In Fig. 3b, the leftmost and rightmost bold dashed lines represent the limits of a full presentation of the scene, that is, the full real field of view. The next dashed lines illustrate the limits of presentation in a conventional stereoscopic image or display DISP1, DISP2. The inner dashed lines illustrate the limits of the original images which are kept intact. This can be e.g. 90% of the original image. The "hidden" scene areas SEA1, SEA2, that is, the dark areas, illustrate the scene parts that are to be added to the field of view and thus in some way shown to the user. The fringe areas FA1, FA2 illustrate the modified parts presented in the right stereo-pair of Fig. 3b by processing the scene areas between the outer and inner dashed lines in the left stereo pair. Such processing may provide the user with more detail in the extreme left, right, top and/or bottom parts of the image that would usually not be visible in the head-mounted display. This can be achieved by altering the magnification, e.g. by downsampling, and presenting the scene in those areas by mapping them onto the fringe areas FA1, FA2.
Based on the perceived content in these areas, the users may decide to interact with the head-mounted display and change their head direction towards those areas, based on their personal preference. It should be noted that this presentation in the fringe FA1, FA2 is not identical to the full presentation that targets detail perception of the content such as in the center area CA1, but is rather meant to enable the users to figure out if something interesting is happening in their natural full field of view. Moreover, since these areas are mainly in the 2D perceived areas, users may naturally rotate their heads towards them if they see something interesting. The presentation of this fringe does not necessarily result in the user looking at the objects in the fringe. However, it gives the user the awareness that there may be something interesting happening at that side edge of the view, hence motivating and enabling them to change their head direction (and thus the view direction of the HMD). If such a head direction movement towards the interesting part of the scene happens, this interaction with the head-mounted display will then bring the interesting part of the scene (e.g. the wolf OBJ3) into the center area and the presentation may be done normally in 3D mode.
In the present description, various sources of image data for forming the pixel values are possible.
For example, a scene picture or a stereo pair of pictures may be received from a panoramic camera, a wide-angle stereo camera or an omnidirectional camera (e.g. an omnidirectional stereo camera). The central pixel values and the fringe pixel values may then be formed by using the received scene picture. The central pixel values may be formed as is known in the state of the art with a certain magnification, and the fringe pixel values may be formed by mapping a wider view to the fringe, that is, by using a smaller magnification.
Alternatively or in addition the central pixel values and the fringe pixel values may be computed using a three-dimensional computer model of a scene. Such forming of the pixel values may be achieved with the help of conventional computer graphics hardware.
Fig. 3c illustrates the process for enhancing the field of view in two dimensions (horizontal and vertical). That is, the fringe area FA1, FA2 may extend outwards from the central area CA1 in a horizontal direction (left or right) and/or a vertical direction (up or down), or both. That is, the magnification in the horizontal or vertical direction or both may be smaller than the magnification in the center. A similar process may be applied to two-dimensional fringe areas as in the horizontal direction. For the rest of this document, for keeping the presentation simple, the presentation in 1D is used in the examples, while the examples are to be understood to be extendable to two dimensions. Moreover, only one view (picture) is shown in the rest of the presentation, as the other view may be treated in the same way. There may be no dependency between the views, that is, the left and right pictures may be handled separately. That is, fringe pixel values FA1 for one of the left and right pictures of a stereo picture pair may be formed, and different fringe pixel values FA2 for the other of the left and right pictures of the stereo picture pair may be formed. The pictures may be processed for enhanced field of view as is described in the following.
The size and location adjustment of the scene edge area to be captured and the fringe area FA1, FA2 to be used may be done based on the following tuning parameters:
- available content and the respective original field of view of the scene,
- field of view limit of the head-mounted device, that is, the maximum FOV that can be displayed on the device, e.g. 110° or 125°,
- the target field of view, e.g. close to the human FOV, e.g. 230°,
- relative alignment of the previous FOVs, i.e. the direction in which the extension of the current FOV should be applied to reach the target FOV,
- amount of content that is required to remain intact (this can be aligned with the amount of FOV used for stereoscopically presented areas of the images),
- predefined and/or adjusted preference of the user and/or the content provider,
- the scene itself and the objects therein, and/or
- the amount of bitrate available to encode the content, i.e. limitations applicable due to the transmission of the content.
Based on these, the parameters of the method by which the content is modified for enhanced field of view may be adjusted. Any combination of these aforementioned tuning parameters may contribute to selecting or adjusting the size and location of the rectangles and lines as presented in figures 3b and 3c, respectively.
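Purely as an illustration of how such tuning parameters might be gathered in an implementation (the field names and default values below are assumptions of this sketch, not values taken from the claims):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FringeTuning:
        source_fov_deg: float = 360.0        # original field of view of the available content
        device_fov_deg: float = 110.0        # maximum FOV displayable on the head-mounted device
        target_fov_deg: float = 230.0        # target FOV, e.g. close to the human FOV
        intact_fraction: float = 0.9         # share of the picture kept at the original magnification
        preference: str = "default"          # user and/or content provider preference
        max_bitrate_kbps: Optional[int] = None   # transmission constraint, if any

        def fringe_fraction(self) -> float:
            # Fraction of the picture width available for each fringe area.
            return (1.0 - self.intact_fraction) / 2.0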
The enhancement methods are used to create the fringe parts FA1, FA2 in the right stereo-pair in Figures 3b and 3c. The enhancement may use two parts of the left stereo-pair image: 1) the scene edge area SEA1, SEA2 between the two outermost lines and 2) the (original) fringe area FA1, FA2 between the two innermost lines. The original pixels or other image data for those areas are converted to occupy the fringe area FA1, FA2.
The following processing may be applied to obtain the fringe pixel values:
- downsampling the content in one or two directions (1D or 2D image downsampling, respectively),
- emphasizing (bolding) the moving objects (such as the wolf OBJ3) in the edge areas to the user,
- emphasizing (bolding) the potential events of interest in those areas to the user,
- emphasizing (bolding) the potential regions of interest in those areas to the user,
- emphasizing (bolding) the potential areas with higher colour contrast or luminance contrast compared to the adjacent parts of the scene to the user,
- emphasizing (bolding) the potential areas with more details, i.e. high frequency spatial components (HFCs), e.g. the tree OBJ2,
- emphasizing (bolding) the objects closer to the camera, and/or
- emphasizing (bolding) the potential areas with higher audio coming from that direction.
In the above, the luminance values of the pixels of the area of interest or object in the fringe may be increased or decreased. This increasing or decreasing may be carried out so that the changed area or object appears brighter or darker than its surroundings. Alternatively or in addition, the chrominance values of the pixels of the area of interest or object in the fringe may be modified to make the changed area or object appear different in colour so that it stands out from its surroundings. Alternatively or in addition, the luminance and/or chrominance values may be modified such that the modification changes over time, for example to cause a blinking effect or an effect of "appearing".
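A minimal sketch of such emphasis (illustrative only; the gain value and the sinusoidal blinking are assumptions of the example) scales the luminance of the pixels selected by a mask and can vary the scaling over time:

    import numpy as np

    def emphasize_region(picture, mask, gain=1.3, t=0.0, blink_hz=0.0):
        # picture: RGB image with values in [0, 1] (H x W x 3).
        # mask: boolean H x W array selecting the object or area of interest in the fringe.
        if blink_hz > 0.0:
            # Oscillate the gain over time t (seconds) for a blinking / "appearing" effect.
            gain = 1.0 + (gain - 1.0) * 0.5 * (1.0 + np.sin(2.0 * np.pi * blink_hz * t))
        out = picture.astype(np.float32)
        out[mask] = np.clip(out[mask] * gain, 0.0, 1.0)   # brighten (gain > 1) or darken (gain < 1)
        return out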
Here, emphasizing and bolding may be understood as presenting the objects or areas with a higher visibility. Such modifications are described in the following (here only the left image is shown, but similar methods can be used on the right image too, as the views are independently analysed and processed).
Figs. 4a, 4b, 4c, 4d and 4e illustrate various ways and details in forming the enhanced field of view by using the fringe area of a picture.
Figure 4a illustrates a downsampling process. In the forming of the fringe pixel values FA1, an area of a scene picture (an edge area) may be downsampled to obtain the fringe pixel values. Such downsampling decreases the magnification of the downsampled area of the scene picture. That is, a larger field of view can be fitted to the fringe FA1. The downsampling of the image data may happen in one or two directions (1D or 2D image downsampling, respectively). In this approach, for example, the original image data in the areas SEA1 and FA1 are downsampled to compress them to the area FA1. Such downsampling may depend on the closeness of the pixels to the edge of the image or may be applied evenly to the pixels. The downsampling may be a simple reduction of the number of pixel values, or it may be a combination of applying a low-pass filter (LPF) together with the reduction of the number. Low-pass filtering a scene area removes high-frequency spatial components (HFCs) while keeping the spatial resolution and general structure of the image untouched. This enables the compression of the same content with a reduced number of bits, since less detail (high-frequency components) needs to be encoded. In the case where videos are presented on polarized displays, a downsampling with ratio 1/2 along the vertical direction may be applied to the content. This may be done because the vertical spatial resolution of the display is divided between the left and right view and hence each one has half the vertical resolution. In such cases, depending on the display and content, a large aliasing artifact may be introduced while perceiving the stereoscopic content. However, applying an LPF reduces such aliasing artifacts considerably, since the high-frequency components responsible for the creation of aliasing are removed in a pre-processing stage.

Downsampling or subsampling in signal processing reduces the sampling rate of a signal. This is usually done to reduce the data rate or the size of the data. Image downsampling is performed by selecting a specific number of pixels, based on the downsampling ratio, out of the total number of pixels in the original image. This will result in presenting the original image with a lower respective spatial resolution. Other than reducing the complexity and the bitrate required to encode the content, downsampling is also used to represent larger images with a smaller number of pixels. In this scenario, some quality degradation is expected, while the amount of change in quality and the reduction introduced to the number of pixels are the trade-off points of the process.
In some downsamplings, a low-pass filter is first applied to the content to remove the HFCs, and then a sub-sampling follows to reduce the number of pixels. This will result in smoother downsampled content compared to the case where only a direct sub-sampling is applied.
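This two-step downsampling may be sketched as follows (the three-tap kernel is an illustrative low-pass filter, not one prescribed by this description):

    import numpy as np

    def downsample_1d(row, ratio=2, kernel=(0.25, 0.5, 0.25)):
        # row: one pixel row (1-D array).  The smoothing kernel removes the
        # high-frequency components; keeping every ratio-th sample then reduces
        # the number of pixels, with less aliasing than direct sub-sampling.
        filtered = np.convolve(row, kernel, mode='same')
        return filtered[::ratio]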
The downsampling in Fig. 4a may be carried out so that even horizontal downsampling is applied to the original image data of the areas SEA1 and FA1. Alternatively, a weighted horizontal downsampling may be applied, where the area FA1 has a heavier weight compared to the area SEA1, in accordance with the area FA1 being closer to the center of the image compared to the area SEA1, which is closer to the edge of the image. It may be expected that the content of the closer areas is of more interest to the user than the content of the areas farther from the center of the image. For example, the weighting may mean that the area FA1 is downsampled with ratio ~½ while the area SEA1 is downsampled with ratio ~¼ to reach a similar number of pixels as in the target area FA1. This may be achieved by utilizing a low-pass filter (LPF) with filter coefficients specifically designed to address this asymmetric downsampling, followed by the respective asymmetric sub-sampling. It should be noted that it may also be the case that the area FA1 is downsampled more heavily compared to the area SEA1. This may depend on factors such as the application, the content, user/content provider preference, or other factors that affect the size and location of the scene edge area to be captured and the fringe area FA1. The downsampling may be a simple sub-sampling or a more sophisticated type of downsampling including linear/non-linear filtering and subsampling. Similar downsampling may be applied to 2D areas (two directions at the same time). When the user turns his head towards the direction where the downsampling has been applied, the full version of the scene will be shown to the user. The transition between the fringe magnification and the higher magnification of the center area happens in a fraction of a second and in a seamless way. A gradual upsampling approach may also be applied for the transition from the downsampled presentation to the full resolution.
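The weighted downsampling may be sketched as follows (illustrative only; nearest-sample picking stands in for the asymmetric low-pass filtering, and the ratios 1/2 and 1/4 are the example values mentioned above):

    import numpy as np

    def weighted_fringe(sea_part, fa_part, fa_ratio=0.5, sea_ratio=0.25):
        # sea_part: image data of the scene edge area SEA1 (nearest the picture edge).
        # fa_part: image data of the original fringe area FA1 (nearer the picture center),
        #          downsampled more lightly so that closer content keeps more detail.
        def pick(part, ratio):
            n = max(1, int(round(part.shape[0] * ratio)))
            idx = np.linspace(0, part.shape[0] - 1, n).astype(int)
            return part[idx]

        # For a left-hand fringe, SEA1 samples lie outermost and therefore come first.
        return np.concatenate([pick(sea_part, sea_ratio), pick(fa_part, fa_ratio)])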
Fig. 4b illustrates a specific direction or an object in the edge area. In the forming of the fringe pixel values FA1, an object OBJ2 may be visually emphasized in the edge area of the scene, wherein the emphasizing is done based on movement of the object, a predefined event related to the object, spatially high-frequency texture of the object, and/or closeness of the object to the camera, or some other characteristic in which one specific direction is required to be covered more than other directions. Such a specific direction (object) is shown in the following figures with a tree-shaped object OBJ2.
Fig. 4c illustrates emphasizing a specific direction or an object in the edge area. Change of magnification may be achieved by downsampling the image data in an un-even manner. The area including the specific direction to be bolded (tree OBJ2 in Fig. 4c) may be downsampled with a ratio of ½ while other areas (outside of the central area) may be downsampled more heavily to reach the target width of the fringe area. It should be noted that the downsampling ratio ½ is just an example compared to larger downsampling ratios. This is to clarify that the specific direction may be downsampled more lightly (with a larger downsampling ratio) compared to the rest of the area in order to preserve the object OBJ2 in that area better and to enable the user to perceive that specific direction (object FOBJ2 obtained by downsampling) with more precision compared to the rest of that area. The area including the specific direction with an object OBJ2 may be downsampled with a ratio of ½ while the rest of the scene edge areas may be downsampled with a heavier downsampling (smaller downsampling ratio e.g. ¼).
It should also be noted that, depending on the size of OBJ2 and its relation to the target width of the fringe area, it may be the case that by downsampling only the OBJ2 with a specific ratio, the whole target width of the fringe area is covered. In this case, the rest of the areas may be removed completely and not shown at all to the user, in order to have enough space (width) to show the OBJ2 with a higher number of pixels.

Fig. 4d illustrates another un-even magnification, for example by downsampling, for forming the fringe. The magnification may diminish gradually across the fringe area of the picture. That is, the original image pixels or objects SP1, SP2, SP3, SP4, SP5 may be e.g. evenly spaced in the scene. The magnification may be the smaller the closer the pixel is to the edge. This may result in an un-even spacing of the processed pixels or objects FP1, FP2, FP3, FP4, FP5 corresponding to the pixels or objects SP1, SP2, SP3, SP4, SP5, respectively, where the more central objects FP1, FP2 have a spacing close to the original whereas the edge objects FP4, FP5 are spaced closely together or even fused.
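Such a gradually diminishing magnification may be sketched as a non-uniform sampling of the scene edge (illustrative only; the power-law warp is an assumption of the example and is shown for a right-hand fringe, a left-hand fringe being its mirror image):

    import numpy as np

    def gradient_fringe(edge_row, fringe_width, power=2.0):
        # edge_row: the scene edge to be shown in the fringe (1-D array), index 0
        # nearest to the picture center, the last index at the outermost scene edge.
        # Sampling steps are small (magnification close to the original) near the
        # center side and grow towards the picture edge, packing content more densely.
        t = np.linspace(0.0, 1.0, fringe_width)
        src = (t ** power) * (edge_row.shape[0] - 1)
        return edge_row[src.astype(int)]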
Fig. 4e illustrates using different magnifications in different edge areas of the scene for forming the fringe. The fringe pixel values of pixels in a sub-area of the fringe area FA4, FA6 of the picture may be obtained by mapping the edge of the scene to the sub-area using a special (third) magnification, wherein the third magnification is smaller (or larger) than the second magnification used in another area FA1, FA3, FA5 of the fringe. In this manner, an object OBJ2 may be made visible or its visibility may be enhanced in the processed image (as the object FOBJ2). There may be multiple areas on the fringe, and areas may be next to each other horizontally and vertically. The different magnifications (e.g. downsampling ratios) may be adjusted such that e.g. the horizontal number of pixels in the fringe area is the same on each pixel row. This is illustrated in Fig. 4e by the different downsampling ratios 1/2, 1/4 and 1/8, as an example.

Figs. 5a and 5b show flow charts for rendering images with enhanced field of view. In Fig. 5a, in phase 505, a first magnification is selected to be used for the middle of the scene. In phase 510, the center pixels of the picture may be formed normally with this magnification, through decoding or from a three-dimensional model comprising the center area of the picture. In phase 515, a second magnification smaller than the first magnification may be selected for the fringe area of the picture. In phase 520, all or some fringe pixels may be formed by using this smaller magnification, thereby capturing an enhanced field of view of the scene into this fringe. In phase 525, the center pixels and the fringe pixels together form the picture. This may happen so that the values of the center and fringe pixels are formed directly into the picture, or so that they are first formed separately and then copied into the picture.

In Fig. 5b, another flow chart for rendering images with enhanced field of view is shown. In phase 540, a scene picture may be captured using a camera, as described earlier. In phase 545, this scene picture may be selected or received for forming the pixel values of a picture to be displayed. For example, a certain area of a panoramic or omnidirectional camera scene picture may be selected e.g. based on the current viewing direction. Additionally or alternatively, in phase 550, a three-dimensional model may be used for forming the pixel values of the picture to be displayed. In phase 555, central pixel values may be formed from the middle of the scene, as described earlier. In phase 560, downsampling may be applied to the edge of the scene for forming pixel values. In phase 565, uneven magnification may be applied to a sub-area of the fringe (edge of the scene). In phase 570, gradually changing magnification may be applied to the fringe. In phase 575, an object or area of interest in the fringe may be emphasized. In phase 580, fringe pixels may be formed in the (other) vertical or horizontal direction, as well. In phase 585, fringe pixels for the other picture of a stereo picture pair may be formed. In phase 590, the single or stereo picture may be displayed to the user using a head-mounted display. In the context of Figs. 5a and 5b, ways and details described earlier in this specification may be applied, and the different features and steps may be understood in light of the earlier specification.
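A minimal end-to-end sketch of the rendering flow of Figs. 5a and 5b is shown below, for illustration only. The picture widths, the symmetric left/right fringes and the plain column sub-sampling used for the second magnification are assumptions made for the sketch, not requirements of the described method.

```python
import numpy as np

def render_picture(scene: np.ndarray, out_width: int, fringe_width: int) -> np.ndarray:
    """Center of the scene at the first (full) magnification, scene edges squeezed into narrow fringes."""
    h, w, c = scene.shape
    center_width = out_width - 2 * fringe_width
    c0 = (w - center_width) // 2
    center = scene[:, c0:c0 + center_width]                    # phases 510/555: central pixel values

    def squeeze(strip: np.ndarray) -> np.ndarray:              # phases 520/560: second, smaller magnification
        cols = np.round(np.linspace(0, strip.shape[1] - 1, fringe_width)).astype(int)
        return strip[:, cols]                                  # simple column sub-sampling

    left_fringe = squeeze(scene[:, :c0])
    right_fringe = squeeze(scene[:, c0 + center_width:])
    return np.concatenate([left_fringe, center, right_fringe], axis=1)   # phase 525: compose the picture
```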
This description may provide ways for enhancing the visibility and awareness of users of areas which are not currently visible on a head-mounted display. Those areas may be shown only in monoscopic presentation and are seen at the extreme borders of the image, in the fringe areas FA1, FA2. Therefore, the downsampling applied to the content may cause little or no unpleasant degradation of the quality experienced by the user and, on the other hand, may provide the users with a larger field of view, enabling them to track the events of interest more easily. In this way, the user may not miss interesting parts of the image, which is a drawback of current HMDs, and may thus be able to interact with the system better.
It should also be noted that the fringe areas may be further encoded/filtered, as they are not the subject of central perception from the user's point of view. These areas can be encoded more heavily to decrease the bitrate required to present the data, considering their visibility in the extreme areas and also the fact that they may be perceived in 2D as opposed to the rest of the image, which is visible in 3D. From the panoramic content available to the HMD, the required pixel values can readily be fetched from the areas surrounding the currently visible view.
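As noted above, the fringe pixel values can be fetched directly from the panoramic content around the currently visible view. The following sketch assumes an equirectangular panorama and a simple yaw-to-column mapping; both are illustration choices rather than requirements of the description.

```python
import numpy as np

def fetch_fringe_columns(pano: np.ndarray, yaw_deg: float, fov_deg: float, extra_deg: float):
    """Return the panorama columns just outside the visible yaw range, with 360-degree wrap-around."""
    h, w, _ = pano.shape
    deg_per_col = 360.0 / w

    def col(angle_deg: float) -> int:
        return int(round((angle_deg % 360.0) / deg_per_col)) % w

    left_angles = np.arange(yaw_deg - fov_deg / 2 - extra_deg, yaw_deg - fov_deg / 2, deg_per_col)
    right_angles = np.arange(yaw_deg + fov_deg / 2, yaw_deg + fov_deg / 2 + extra_deg, deg_per_col)
    left_cols = [col(a) for a in left_angles]
    right_cols = [col(a) for a in right_angles]
    return pano[:, left_cols], pano[:, right_cols]
```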
The fringe area may be encoded with more skip decisions for its blocks in the encoder compared to other areas. This may result in a smaller bitrate used for the fringe area and may resemble a lower frames per second (FPS) rate for the fringe area. This may be applied to any of the aforementioned ways of forming the enhanced field of view by using the fringe area of a picture. When the user notices an event of interest or an object of interest and moves his/her head towards that area, the content presentation may switch to the normal full-resolution presentation as the downsampled part of the view starts to move towards the center of the currently visible image. Alternatively, the content presentation may remain such that the edge parts are downsampled and the center part is at full resolution.
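One hedged way to approximate the skip/lower-FPS behaviour described above, without touching the encoder itself, is to refresh the fringe pixels only every Nth frame so that a standard encoder tends to choose skip modes for the unchanged fringe blocks. The refresh interval and the class interface below are assumptions made for illustration.

```python
import numpy as np

class FringeRateLimiter:
    """Hold the fringe content constant between refreshes, mimicking a lower FPS in the fringe area."""

    def __init__(self, refresh_every: int = 4):
        self.refresh_every = refresh_every
        self.frame_index = 0
        self.cached = None

    def apply(self, fringe: np.ndarray) -> np.ndarray:
        if self.cached is None or self.frame_index % self.refresh_every == 0:
            self.cached = fringe.copy()      # refresh the fringe only on every Nth frame
        self.frame_index += 1
        return self.cached
```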
This description may make it possible to increase the awareness of the user of the surroundings without a requirement to change the hardware of the HMD, since only image processing is applied to the available content.
The various embodiments of the invention can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the invention. For example, a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
It should be noted that the various ways of forming the enhanced field of view by using the fringe area of a picture may be carried out automatically or manually. Automatic selection and adjustment of the fringe area refers to image processing algorithms which work depending on the content of the scene. Manual selection and adjustment of the fringe area refers to performing the processes by the user or content provider, based on human preference, selection and/or action. It is obvious that the present invention is not limited solely to the above-presented embodiments, but it can be modified within the scope of the appended claims.

Claims:
1. A method, comprising:
- forming central pixel values of pixels in a center area of a picture, said central pixel values corresponding to a middle area of a scene in a current viewing direction, said middle area of said scene being mapped to said pixels of said center area of said picture using a first magnification, and
- forming fringe pixel values of pixels in a fringe area of a picture, said fringe pixel values corresponding to an edge of said scene in said viewing direction, said edge of said scene being mapped to said fringe area of said picture using a second magnification, wherein said second magnification is smaller than said first magnification.
2. A method according to claim 1, wherein said second magnification is smaller than said first magnification in a horizontal direction, in a vertical direction, or both.
3. A method according to claim 1 or 2, wherein said second magnification gradually diminishes across said fringe area of said picture.
4. A method according to any of the claims 1, 2 or 3, comprising:
- forming said fringe pixel values of pixels in a sub-area of said fringe area of said picture by mapping said edge of said scene to said sub-area using a third magnification, wherein said third magnification is smaller than said second magnification.
5. A method according to any of the claims 1 to 4, comprising
- in said forming of said fringe pixel values, visually emphasizing an object in said edge area of said scene, wherein said emphasizing is done based on one or more of the group of movement of said object, a predefined event related to said object, spatially high-frequency texture of said object, and closeness to camera of said object.
6. A method according to any of the claims 1 to 5, comprising:
- receiving a scene picture from a panoramic camera, a wide-angle stereo camera or an omnidirectional camera, such as an omnidirectional stereo camera,
- forming said central pixel values and said fringe pixel values using said received scene picture.
7. A method according to any of the claims 1 to 6, comprising:
- in said forming of said fringe pixel values, downsampling an area of a scene picture to obtain said fringe pixel values, wherein said downsampling decreases magnification of said area of said scene picture.
8. A method according to any of the claims 1 to 7, comprising:
- computing said central pixel values and said fringe pixel values using a three-dimensional computer model of a scene.
9. A method according to any of the claims 1 to 8, comprising:
- forming said fringe pixel values for one of left and right pictures of a stereo picture pair, and
- forming different fringe pixel values for the other of left and right pictures of said stereo picture pair.
10. A method according to any of the claims 1 to 9, wherein said picture is a left or right picture of a stereo picture pair, said method comprising:
- displaying said stereo picture pair on a head-mounted display using said central pixel values and said fringe pixel values.
11. An apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
- form central pixel values of pixels in a center area of a picture, said central pixel values corresponding to a middle area of a scene in a current viewing direction, said middle area of said scene being mapped to said pixels of said center area of said picture using a first magnification, and
- form fringe pixel values of pixels in a fringe area of a picture, said fringe pixel values corresponding to an edge of said scene in said viewing direction, said edge of said scene being mapped to said fringe area of said picture using a second magnification, wherein said second magnification is smaller than said first magnification.
12. An apparatus according to claim 11, wherein said second magnification is smaller than said first magnification in a horizontal direction, in a vertical direction, or both.
13. An apparatus according to claim 11 or 12, wherein said apparatus is arranged to diminish said second magnification gradually across said fringe area of said picture.
14. An apparatus according to any of the claims 11, 12 or 13, comprising computer program code to cause the apparatus to:
- form said fringe pixel values of pixels in a sub-area of said fringe area of said picture by mapping said edge of said scene to said sub-area using a third magnification, wherein said third magnification is smaller than said second magnification.
15. An apparatus according to any of the claims 11 to 14, comprising computer program code to cause the apparatus to:
- visually emphasize an object in said edge area of said scene in said forming of said fringe pixel values, wherein said apparatus is arranged to carry out said emphasizing based on one or more of the group of movement of said object, a predefined event related to said object, spatially high-frequency texture of said object, and closeness to camera of said object.
16. An apparatus according to any of the claims 11 to 15, comprising computer program code to cause the apparatus to:
- receive a scene picture from a panoramic camera, a wide-angle stereo camera or an omnidirectional camera, such as an omnidirectional stereo camera,
- form said central pixel values and said fringe pixel values using said received scene picture.
17. An apparatus according to any of the claims 11 to 16, comprising computer program code to cause the apparatus to:
- downsample an area of a scene picture to obtain said fringe pixel values in said forming of said fringe pixel values, wherein said downsampling decreases magnification of said area of said scene picture.
18. An apparatus according to any of the claims 11 to 17, comprising computer program code to cause the apparatus to:
- compute said central pixel values and said fringe pixel values using a three-dimensional computer model of a scene.
19. An apparatus according to any of the claims 11 to 18, comprising computer program code to cause the apparatus to:
- form said fringe pixel values for one of left and right pictures of a stereo picture pair, and
- form different fringe pixel values for the other of left and right pictures of said stereo picture pair.
20. An apparatus according to any of the claims 11 to 19, wherein said picture is a left or right picture of a stereo picture pair, comprising a head-mounted display and computer program code to cause the apparatus to:
- display said stereo picture pair on said head-mounted display using said central pixel values and said fringe pixel values.
21. An apparatus, comprising:
- means for forming central pixel values of pixels in a center area of a picture, said central pixel values corresponding to a middle area of a scene in a current viewing direction, said middle area of said scene being mapped to said pixels of said center area of said picture using a first magnification, and
- means for forming fringe pixel values of pixels in a fringe area of a picture, said fringe pixel values corresponding to an edge of said scene in said viewing direction, said edge of said scene being mapped to said fringe area of said picture using a second magnification, wherein said second magnification is smaller than said first magnification.
22. An apparatus according to claim 21, comprising means to further cause the apparatus to carry out the method according to any of the claims 2 to 10.
23. An apparatus according to claim 21 or 22, comprising a head-mounted display.
24. A system comprising a camera, said camera being one of the group of a panoramic camera, a wide-angle stereo camera and an omnidirectional camera, a viewer comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the system to perform at least the following:
- capture a scene picture of a scene using said camera,
- receive said scene picture at said viewer,
- form central pixel values of pixels in a center area of a picture using said received scene picture, said central pixel values corresponding to a middle area of said scene in a current viewing direction, said middle area of said scene being mapped to said pixels of said center area of said picture using a first magnification,
- form fringe pixel values of pixels in a fringe area of a picture using said received scene picture, said fringe pixel values corresponding to an edge of said scene in said viewing direction, said edge of said scene being mapped to said fringe area of said picture using a second magnification, wherein said second magnification is smaller than said first magnification,
- display said picture using said viewer.
25. A computer program product embodied on a non-transitory computer readable medium, comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to:
- form central pixel values of pixels in a center area of a picture, said central pixel values corresponding to a middle area of a scene in a current viewing direction, said middle area of said scene being mapped to said pixels of said center area of said picture using a first magnification, and
- form fringe pixel values of pixels in a fringe area of a picture, said fringe pixel values corresponding to an edge of said scene in said viewing direction, said edge of said scene being mapped to said fringe area of said picture using a second magnification, wherein said second magnification is smaller than said first magnification.
PCT/FI2018/050586 2017-08-30 2018-08-17 A method, device and a system for enhanced field of view WO2019043288A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1713843.9 2017-08-30
GBGB1713843.9A GB201713843D0 (en) 2017-08-30 2017-08-30 A method, device and a system for enhanced field of view

Publications (1)

Publication Number Publication Date
WO2019043288A1 true WO2019043288A1 (en) 2019-03-07

Family

ID=60037241

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2018/050586 WO2019043288A1 (en) 2017-08-30 2018-08-17 A method, device and a system for enhanced field of view

Country Status (2)

Country Link
GB (1) GB201713843D0 (en)
WO (1) WO2019043288A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040028292A1 (en) * 2000-10-04 2004-02-12 Carl-Axel Alm Method and apparatus for digitally processing frequently updated images from a camera
US20140184475A1 (en) * 2012-12-27 2014-07-03 Andras Tantos Display update time reduction for a near-eye display
US20160156850A1 (en) * 2014-11-10 2016-06-02 Visionize Corp. Methods and Apparatus for Vision Enhancement

Also Published As

Publication number Publication date
GB201713843D0 (en) 2017-10-11

Similar Documents

Publication Publication Date Title
US11575876B2 (en) Stereo viewing
US20210329222A1 (en) System and method for creating a navigable, three-dimensional virtual reality environment having ultra-wide field of view
US20170227841A1 (en) Camera devices with a large field of view for stereo imaging
US10631008B2 (en) Multi-camera image coding
GB2533553A (en) Image processing method and apparatus
US20210185299A1 (en) A multi-camera device and a calibration method
CN113286138A (en) Panoramic video display method and display equipment
US11099392B2 (en) Stabilized and tracked enhanced reality images
WO2018109265A1 (en) A method and technical equipment for encoding media content
WO2018109266A1 (en) A method and technical equipment for rendering media content
WO2019043288A1 (en) A method, device and a system for enhanced field of view
WO2017220851A1 (en) Image compression method and technical equipment for the same
EP3598271A1 (en) Method and device for disconnecting user's attention
WO2017141139A1 (en) A method for image transformation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18850484

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18850484

Country of ref document: EP

Kind code of ref document: A1