WO2019111027A1 - Method for creating virtual or augmented reality and system for creating virtual or augmented reality


Info

Publication number
WO2019111027A1
Authority
WO
WIPO (PCT)
Application number
PCT/HU2018/050052
Other languages
French (fr)
Inventor
Barnabás NÁNAY
Domonkos VARGA
Original Assignee
Curious Lab Technologies Ltd.
Application filed by Curious Lab Technologies Ltd.
Publication of WO2019111027A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F 3/0304 Detection arrangements using opto-electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras


Abstract

The object of the invention relates to a method for creating virtual or augmented reality of a space part (200) that contains at least one geometrical object (10). The essence of the method is providing a digital camera (12) with known internal parameters and a fixed spatial position as compared to the space part (200), adapted for recording images of the space part (200), a display unit (14) in data connection with the digital camera (12), and a central computer unit (16) in data connection with the digital camera (12) and the display unit (14), adapted for sending, receiving and processing data; then, by using the image stream produced by the digital camera (12), identifying the at least one geometrical object (10) per image with neural network-based, preferably convolutional neural network-based computer vision software running on the central computer unit (16), determining its 3-dimensional spatial position in the space part (200), and generating virtual image content depending on the 3-dimensional spatial position of the at least one geometrical object (10), which is then displayed with the display unit (14). The object of the invention also relates to a system (100) for creating virtual or augmented reality.

Description

Method for creating virtual or augmented reality and system for creating virtual or augmented reality
The object of the invention relates to a method for creating virtual or augmented reality of a space part that contains at least one geometrical object.
The object of the invention also relates to a system for creating virtual or augmented reality of a space part that contains at least one geometrical object.
As a consequence of the significant advances in information technology in recent years, developments in connection with virtual and augmented reality have moved into the focus of the technology sector. There are numerous possible areas of use of virtual and augmented reality, including entertainment, education, medical science, and industry.
Virtual reality (VR) is a 3-dimensional artificial environment generated by computer in which the elements of reality are entirely excluded. The user may “walk around” the virtual environment and interact with its elements. VR, in other words, is the totality of computer-generated visual and audio information acting on the senses (primarily vision and hearing) of the user, which is created as a result of the interaction between the user and the computer. At present the displaying of virtual reality mainly takes place with helmets (VR headsets) fitted to the head. The user uses buttons installed on the helmet or a controller held in the hand to control the computer connected to the helmet, interacting with the virtual environment in this way.
In the case of augmented reality (AR), contrary to the virtual reality outlined above, the objective is not to entirely exclude reality but, instead, to extend the real physical environment with virtual image content generated by computer. Such virtual content may be, for example, 2- or 3-dimensional animation, text, etc. The real and the virtual environments appear in the augmented reality in real time, layered on one another. The displaying of the reality created by AR may take place in various ways, using various devices. The currently most widespread method is displaying AR on a computer screen or on the display of a smartphone, but, optionally, a head-worn device (e.g. Microsoft HoloLens) may also be used for this purpose. In the case of mobile device solutions, so-called position and direction augmented reality is created, the purpose of which is supplementing an image displayed on the device’s display with new content. Such augmented reality may be created, for example, with an application that may be downloaded onto Android or iOS devices, such as navigation software.
The existing devices suitable for creating augmented reality (such as Microsoft Kinect) use at least one image display device, two or more image recording devices, and at least one supplementary non-image recording input device required for establishing the interaction. The user may give commands to the device with the supplementary input device (such as infrared sensor, ultrasound sensor, gamepad, mouse, keyboard). However, these solutions entirely depend on the given, non-image recording input devices, without which the interaction cannot take place.
The virtual and augmented reality solutions presented above have various disadvantages, which limit the scope of use of the applications. The majority of the non-image recording devices commercially available limit the free movement of the user, because these usually have to be used while being held in the hands. Further obstructive factors may be the cable that links up the devices, as well as the detecting sensitivity and/or range of the device. In the case of the solutions known of at present, 2- or 3-dimensional objects, so-called markers, are arranged in advance onsite to facilitate the identification of objects, which, without being exhaustive, may be, for example, ARToolkit, ARTag, ARToolKitPlus, SCR Marker Data Matrix, QR code, PDF 417, Intersense IS-1200 Marker, Shotcode, or RealTIVison type markers.
The publication entitled "Application of Augmented Reality and Robotic Technology in Broadcasting: A Survey" by Dingtian Yan and Huosheng Hu discusses the areas of application of marker-based augmented reality and their theoretical background. The markers are characterised in that their pattern or shape is known, and they are positioned on objects in advance to facilitate easy computer identification. The tracking of the objects takes place with markers using the algorithm disclosed in section 3.2.2. The initial parameter of the disclosed algorithm is the marker’s known coordinate vector.
The article entitled "A simplified nonlinear regression method for human height estimation in video surveillance" by Shengzhe Li, Van Huan Nguyen, Mingjie Ma, Cheng-Bin Jin, Trung Dung Do and Hakil Kim deals with a method of determining the height of people visible in a camera image through the real-time analysis of the images provided by cameras. The disclosed method initially presumes that a person can be seen in the image and that the person’s feet are touching the ground. However, the publication does not deal with the determination of 3-dimensional spatial position, because, according to the authors, a greater number of objects and a network of coordinates are required in order to determine this (see section 2). Accordingly, the distance of the object (a person in this case) from the camera (Z coordinate) is eliminated from the equations.
Due to the above disadvantages the present solutions are either not or only to a very limited extent suitable for use in community locations, such as shopping centres, museums, institutes of education, etc., or for group use.
The objective of the invention is to provide a method and system aimed at creating virtual or augmented reality that is free of the disadvantages of the solutions according to the state of the art, in other words that enable users to interact with digital content without the use of supplementary input devices or markers.
The objective of the invention is also to provide a method and system in the case of which the information required to generate the interactive digital content is determined using the data obtained by a single digital camera.
The invention is based on the recognition that by using the data of a single digital camera that has a known position as compared to a space part and known internal parameters, and by using neural network-based, preferably convolutional neural network-based computer vision software, at least one geometrical object located in the space may be identified without the conventionally used markers, and its 3-dimensional location may be determined with good approximation; therefore virtual or augmented reality may be created in a simpler and more cost-effective manner as compared to the solutions according to the state of the art, even in places where there is no or only a limited possibility of using markers (e.g. community locations, the street, shopping centres, etc.).
The invention is also based on the recognition that the scope of use of the method and system according to the invention is much broader as compared to the solutions according to the state of the art with respect to both the location and the number of users as a result of the possibility of omitting the use of markers and input devices.
The invention is also based on the recognition that the effectiveness of the method and system according to the invention may be simply and cost-effectively increased in parallel with the development of computer vision algorithms as opposed to the solutions according to the state of the art, in the case of which the support and updating of each and every supplementary input device involve significant costs.
The task was solved in accordance with the invention with the method according to claim 1.
Preferably several different categories are determined on the basis of characteristics relating to the size and/or shape and/or colour of various geometrical objects, the at least one geometrical object in the space part is classed into one of the created categories on the basis of the size and/or shape and/or colour of the identified object image with neural network-based, preferably convolutional neural network-based computer vision software, and the virtual image content is created with consideration to the category of the at least one geometrical object.
Preferably the raw image stream produced by the digital camera is displayed before it is processed along with the currently generated virtual image content.
The task was also solved in accordance with the invention with the system according to claim 10.
Certain preferred embodiments of the invention are determined in the subclaims.
Further details of the invention will be explained by way of exemplary embodiments with reference to figures, wherein:
Figure 1 depicts a schematic view of an exemplary embodiment of a system serving for creating virtual or augmented reality according to the invention,
Figure 2 depicts a schematic flowchart presenting the main steps of the method according to the invention,
Figure 3a depicts an embodiment of the method according to the invention presenting the creation of virtual reality, and
Figure 3b depicts an embodiment of the method according to the invention presenting the creation of augmented reality.
Figure 1 illustrates a schematic view of a possible exemplary embodiment of the system 100 according to the invention. The system 100 serves for the creation of virtual or augmented reality of a space part 200 containing at least one geometrical object 10. The geometrical object 10 may be any object or living being located in the space part 200, including humans or even fictive characters. The system 100 according to the invention contains a digital camera 12 with known internal parameters and a fixed and known spatial position as compared to the space part 200 that is adapted for recording images of the space part 200, a display unit 14 in data connection with the digital camera 12, and also contains a central computer unit 16 in data connection with the digital camera 12 and the display unit 14 that is suitable for sending, receiving and processing data.
In the context of the present invention camera 12 is understood to mean a device capable of transforming moving images into digital form and transmitting them, and, optionally, recording them, and that is supplied with a lens system and digital imaging sensor. The images created by the camera 12 are, in fact, digital signals, which may be transmitted through the digital data connections mentioned above.
The internal parameters of the camera 12 that determine the imaging are known throughout the course of the method. In the case of a preferred embodiment the known internal parameters of the digital camera 12 are selected from a group containing the resolution and size of the digital camera’s 12 sensor, as well as the focal length and field of view of the camera 12. The values of the internal parameters may be determined by measurement, or on the basis of catalogue data, for example, in the knowledge of the type of the digital camera 12.
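By way of illustration only (not part of the disclosure), the field of view named among the internal parameters follows from the sensor size and focal length via the pinhole model; a minimal Python sketch, in which the numeric values are purely illustrative assumptions:

```python
import math

def field_of_view_deg(sensor_size_mm: float, focal_length_mm: float) -> float:
    """Angle of view along one sensor axis under the pinhole model."""
    return math.degrees(2 * math.atan(sensor_size_mm / (2 * focal_length_mm)))

# Illustrative values only: a 5.1 mm x 2.9 mm sensor behind a 4 mm lens
print(field_of_view_deg(5.1, 4.0))  # horizontal field of view, ~65 degrees
print(field_of_view_deg(2.9, 4.0))  # vertical field of view, ~40 degrees
```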
In the context of the present invention space part 200 is understood to mean a range of space with a flat, preferably horizontal floor plane 201 (floor surface), such as a room, hall, street part in front of a shop window, etc. The digital camera 12 is arranged in a fixed way with respect to the space part 200, in other words the camera 12 is arranged at a known and fixed distance d from the floor plane 201; furthermore, the viewing direction of the camera 12 is also fixed, parallel to the floor plane 201, for example. Naturally, optionally, embodiments are conceivable in the case of which the floor plane 201 of the space part 200 and/or the viewing direction of the camera 12 are at a known angle to the horizontal. In the case of an especially preferred embodiment the camera 12 is established as a digital video camera capable of making HD or Full HD video recordings, especially preferably 2K or 4K video recordings, which is arranged on top of the display unit 14, in its plane, in the way shown in figure 1.
The display unit 14 in data connection with the camera 12 is preferably established as an LCD, LED, OLED, or plasma display, or as a projector, where the size of the display (diagonal) is preferably greater than 32 inches, more preferably greater than 55 inches, even more preferably greater than 100 inches.
The concept of central computer unit 16 is interpreted broadly in the present specification, and includes all hardware devices suitable for receiving, processing, and storing digital data, and for digitally transmitting the processed digital data. In the case of an especially preferred embodiment the central computer unit 16 is established as a personal computer containing background storage 16a serving for storing the data received from the digital camera 12, and the computer programs, and a central processing unit 16b (processor) suitable for processing the received data and running the computer programs. The computer unit 16, in addition to the conventional elements (e.g. direct access memory, network card, etc.), may optionally contain an additional one or more input devices (such as a keyboard, mouse, etc.), and may contain an interface serving as both an input and output device (such as a CD/DVD writer/reader, etc.), as is obvious for a person skilled in the art. The computer unit 16 is in a data connection with the digital camera 12 and the display unit 14, in other words the display unit 14 is able to receive the image data created by the camera 12 via the data connection. The data processed by the computer unit 16 may be transmitted to the display unit 14 via the data connection between the computer unit 16 and the display unit 14. It should be noted that the concept of data connection means both direct and indirect connections. An indirect data connection between the camera 12 and the display unit 14 may be created, for example, by interposing the computer unit 16. The data connection may be a wired connection or, optionally, a wireless connection (e.g. WiFi, Bluetooth, etc.), as is known to a person skilled in the art.
The central computer unit 16 according to the invention processes the image data received from the digital camera 12, in other words the consecutive images made of the space part 200, the computer unit 16 being configured:
- for identifying an object image 10’ corresponding to the at least one geometrical object 10 in the images made by the digital camera 12,
- for determining the 3-dimensional spatial position of the at least one geometrical object 10 on the basis of the identified object image 10’, and
- for generating virtual image content 21 depending on the 3-dimensional spatial position of the at least one geometrical object 10.
Virtual image content 21 is understood to mean the digital information created by the computer unit 16 associated with at least one object image 10’ identified in the images made by the camera 12, which will be described in detail at a later stage. The virtual image content 21 may be a 2-dimensional or, preferably, 3-dimensional photorealistic animation, which is selected from a group containing, for example, animals, plants, fictive characters, known persons, and objects that may be worn. It should be noted that the image content produced by the camera 12 and the virtual image content 21 do not only have to contain visual data; these contents may also, optionally, contain audio data (a sound track), therefore image data or content is also understood to mean video data containing both image and sound elements.
In the case of a preferred embodiment the system 100 contains a sound generating unit 18 controlled by the central computer unit 16, preferably loudspeakers, which serves for playing the sound track of the virtual image content 21 generated by the computer unit 16.
The object of the invention also relates to a method for creating virtual or augmented reality of a space part 200 containing at least one geometrical object 10. In the following the operation of the system 100 according to the invention along with the method according to the invention will be presented.
In the following the method according to the invention will be presented with reference to figure 2. In the course of the method a digital camera 12 with internal parameters suitable for creating images that is arranged in a spatially fixed way as compared to the space part 200, a display unit 14 in data connection with the digital camera 12, and a central computer unit 16 that is in data connection with the digital camera 12 and the display unit 14 and suitable for sending, receiving and processing data are provided. In the case of an especially preferred embodiment the method according to the invention is implemented with the system 100 according to the invention presented above. The known arrangement of the digital camera 12 as compared to the space part 200 is provided in such a way that the distance d of the camera 12 from the floor plane 201 of the space part 200 and the orientation of the digital camera 12, i.e. the direction which the camera 12 faces, are determined.
In the first step 301 of the method according to the invention, images are made of the space part 200 containing the at least one geometrical object 10 using the digital camera 12, and a raw image stream is produced of the images. The distance d of the camera 12 from the floor plane 201 is fixed; therefore it has a view of the space part 200 from a given and constant perspective. The camera 12 makes an image of the space part 200 and of the at least one geometrical object 10 there at a determined frequency (e.g. every 1/30th or 1/60th of a second), which images together form a raw image stream comprising a moving picture. The geometrical object 10 appears in the individual images as object image 10’.
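A minimal sketch of producing such a raw image stream, assuming OpenCV is available and that the camera 12 is reachable as device index 0 (both assumptions, not taken from the specification):

```python
import cv2

cap = cv2.VideoCapture(0)          # camera 12; device index 0 is an assumption
cap.set(cv2.CAP_PROP_FPS, 30)      # request one image every 1/30th of a second

while True:
    ok, frame = cap.read()         # one image of the raw image stream
    if not ok:
        break
    # each frame would be forwarded to the central computer unit 16 (steps 302-305)

cap.release()
```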
In the following step 302 of the method the raw image stream made with the camera 12 is transmitted to the central computer unit 16 via the data connection between the camera 12 and the computer unit 16, and the object image 10’ corresponding to the at least one geometrical object 10 in the images of the raw image stream is identified per image using neural network-based, preferably convolutional neural network-based computer vision software running on the central computer unit 16. In the context of the present invention identification means that the object image 10’ is differentiated from the other parts of the image, in other words the coordinates of the pixels belonging to the object image 10’ are determined, or the object image 10’ is delimited by a planar shape (such as rectangle, circle, etc.).
Neural network-based, preferably convolutional neural network-based computer vision software in the context of the present invention is understood to mean a computer program or program package that searches for characteristic areas (pixel ranges) in the images of the incoming image stream that can be isolated from the other parts of the image and the detection of which is less sensitive to changes in angle of view and illumination. The neural network-based, preferably convolutional neural network-based computer vision software may contain, for example, R-CNN, Fast R-CNN, HyperNet SP or DeepLab algorithms, known to a person skilled in the art. The selection of characteristic areas may take place with scale and rotation invariant so-called local feature detectors that tolerate changes to angle of view and illumination well (such as MSER, SIFT, SURF, ORB, etc.), or with image segmenters and classifiers not using artificial intelligence (such as Trainable Weka Segmentation, kNN, MOG, MOG2, GMG).
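As a hedged illustration of such software, a pretrained Faster R-CNN detector from torchvision (a relative of the R-CNN and Fast R-CNN algorithms named above; the choice of library, model, and the 0.7 confidence threshold are assumptions for this sketch) can identify object images 10’ as bounding boxes with category labels:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Pretrained Faster R-CNN as a stand-in for the CNN-based computer vision software
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def identify_object_images(frame_rgb):
    """Return bounding boxes, COCO category labels and scores for one image."""
    with torch.no_grad():
        pred = model([to_tensor(frame_rgb)])[0]
    keep = pred["scores"] > 0.7        # confidence threshold (assumption)
    # COCO label 1 corresponds to the category "person"
    return pred["boxes"][keep], pred["labels"][keep], pred["scores"][keep]
```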
In the case of an especially preferred embodiment several different categories (for example, ball, chair, person; within this, small child, adult, etc.) are determined in advance according to the features of different geometrical objects 10 relating to size and/or shape and/or colour, and in the course of identifying the object image 10’ the at least one geometrical object 10 in the space part 200 is classed into one of the created categories with the neural network-based, preferably convolutional neural network-based computer vision software on the basis of the size and/or shape and/or colour of the object image 10’. For example, if the geometrical object 10 in the space part 200 is a person, then the computer vision software places the object image 10’ appearing in the images of the raw image stream produced by the camera 12 into the category of “person” on the basis of its shape and size.
In the third step 303 of the method, by using the data relating to the size and/or orientation of the at least one object image 10’ identified per image, as well as the data relating to the internal parameters of the digital camera 12 and its spatial position, position data giving the 3-dimensional spatial position of the at least one geometrical object 10 in the space part 200 are determined.
In the case of a possible embodiment the geometrical object 10 is a person, the 3-dimensional spatial position of whom is preferably determined according to the following (method one). The camera 12 is secured at the known distance d from the floor plane 201 in such a way that the bottom of the field of view of the camera 12 is a horizontal straight line closer to the floor plane 201, and its top is a horizontal straight line further away from the floor plane 201. After the neural network-based, preferably convolutional neural network-based computer vision software has determined that the geometrical object 10 is a person, the points of the corresponding identified object image 10’ closest to the bottom and the top of the field of view of the digital camera 12 are determined, then a straight line is fitted between the points determined in this way. As in the present case it may be presumed about the person forming the geometrical object 10 that he/she is standing on the floor plane 201 approximately vertically, it may be assumed that the point of the object image 10’ closest to the bottom of the field of view of the camera 12 marks the position of the geometrical object 10 on the floor plane 201; in other words the point of the geometrical object 10 falling closest to the bottom of the field of view is viewed as a point falling on the floor plane 201 of the space part 200 (the person is in contact with the floor plane 201). In addition, the orientation of the geometrical object 10 is determined in such a way that it is presumed that the fitted straight line is perpendicular to the floor plane 201 (in other words the person is standing vertically on the floor plane 201). Using the above assumptions, and in the knowledge of the distance d of the camera 12 from the floor plane 201, as well as the size of the field of view of the camera 12, especially the size of its vertical field of view, the distance of the geometrical object 10 from the camera 12, in other words its 3-dimensional spatial position, is determined.
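A minimal sketch of this first method, assuming (as the text does) an optical axis parallel to the floor plane 201 and that the lowest point of the object image 10’ lies on the floor; the numeric example is illustrative only:

```python
import math

def ground_distance_m(v_foot_px, image_height_px, vertical_fov_deg, d_m):
    """Method one: distance of a standing person from the camera 12.

    v_foot_px is the pixel row (0 = top of image) of the lowest point of the
    object image 10'; d_m is the known camera height above the floor plane 201.
    """
    cy = image_height_px / 2.0                                # optical centre row
    f_px = cy / math.tan(math.radians(vertical_fov_deg) / 2)  # focal length in px
    angle_below_axis = math.atan((v_foot_px - cy) / f_px)
    if angle_below_axis <= 0:
        raise ValueError("foot point must lie below the optical axis")
    return d_m / math.tan(angle_below_axis)

# Foot at row 900 of a 1080-row image, 45-degree vertical FOV, camera 1.6 m up
print(ground_distance_m(900, 1080, 45.0, 1.6))  # ~5.8 m (illustrative)
```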
In the case of another possible embodiment (method two), data relating to the real physical dimensions of the geometrical object 10 are determined before the raw image stream is produced. Determining dimensions is understood to mean the measurement of the spatial extension of the geometrical object 10, such as of its diameter, or their determination in another way (e.g. on the basis of a catalogue or datasheet). In the case of the present embodiment the geometrical object 10 is preferably an object the planar projection of which is rotationally invariant (e.g. a spherical ball), but naturally, optionally, an embodiment is conceivable in the case of which the geometrical object is not rotationally invariant, instead it is an object of another shape that has a characteristic dimension, for example, its largest dimension (e.g. in the case of a specific person, that person’s height), which is known. Preferably a separate category is created for each geometrical object 10, which has had its dimension determined in advance, and the determined dimension of the geometrical object 10 is assigned to this category.
In the next step the data relating to the dimension of the object image 10’ are determined on the basis of the images made by the camera 12, which dimension may be, for example, the greatest dimension of the object image 10’ counted in pixels. The data relating to the real size of the geometrical object 10 and to the size of the object image 10’ are compared with each other, and the actual 3-dimensional spatial position of the geometrical object 10 is determined by using the result of this comparison.
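A sketch of this second method under the usual pinhole similar-triangles relation (distance = focal length in pixels × real size / image size); the ball diameter and focal length below are illustrative assumptions:

```python
def distance_from_known_size_m(real_size_m, size_px, f_px):
    """Method two: compare the known real size of the geometrical object 10
    with the size of its object image 10' (both along the same dimension);
    f_px is the focal length expressed in pixels (see the earlier sketch)."""
    return f_px * real_size_m / size_px

# Illustrative: a ball of 0.22 m diameter imaged 55 px wide, f_px = 1300
print(distance_from_known_size_m(0.22, 55, 1300))  # ~5.2 m
```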
In the case of a preferred embodiment the method for determining the 3-dimensional spatial position of a geometrical object 10 is selected depending on the category that the neural network-based, preferably convolutional neural network-based computer vision software running on the computer unit 16 has placed the geometrical object 10 in on the basis of its object image 10’. In other words, for example, in the case that the computer vision software identifies the geometrical object 10 as a person (classes it in the category of “human”), the spatial position of the geometrical object 10 is determined with the first method. If, however, the geometrical object is, for example, a ball the size of which has been determined in advance, then the neural network-based, preferably convolutional neural network-based computer vision software classes this geometrical object 10 into the category of “ball” created for it in advance and containing data relating to the real size (e.g. diameter) of the ball, and the 3-dimensional spatial position of the geometrical object (ball) will be determined according to the second method. It should be noted that the appropriate categorisation of geometrical objects 10 of known size may optionally be made easier by using a special marker placed on the geometrical object 10, such as a QR code. Optionally, the above methods may also be combined, thereby making the method more robust.
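Tying the sketches together, the category-dependent selection described above could be realised as a simple dispatch; the category names, the stored sizes, and the reuse of the two functions above are all assumptions for illustration:

```python
KNOWN_SIZES_M = {"ball": 0.22}   # categories with a pre-determined real size

def spatial_distance_m(category, detection, camera):
    """Choose method one for persons, method two for known-size categories.

    detection and camera are simple records assumed to carry the values used
    by the two sketches above (foot row, box width, image height, FOV, d, f_px).
    """
    if category == "person":
        return ground_distance_m(detection.foot_row_px, camera.image_height_px,
                                 camera.vertical_fov_deg, camera.d_m)
    if category in KNOWN_SIZES_M:
        return distance_from_known_size_m(KNOWN_SIZES_M[category],
                                          detection.box_width_px, camera.f_px)
    raise KeyError(f"no position method for category {category!r}")
```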
In the fourth step 304 of the method according to the invention, virtual image content, preferably 3-dimensional virtual image content, dependent on the 3-dimensional spatial position of the at least one geometrical object 10 is generated per image with the central computer unit 16 on the basis of the determined position data; then in the fifth step 305 the virtual image content generated per image is displayed with the display unit 14. If several geometrical objects 10 are located in the space part 200 (such as a group of persons and/or several other objects), optionally separate virtual image content is generated for each geometrical object 10.
As explained previously, the virtual image content may be of several different types, such as an animal, a famous person, an object or even a complete scene, and it is preferably generated automatically with the computer unit 16. Automatic generation may be realised, for example, in such a way that the virtual content is generated so as to correspond to the category of the geometrical object 10. For example, if the geometrical object is a small child, then a 3-dimensional animal figure is automatically generated as virtual content, while if the geometrical object is an adult, then a wild animal is generated, such as a lion. Naturally, an embodiment is also conceivable in which the user him/herself selects the type of virtual content from among several possibilities offered in advance, for example by using an input device (such as a keyboard) connected to the computer unit 16.
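Purely as an illustration, such category-dependent automatic generation could be organised as a simple lookup table; the categories and asset names below are invented for the example and are not taken from the application.

```python
# Hypothetical mapping from detector category to a virtual content asset.
VIRTUAL_CONTENT_BY_CATEGORY = {
    "child": "cartoon_animal_3d",   # e.g. a friendly 3-dimensional animal
    "adult": "lion_3d",             # e.g. a wild animal such as a lion
}

def content_for(category, default="generic_scene_3d"):
    return VIRTUAL_CONTENT_BY_CATEGORY.get(category, default)

print(content_for("child"))   # -> cartoon_animal_3d
```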
In the case of the embodiment shown in figure 3a, the display unit 14 only displays the virtual image content generated by the computer unit 16 and dependent on the spatial position of the geometrical object 10, in other words virtual reality is created. It should be noted that the virtual image content may also have parts the position of which does not change over time on the display unit 14 (such as the background).
Figure 3b shows an embodiment presenting augmented reality. In this case, in addition to the virtual image content dependent on the spatial position of the geometrical object 10, the object images 10’ of the one or more geometrical objects 10 are also displayed on the display unit 14. Through this the user, who is actually the geometrical object 10 in the space part 200, sees him/herself on the display unit 14 as the object image 10’. The virtual content is generated dependent on the spatial position of the geometrical object 10; for example, if the geometrical object 10 recedes from the camera 12, the size of the object image 10’ becomes increasingly smaller in the images made by the camera 12, and therefore the size of the virtual image content is proportionally reduced in sequential images in the interest of attaining a more realistic effect. The rate of the generation of the virtual image content, in other words the total time required for executing the steps presented above for each image (identification of the object image 10’, determination of the data relating to the spatial position of the geometrical object 10, generation of the virtual image content), is characteristically a few tens of milliseconds. If the geometrical object 10 moves quickly, this delay may be disturbing, because the object image 10’ appearing on the display unit 14 only follows it after a certain period of time, which may significantly damage the experience of the augmented reality created. Because of this, in the case of an especially preferred embodiment, the raw image stream produced by the digital camera 12 is split into two in the way shown in figure 1: on the one part it is forwarded to the central computer unit 16, and on the other part, in a step 301a following the first step 301 and before the second step 302, it is displayed on the display unit 14 without any processing, essentially in real time, along with the virtual image content generated at that moment. To put it another way, the generated virtual image content and the raw image stream produced by the digital camera 12 are simultaneously displayed on the display unit 14 in such a way that the raw image stream is displayed essentially in real time, while the generated virtual image content is displayed at the rate at which it is generated. In this way the object image 10’ displayed on the display unit 14 follows the movements of the geometrical object 10 essentially without any delay. As a result, the human brain senses the interaction as something natural and lifelike, and the impression the user(s) obtain is that the virtual image content reacts immediately to changes of the one or more geometrical objects 10 (such as movements of the user(s) or of object(s)) in the space part 200. The raw image stream and the virtual image content are displayed layered on each other.
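The split into a fast display path and a slower processing path can be sketched with two threads sharing the most recently generated overlay. This is a minimal illustration of the idea only; grab_camera_frame, detect_estimate_and_render and show_composited are stand-ins for the camera read, the recognition-and-rendering pipeline and the layered display, none of which are specified at this level in the application.

```python
import queue
import threading
import time

def grab_camera_frame():                  # stand-in for the camera read
    time.sleep(1 / 30)                    # ~30 fps raw stream
    return object()

def detect_estimate_and_render(frame):    # stand-in for the slow pipeline
    time.sleep(0.03)                      # "a few tens of milliseconds"
    return "overlay"

def show_composited(frame, overlay):      # stand-in for the layered display
    pass

latest_overlay = None                     # newest virtual content available
overlay_lock = threading.Lock()
frame_queue = queue.Queue(maxsize=2)      # frames handed to the slow path

def processing_worker():
    global latest_overlay
    while True:
        frame = frame_queue.get()
        overlay = detect_estimate_and_render(frame)
        with overlay_lock:
            latest_overlay = overlay

threading.Thread(target=processing_worker, daemon=True).start()

for _ in range(300):                      # a few seconds' worth of frames
    frame = grab_camera_frame()
    try:
        frame_queue.put_nowait(frame)     # feed the slow path, never block
    except queue.Full:
        pass                              # drop it; a newer frame follows
    with overlay_lock:
        overlay = latest_overlay
    # Fast path: show the raw frame at once, layered under the newest
    # overlay available, even if that overlay is a frame or two old.
    show_composited(frame, overlay)
```

The design choice is simply that the fast path never waits on the slow path: frames dropped on the processing side cost nothing visible, while the raw image stream stays at full rate.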
In the case of a preferred embodiment a sound generating unit 18 suitable for generating sound and in data connection with the central computer unit 16 is provided, and a digital audio signal dependent on the 3-dimensional spatial position of the at least one geometrical object 10 is generated by the central computer unit 16, then the generated digital audio signal is played with the sound generating unit 18. In this way special sound effects can be assigned to the movements of the geometrical object 10, such as when the geometrical object 10 touches the generated virtual image content.
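As a toy illustration of position-dependent audio, a touch event might be detected by thresholding the distance between the tracked object and the virtual content; play_sound and the 0.15 m threshold are invented for the example.

```python
import math

def play_sound(filename):
    print(f"playing {filename}")   # stand-in for the real audio output

def maybe_trigger_touch_sound(object_pos, content_pos, threshold_m=0.15):
    """Fire a sound effect when the tracked geometrical object comes within
    threshold_m of the generated virtual content (positions in metres)."""
    if math.dist(object_pos, content_pos) < threshold_m:
        play_sound("touch.wav")

maybe_trigger_touch_sound((1.0, 0.0, 2.0), (1.05, 0.0, 2.05))  # plays
```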
Various modifications to the above disclosed embodiments will be apparent to a person skilled in the art without departing from the scope of protection determined by the attached claims.

Claims

1. Method of creating virtual or augmented reality of a space part comprising at least one geometrical object, characterised by
- providing
• a digital camera (12) adapted for recording images of the space part (200), having internal parameters and being arranged in a spatial position that is fixed with respect to the space part (200),
• a display unit (14) being in data connection with the digital camera (12), and
• a central computer unit (16) being in data connection with the digital camera (12) and the display unit (14) and being adapted for sending, receiving and processing data, and
- taking images of the space part (200) containing the at least one geometrical object (10) with the digital camera (12), and producing a raw image stream from the images,
- forwarding the raw image stream to the central computer unit (16), and identifying at least one object image (10’) per image corresponding to the at least one geometrical object (10) in the images of the raw image stream with neural network-based, preferably convolutional neural network-based computer vision software running on the central computer unit (16),
- determining position data giving the 3-dimensional spatial position of the at least one geometrical object (10) in the space part (200) for each image, using the data relating to the size and/or orientation of the at least one object image (10’) identified per image, as well as the data relating to the internal parameters of the digital camera (12) and its spatial position,
- based on the position data, generating virtual image content per image with the central computer unit (16), depending on the 3-dimensional spatial position of the at least one geometrical object (10),
- displaying the virtual image content generated per image with the display unit (14).

2. Method according to claim 1, characterised by forwarding the raw image stream produced by the digital camera (12) to the display unit (14) in addition to forwarding it to the central computer unit (16), and displaying the raw image stream in real time on the display unit (14) together with the currently generated virtual image content.
3. Method according to claim 1 or 2, characterised by defining several different categories according to the size and/or shape and/or colour characteristics of various geometrical objects (10), classifying the at least one geometrical object (10) in the space part (200) into one of the created categories based on the size and/or shape and/or colour of the identified object image (10’) with the computer vision software, and generating the virtual image content taking into account the category of the at least one geometrical object (10).
4. Method according to any of claims 1 to 3, characterised by determining the 3-dimensional spatial position of the geometrical object (10) by
- determining the points of the object image (10’) of the geometrical object (10) closest to the bottom and the top of the field of view of the digital camera (12),
- fitting a straight line between the determined points, and
- treating the fitted straight line as being perpendicular to the floor plane (201) of the space part (200), and the end point of the fitted straight line closer to the bottom of the field of view as a point lying in the floor plane (201) of the space part (200).
5. Method according to any of claims 1 to 4, characterised by determining the 3-dimensional spatial position of the geometrical object (10) by:
- determining data relating to the real size of the geometrical object (10) prior to producing the raw image stream,
- determining data relating to the size of the identified object image (10’) associated with the geometrical object (10), then
- comparing the data relating to the real size of the geometrical object (10) and the data relating to the size of the object image (10’) with each other.
6. Method according to any of claims 1 to 5, characterised by providing a sound generating unit (18) for generating sound and being in data connection with the central computer unit (16), and generating a digital audio signal dependent on the 3-dimensional spatial position of the at least one geometrical object (10) with the central computer unit (16), then playing the generated digital audio signal with the sound generating unit (18).
7. Method according to any of claims 1 to 6, characterised by providing the display unit (14) as an LCD, LED, OLED, or plasma display, or as a projector, and providing the sound generating unit (18) as a loudspeaker.
8. Method according to any of claims 1 to 7, characterised by selecting the known internal parameters of the digital camera (12) from a group containing resolution and size of a sensor of the digital camera (12), and focal length and size of a field of view of the digital camera (12).
9. Method according to any of claims 1 to 8, characterised by providing the fixed arrangement of the digital camera (12) with respect to the space part (200) by determining the distance of the digital camera (12) from the floor plane (201) of the space part (200) and the orientation of the digital camera (12).
10. System (100) for creating virtual or augmented reality of a space part (200) that contains at least one geometrical object (10), characterised by comprising:
- a digital camera (12) adapted for recording images of the space part (200), having internal parameters and being arranged in a spatial position that is fixed with respect to the space part (200),

- a display unit (14) being in data connection with the digital camera (12), and

- a central computer unit (16) being in data connection with the digital camera (12) and the display unit (14), and being adapted for sending, receiving and processing data;
wherein the central computer unit (16) is configured for
- identifying an object image (10’) corresponding to a geometrical object (10) in the images taken by the digital camera (12) by using the image data received from the digital camera (12),
- determining the 3-dimensional spatial position of the at least one geometrical object (10) based on the at least one identified object image (10’), and
- generating virtual image content depending on the 3-dimensional spatial position of the at least one geometrical object (10).
11. System (100) according to claim 10, characterised in that the central computer unit (16) is provided as a personal computer containing a storage device (16a) for storing the data received from the digital camera (12) and computer programs, and a central processing unit (16b) for processing the received data and running the computer programs.
12. System (100) according to claim 10 or 11, characterised by the virtual image content being a 2-dimensional, or preferably 3-dimensional, photorealistic animation selected from a group containing animals, plants, fictive characters, known persons, and wearable objects.
13. System (100) according to any of claims 10 to 12, characterised in that the display unit (14) is an LCD, LED, OLED, or plasma display or projector.
14. System (100) according to any of claims 10 to 13, characterised by containing a sound generating unit (18) adapted for generating sound and controlled by the central computer unit (16).
15. System (100) according to any of claims 10 to 14, characterised by the space part (200) having a horizontal floor plane (201), and the system (100) containing the digital camera (12) arranged at a known distance (d) from the floor plane (201).
16. System (100) according to any of claims 10 to 15, characterised by the known internal parameters of the digital camera (12) being selected from a group containing resolution and size of a sensor of the digital camera (12), and focal length and size of a field of view of the digital camera (12).