WO2017065348A1

WO2017065348A1 - Collaboration method using head mounted display

Info

Publication number: WO2017065348A1
Application number: PCT/KR2015/013636
Authority: WO
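Inventors: 우운택 (Woontack Woo); 노승탁 (Seungtak Noh); 여휘숑 (Hui-Shyong Yeo)
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)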
Priority date: 2015-10-15
Filing date: 2015-12-14
Publication date: 2017-04-20
Also published as: KR20170044318A; KR101763636B1

Abstract

A collaboration method using a head mounted display (HMD), according to the present invention, comprises the steps of: generating, in an image of a virtual world, a common space in which a user in a local area and a user in a remote area can collaborate using a common object; obtaining an image of a real world including the common space from a stereo camera; determining a position of a user's hand in the image of the real world from hand tracking information obtained from a depth sensor; generating, by using the hand tracking information, a mask mesh that is located at a position corresponding to the position of the user's hand and displayed on the image of the virtual world; generating an avatar to be displayed in the image of the virtual world by using HMD tracking information and hand tracking information of the user in the remote area; generating an output image, in which the common space, the mask mesh, and the avatar are displayed on the image of the real world, by combining the image of the real world and the image of the virtual world; and displaying the output image on the HMD.

Description

Collaboration method with head mounted display

The present invention relates to a collaboration system using a head mounted display. More specifically, the present invention relates to a collaboration system that can provide its participants with a collaborative environment close to the real world, in a limited environment and with a minimum of devices.

Mixed reality is a technique for combining real and virtual images. The main issue in mixed reality is blurring the line between the virtual and the real so that the user sees an image with no visible boundary between reality and the virtual image. In this regard, a head mounted display (HMD) is a device that enables a virtual reality experience, but until recently it has been used only in highly controlled environments such as laboratories.

In recent years, consumer-level HMDs have become commonplace, and some devices are offered to users at acceptable prices. Although such consumer-level HMDs are still heavy and cumbersome, they have given general users an opportunity to use mixed reality, just as portable devices popularized augmented reality in the past.

For many years, teleconference systems have limited communication channels to voice and video and have used cameras to capture users sitting in front of a screen. The disadvantage of this approach is that users cannot move beyond their own captured area. As a result, such systems prioritize verbal communication and eye contact over supporting actual cooperative work between users.

This limited-area problem can be addressed with immersive display technology, for example, large two-dimensional displays that can provide depth cues for the appearance of a remote user. Wall-sized screens, for instance, have been used to join separate spaces into what appear to be connected rooms. However, a single display has a limited viewing angle: even if the head position is tracked, the user must always look toward the screen. This is called the 2.5D problem. In other words, recent remote support environments place only a single display in front of the user, which restricts the user's field of view and viewing direction.

The present invention provides a method for enabling remote collaboration.

The present invention employs an HMD as the primary display to overcome the 2.5D problem described above. With this choice, the present invention aims to summon a user in a remote space as an avatar into the local space where the local user is.

In addition, since the HMD places a screen directly in front of the user's eyes, the user's head orientation is unrestricted rather than limited to facing a screen, and the user's view can extend to the entire local space.

In addition, whereas existing technologies allow collaboration only within the screen, another object of the present invention is to enable actual collaboration between local and remote users in a common space.

A representative configuration of the present invention for achieving the above objects is as follows.

The present invention provides a collaboration method using a head mounted display (HMD), comprising: creating, in an image of a virtual world, a common space in which a user in a local area and a user in a remote area can collaborate using a common object; obtaining an image of a real world including the common space from a stereo camera; determining a position of a user's hand in the image of the real world from hand tracking information obtained from a depth sensor; generating, by using the hand tracking information, a mask mesh that is located at a position corresponding to the position of the user's hand and displayed on the image of the virtual world; generating an avatar to be displayed on the image of the virtual world by using HMD tracking information and hand tracking information of a user in the remote space; combining the images of the real world with the images of the virtual world, respectively, to generate an output image in which the common space, the mask mesh, and the avatar are displayed on the image of the real world; and displaying the output image on the HMD.

In the present invention, the step of generating the avatar generates the body motion of the avatar using body tracking information obtained from an external camera.

In the present invention, the step of generating the common space may generate the common space using a global tracker and a local tracker, wherein the local tracker is used only for the initial setup of the common space.

According to the present invention, a remote collaboration system that provides HMD-based mixed reality is provided, so that remote and local users can collaborate easily. More specifically, each user keeps their own local space and uses the HMD to see virtual objects, the virtual space within the common space, and other users summoned as avatars, so that local and remote users can collaborate effectively with shared virtual objects within the common space.

In addition, since the present invention uses vision-based hand tracking, users can interact directly with shared objects using their bare hands, without additional devices or controllers.

FIG. 1 is a block diagram illustrating a collaboration system according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating an internal configuration of a control computer according to an embodiment of the present invention.

FIG. 3 is a diagram for describing generating an output image of an HMD according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating an operation according to an embodiment of the present invention.

DETAILED DESCRIPTION The following detailed description of the invention refers to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention differ from one another but need not be mutually exclusive. For example, certain shapes, structures, and characteristics described herein may be implemented with changes from one embodiment to another without departing from the spirit and scope of the invention. In addition, it is to be understood that the location or arrangement of individual components within each embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the present invention should be taken as encompassing the scope of the claims and all equivalents thereof. Like reference numerals in the drawings denote the same or similar elements throughout the several aspects.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement the present invention.

As shown in FIG. 1, according to an embodiment of the present invention, the collaboration system of the present invention includes a control computer 100, a head mounted display (HMD) 200, a stereo camera 300, a depth sensor 400, an external camera 500, and a network 600. In addition, the HMD 200, the stereo camera 300, the depth sensor 400, and the external camera 500 may be present in both the local space A and the remote space B.

The collaboration system of the present invention is very useful in that it allows a workspace and a user's motion to be shared with a remotely located user. For example, the collaboration system of the present invention may be applied to remote surgery, in which surgeons in a local space and a remote space operate on the same patient. The primary surgeon and the patient are physically located in the local space and fellow surgeons are located in the remote space, and surgery can be performed using the system of the present invention. Here, the patient becomes the shared object, and movements performed by the surgeon in the remote space are tracked in real time and replicated in the local surgeon's space as a virtual character such as an avatar, so that the local surgeon can see the mirrored movements of the remote surgeon in the local space without being restricted to a particular viewing angle.

In order to implement such a useful collaboration system, the present invention provides a collaboration system based on the HMD 200. Hereinafter, the present invention will be described in terms of the role of each device.

First, the control computer 100 according to an embodiment of the present invention controls devices such as the HMD 200, the stereo camera 300, the depth sensor 400, and the external camera 500, and uses the information obtained from these devices to render images and create a mixed reality that combines the real world and the virtual world. In particular, based on the information obtained from the devices in the local and remote spaces, the control computer 100 according to an embodiment of the present invention enables the user in the local space and the user in the remote space to collaborate in the same common space. That is, the control computer 100 forms the same common space in the local space and the remote space, generates images in which the avatar of the remote user is displayed in the common space of the local space and the avatar of the local user is displayed in the common space of the remote space, and outputs these images to each user's HMD 200.

In addition, the control computer 100 manages rendering and the common-space coordinate system, and handles communication with the collaborator's side over the network.

Although FIG. 1 illustrates an embodiment in which the HMD 200, the stereo camera 300, the depth sensor 400, and the external camera 500 are provided in both the local space and the remote space, the configuration of devices can be modified in various ways according to embodiments of the invention. For example, the external camera 500 may be absent from the remote space. In this case, since body tracking is not performed, only the remote user's hand gestures, obtained by hand tracking, are mirrored and displayed in the common space. In addition, although FIG. 1 illustrates only one remote space corresponding to the local space, according to another embodiment of the present invention, there may be a plurality of remote spaces and a plurality of users in those remote spaces.

Hereinafter, the operation of the devices controlled by the control computer 100 will be described in order to explain the collaboration system of the present application.

The HMD 200 is a device worn on the head of a user in the local or remote space that can provide the user with an enclosing view of an image in which virtual objects and avatars are displayed. More specifically, the HMD 200 displays the avatar of the remote user in the common space and, when a hand is between the user's eyes and a virtual object, shows the hand through the image, so that it can display an image of the collaboration taking place in the common space.

Basically, the HMD 200 is worn on the user's head and presents an image directly in front of the user's eyes. The HMD 200 used in the present invention may include a left screen and a right screen. Since the left screen and the right screen are visible to the user's left eye and right eye, respectively, stereoscopic images can be provided to the user naturally. That is, just as with human vision, the images shown to the left eye and the right eye can differ, giving the image a sense of depth.

The present invention can overcome the 2.5D problem using the HMD 200. As described above, because screens in conventional systems are fixed, the user's head must face the screen to see the remote user. In contrast, the present invention uses the HMD 200 to enable collaboration with a remote user regardless of the user's head position and orientation.

In existing 3D display systems, if a hand is present between the user's eyes and a virtual object displayed on the screen, the user cannot see the virtual object correctly because the hand physically blocks the screen. This is called the occlusion handling problem.

In order to solve this problem, the present invention uses a see-through HMD 200. Because its screen is located directly in front of the user's eyes, the see-through HMD 200 can overcome both the 2.5D problem and the occlusion handling problem. See-through HMDs can be further divided into optical see-through and video see-through types. An optical see-through HMD provides an unobstructed view of the real world and can overlay virtual objects on it, but the virtual objects may appear to float in the air.

Accordingly, the present invention selects the video see-through HMD 200 and employs depth mask generation using the depth sensor 400 (a near-range depth camera). As a result, the HMD 200 image can be made more realistic by using the distance relationship between virtual objects and the real world. With this approach, virtual content such as the avatar of the present invention can be rendered entirely as an augmented virtual image in the user's HMD 200, without a physical proxy such as a robot.

Next, the stereo camera 300 is a camera capable of generating stereoscopic images. The stereo camera 300 captures two images in the same direction using two lenses spaced apart by approximately the distance between human eyes, thereby generating a stereoscopic image on the same principle by which humans perceive objects in three dimensions. The images captured by the stereo camera 300 allow the real world to be presented to the user in three dimensions.
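The stereo principle can be made concrete with the standard relation for a rectified stereo pair, Z = f·B/d, where f is the focal length in pixels, B the baseline between the two lenses, and d the horizontal disparity of a point between the left and right images. The sketch below is illustrative only: the camera values are hypothetical, and in the described system the depth used for hand tracking comes from the depth sensor 400, not from the stereo camera 300.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified pinhole stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values are invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, 64 mm baseline (about a human interpupillary distance).
    for d in (16.0, 32.0, 64.0):
        print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(700.0, 0.064, d):.2f} m")
```

Doubling the disparity halves the depth, which is why nearby objects show a larger offset between the left and right images.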

In one embodiment of the invention, the stereo camera 300 may be embedded in or attached to the HMD 200. That is, the stereo camera 300 according to an exemplary embodiment of the present invention may generate stereoscopic images of what the eyes of the user wearing the HMD 200 would see through the HMD 200. For example, the left and right images generated by the stereo camera 300 may correspond to the real-world scenes that the left and right eyes of the user wearing the HMD 200 would see.

Images generated by the stereo camera 300 may be corrected by the control computer 100. For example, when the lenses of the stereo camera 300 are fisheye lenses, the images generated by the stereo camera 300 may be corrected by the control computer 100 to correctly reflect the real world.

The generated images of the stereo camera 300 may be used as a real world view of the image displayed by the HMD 200. That is, a virtual object such as an avatar may be displayed on an image generated by the stereo camera 300.

Next, the depth sensor 400 is a device that enables hand tracking, interaction with virtual objects, and generation of the mask mesh. Using the depth sensor 400, information can be generated indicating where the user's hand is located in the real world and how its finger joints move.

The depth sensor 400 according to an embodiment of the present invention may be a near-range depth camera, that is, a camera device capable of measuring distance based on vision. In addition, like the stereo camera 300, the depth sensor 400 may be embedded in or attached to the HMD 200.

The most common and ideal method for 3D user interaction is direct interaction with bare hands and fingers. Humans are accustomed to using their hands in their daily work, and human fingers have a very high number of degrees of freedom. However, providing hand interaction in mixed reality is difficult because hands and fingers must be tracked in real time. Traditional hand tracking devices include data gloves with infrared markers. These devices are very expensive and have hindered the naturalness of the user experience. Thus, the present invention tracks hand movement using a vision-based depth sensor.

Next, the external (exocentric) camera 500 may generate body motion information of the user. According to one embodiment of the invention, the external camera 500 is installed in the local space or the remote space and scans the user's body motion, so that the avatar in the local or remote space mirrors the user's body motion.

Meanwhile, the network 600 connects the plurality of devices and the control computer 100. That is, the network 600 provides a connection path through which the plurality of devices, once connected to the control computer 100, can transmit and receive packet data. The network 600 according to an embodiment of the present invention may be configured regardless of communication mode, such as wired or wireless communication, and may include various networks such as a local area network (LAN), a metropolitan area network (MAN), and a wide area network (WAN). However, the network 600 is not limited thereto and may include, at least in part, any known wired or wireless data network.

FIG. 2 is a diagram showing the internal configuration of the control computer 100 according to an embodiment of the present invention.

Referring to FIG. 2, the control computer 100 includes a control unit 101, a communication interface unit 102, a common space setting unit 110, a mask mesh generating unit 120, a hand tracking unit 130, an avatar generating unit 140, a calibration unit 150, and an output image generating unit 160.

First, the control unit 101 coordinates the overall process so that the common space setting unit 110, the mask mesh generating unit 120, the hand tracking unit 130, the avatar generating unit 140, the calibration unit 150, and the output image generating unit 160 can each perform their roles.

In addition, the communication interface unit 102 may provide an interface for communication between the external devices (that is, the HMD 200, the stereo camera 300, the depth sensor 400, and the external camera 500) and the internal components of the control computer 100.

The common space setting unit 110 sets up a virtual common space in which users in the local space and the remote space can collaborate and in which shared objects can be located. The main object of the present invention is immersive and intuitive remote collaboration using hand-based interaction. Using a summoned avatar as the representation of the remote user, the remote user's motion can be mirrored in the common space. In this case, a part of each user's local space becomes the common space, which allows local and remote users to share virtual objects and manipulate them together.

To register and track coordinate systems for a lightweight system without relying on sensors and displays bound to the environment, the present invention uses a hybrid method to localize the pose of the user's HMD 200 and register the common space.

The hybrid method of the present invention uses two types of trackers: an outside-in global tracker and an inside-out local tracker. The global tracker can track a marker anywhere within the defined tracking space and therefore offers more flexibility, whereas the local tracker must always have the marker in view, which limits the camera's viewing direction. Although the global tracker removes this restriction on the user's viewpoint, it cannot register the common space in the user's virtual-world coordinates. Thus, the present invention uses the local tracker to register a local marker as the basis of the common space.

For example, to use the remote collaboration system, the user only needs to look at the local object during the initial setup stage. The present invention can thus provide an unrestricted view by using the local tracker only once, in the initial setup stage, and the global tracker for all remaining stages of the remote collaboration.
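The one-time registration can be illustrated as a chain of rigid transforms: during setup, the HMD pose from the global tracker, a fixed HMD-to-camera offset, and the marker pose from the local tracker are multiplied to express the marker (the basis of the common space) in the global tracker's frame. After that, the stored anchor pose and the per-frame HMD pose from the global tracker are enough, so the marker no longer has to stay in view. This is a minimal sketch under stated assumptions: transforms are 4x4 row-major nested lists, the names `register_anchor` and `rot_y` are hypothetical, the camera is taken to look along its -z axis, and the HMD-to-camera offset would in practice come from calibration.

```python
import math


def rot_y(angle_rad, x=0.0, y=0.0, z=0.0):
    """4x4 rigid transform: rotation about the vertical (y) axis plus a translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s, x], [0.0, 1.0, 0.0, y], [-s, 0.0, c, z], [0.0, 0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def register_anchor(t_global_hmd, t_hmd_cam, t_cam_marker):
    """One-time setup: express the locally tracked marker in the global tracker's frame.

    t_global_hmd: HMD pose reported by the outside-in global tracker.
    t_hmd_cam:    fixed offset from the HMD body to its camera (hypothetical, from calibration).
    t_cam_marker: marker pose reported by the inside-out local tracker.
    """
    return mat_mul(mat_mul(t_global_hmd, t_hmd_cam), t_cam_marker)


def anchor_position(t):
    """Translation part of a 4x4 transform."""
    return [t[0][3], t[1][3], t[2][3]]


if __name__ == "__main__":
    # Hypothetical setup: HMD 1.6 m above the floor and turned 90 degrees, camera at the HMD origin,
    # marker seen 1 m in front of the camera and 0.6 m lower.
    hmd = rot_y(math.pi / 2, 0.0, 1.6, 0.0)
    anchor = register_anchor(hmd, rot_y(0.0), rot_y(0.0, 0.0, -0.6, -1.0))
    print([round(v, 3) for v in anchor_position(anchor)])
```

Only the setup step needs the local tracker; afterwards the anchor is a constant in the global frame.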

The common space setting unit 110 of the present invention calculates the common-space coordinate information from the pose of the local object registered in the global tracker. The pose of the registered local object serves as the basis for the shared virtual objects in the user's space. The user's hand and body data are converted into local coordinates relative to this base object pose and transmitted to the remote user's space.
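The coordinate conversion can be sketched as follows, assuming rigid 4x4 transforms: the sender multiplies each tracked hand or body point by the inverse of its local anchor pose, and the receiver maps the same anchor-relative coordinates through its own anchor pose. The function names and example poses are hypothetical; the patent does not specify a data representation.

```python
import math


def rot_y(angle_rad, x=0.0, y=0.0, z=0.0):
    """4x4 rigid transform: rotation about the vertical (y) axis plus a translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s, x], [0.0, 1.0, 0.0, y], [-s, 0.0, c, z], [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(t):
    """Inverse of a rigid transform [R | p], which is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[0][3], t[1][3], t[2][3]]
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [q[0]], r_t[1] + [q[1]], r_t[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def apply(t, p):
    """Transform a 3D point by a 4x4 rigid transform."""
    return [sum(t[i][k] * p[k] for k in range(3)) + t[i][3] for i in range(3)]


def to_common_space(t_anchor, points):
    """Sender side: world points -> coordinates relative to the local anchor object."""
    inv = rigid_inverse(t_anchor)
    return [apply(inv, p) for p in points]


def from_common_space(t_anchor_remote, points):
    """Receiver side: anchor-relative points -> world points in the remote space."""
    return [apply(t_anchor_remote, p) for p in points]


if __name__ == "__main__":
    local_anchor = rot_y(0.0, 1.0, 0.0, 2.0)        # hypothetical anchor pose in the local room
    remote_anchor = rot_y(math.pi, -3.0, 0.0, 0.0)  # hypothetical anchor pose in the remote room
    sent = to_common_space(local_anchor, [[1.5, 1.2, 2.0]])  # a fingertip 0.5 m beside the anchor
    print(sent, from_common_space(remote_anchor, sent))
```

Because only anchor-relative coordinates cross the network, the two anchors can have different positions and orientations in their rooms while the remote hand still appears at the same place relative to the shared object.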

The mask mesh generating unit 120 generates a mask mesh having the same shape as the user's hand. The resulting mask mesh can be rendered translucent or fully transparent, which provides a solution to occlusion handling.

Occlusion handling is an important issue for the see-through HMD 200. The present invention uses a mask mesh to handle occlusion by the hand between the user's eyes and virtual objects. First, the present invention converts a depth image into a 3D point cloud and converts the point cloud into a mesh by applying simple triangulation. The generated mask mesh can be made translucent or fully transparent by changing its shader. If the shader is translucent, the user's hand appears see-through. If the shader is fully transparent, the mesh cuts an empty region out of the virtual image, so that the user's actual hand is fully visible in the HMD 200 view and hides anything behind it. In one embodiment of the present invention, a fully transparent shader is preferred because it provides an experience closer to how a person perceives the real world.
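The depth-image-to-mesh conversion can be sketched in pure Python. The sketch assumes a pinhole model for the depth sensor 400 with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and it interprets "simple triangulation" as splitting each 2x2 block of neighbouring pixels into two triangles while skipping blocks that straddle a large depth jump; the threshold value is an assumption, not taken from the patent.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres, 0 = no reading) into a grid of 3D points
    using a pinhole camera model; invalid pixels become None."""
    grid = []
    for v, row in enumerate(depth):
        points_row = []
        for u, z in enumerate(row):
            if z <= 0.0:
                points_row.append(None)
            else:
                points_row.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        grid.append(points_row)
    return grid


def triangulate_grid(grid, max_depth_jump=0.03):
    """Simple grid triangulation of an organized point cloud.

    Each 2x2 block of neighbouring pixels becomes two triangles when all four points
    are valid and their depths differ by at most max_depth_jump (metres), so the mesh
    does not bridge the gap between the hand and the background.
    Returns (vertices, triangles), where triangles are index triples into vertices.
    """
    vertices = []
    index = {}
    for v, row in enumerate(grid):
        for u, point in enumerate(row):
            if point is not None:
                index[(u, v)] = len(vertices)
                vertices.append(point)
    triangles = []
    for v in range(len(grid) - 1):
        for u in range(len(grid[v]) - 1):
            corners = [(u, v), (u + 1, v), (u, v + 1), (u + 1, v + 1)]
            if not all(k in index for k in corners):
                continue  # a missing depth reading leaves a hole in the mesh
            depths = [vertices[index[k]][2] for k in corners]
            if max(depths) - min(depths) > max_depth_jump:
                continue  # likely a silhouette edge between hand and background
            i00, i10, i01, i11 = (index[k] for k in corners)
            triangles.append((i00, i01, i10))
            triangles.append((i10, i01, i11))
    return vertices, triangles


if __name__ == "__main__":
    # Hypothetical 3x3 depth patch: a hand at about 0.45 m with one background pixel.
    patch = [[0.45, 0.46, 0.45], [0.45, 0.44, 1.20], [0.46, 0.45, 0.45]]
    verts, tris = triangulate_grid(depth_to_points(patch, 365.0, 365.0, 1.0, 1.0))
    print(len(verts), "vertices,", len(tris), "triangles")
```

The resulting mesh is what a renderer would draw with the translucent or fully transparent shader.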

The hand tracking unit 130 determines the position and movement of the user's hand using the hand tracking information generated by the depth sensor 400.

Accurate hand tracking is required so that the mask mesh generated by the mask mesh generating unit 120 is placed over the hand shown in the HMD 200 image. As described above, the present invention uses the depth sensor 400 to perform vision-based hand tracking. Alternatively, according to another embodiment of the present invention, separate devices may be used for hand tracking and for generating depth data of the hand.

The avatar generating unit 140 provides an expressive way of representing the remote user. The avatar generating unit 140 generates an avatar to be displayed in the virtual world using the user's head position information, hand tracking information, and body tracking information. The avatar can mirror the user's body motion without physical proxies or robotic hardware to represent the remote user. An advantage of the avatar approach is that it makes the remote user appear to be present in the local user's space. The avatar generating unit 140 of the present invention may generate avatars corresponding to users in both the local space and the remote space.

The present invention uses the local user's hand tracking results to interact with and manipulate the virtual objects. The hand tracking information and head pose are transmitted to the remote space via the network 600 in real time. In the remote space, this information is replicated as avatar motion, allowing the users to collaborate with high accuracy.

Because local and remote users share a common space, it is straightforward to summon the remote user as a virtual avatar into the local space. The avatar is initialized in the real world by placing a chessboard marker on the floor. The chessboard marker acts as a virtual anchor for the summoned remote space and can be physically relocated by the local user as required. The chessboard marker also defines a virtual floor plane aligned with the real-world floor plane, so the summoned avatar can be positioned properly on this plane.
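Placing the summoned avatar on the floor plane defined by the chessboard marker reduces to projecting a point onto a plane. The sketch assumes the marker's local z-axis is perpendicular to the board surface (a common chessboard convention, not stated in the patent) and uses hypothetical function names.

```python
import math


def floor_plane_from_marker(t_world_marker):
    """Plane origin and unit normal from a marker pose given as a 4x4 nested list.

    Assumes the marker's local z-axis is perpendicular to the board surface,
    so the normal is the third column of the rotation part.
    """
    origin = [t_world_marker[i][3] for i in range(3)]
    normal = [t_world_marker[i][2] for i in range(3)]
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("marker rotation has a zero z-axis")
    return origin, [c / length for c in normal]


def project_onto_plane(point, plane_origin, unit_normal):
    """Orthogonal projection of a 3D point onto a plane (unit_normal must have length 1)."""
    dist = sum((p - o) * n for p, o, n in zip(point, plane_origin, unit_normal))
    return [p - dist * n for p, n in zip(point, unit_normal)]


if __name__ == "__main__":
    # Hypothetical chessboard lying flat 2 cm above the world origin; its z-axis points up (world y).
    marker = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.02],
              [0.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]
    origin, normal = floor_plane_from_marker(marker)
    print(project_onto_plane([1.0, 0.3, 2.0], origin, normal))  # avatar root snapped to the floor
```

If the local user moves the marker, recomputing the plane from its new pose re-anchors the avatar.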

For networking, the present invention transmits to the other side only the pose of the HMD 200 and the skeletal joint data of the body and hands. Thus, the bandwidth requirements are relatively light compared with video conferencing systems.
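The bandwidth claim can be checked with a back-of-the-envelope payload. The layout below is purely hypothetical: the patent does not specify joint counts, encoding, or update rate, so 25 body joints, 21 joints per hand, float32 coordinates, and 30 Hz are illustrative choices, and `pack_frame` is not part of any described implementation.

```python
import struct

# Hypothetical payload layout (not specified in the patent):
#   uint32 frame id, float64 timestamp,
#   HMD pose as 7 float32 (position xyz + orientation quaternion xyzw),
#   BODY_JOINTS body joints and 2 x HAND_JOINTS hand joints, each as 3 float32 (x, y, z).
BODY_JOINTS = 25
HAND_JOINTS = 21
FRAME_FORMAT = "<Id7f" + f"{BODY_JOINTS * 3}f" + f"{2 * HAND_JOINTS * 3}f"


def pack_frame(frame_id, timestamp, hmd_pose, body_joints, hand_joints):
    """Serialize one tracking frame into a little-endian byte string."""
    if len(hmd_pose) != 7 or len(body_joints) != BODY_JOINTS or len(hand_joints) != 2 * HAND_JOINTS:
        raise ValueError("unexpected number of pose values or joints")
    values = list(hmd_pose)
    values += [c for joint in body_joints for c in joint]
    values += [c for joint in hand_joints for c in joint]
    return struct.pack(FRAME_FORMAT, frame_id, timestamp, *values)


def bandwidth_kbit_per_s(rate_hz):
    """Payload bandwidth at a given update rate, ignoring transport headers."""
    return struct.calcsize(FRAME_FORMAT) * 8 * rate_hz / 1000.0


if __name__ == "__main__":
    zero = (0.0, 0.0, 0.0)
    frame = pack_frame(1, 0.0, [0.0] * 7, [zero] * BODY_JOINTS, [zero] * (2 * HAND_JOINTS))
    print(len(frame), "bytes per frame")
    print(f"{bandwidth_kbit_per_s(30):.1f} kbit/s at 30 Hz")
```

Under these assumptions a frame is 844 bytes, roughly 200 kbit/s at 30 Hz before transport overhead, which is modest compared with streaming video of the user.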

The tracking sensor of the external camera 500 can track the full-body skeleton with a certain number of joints. Using this information, the present invention can scale and control the virtual avatar to match the actual user, so that the avatar mirrors the real-world size and body motion of the tracked user. Since the hand tracking supported by the external camera 500 is limited, the present invention relies on the depth sensor 400 for fully articulated hand tracking. Because the body tracking information comes from the external camera 500 and the hand tracking information comes from the depth sensor 400, the control computer 100 merges the two so that the final result looks natural when viewed from the remote side.
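The patent states that the avatar is scaled to match the actual user but does not give the rule. One simple assumption is a uniform scale equal to the ratio between the length of a tracked joint chain (for example ankle to head) and the same chain on the avatar's rest pose; the chain choice and function names below are hypothetical.

```python
import math


def chain_length(joints):
    """Sum of segment lengths along a chain of 3D joint positions."""
    return sum(math.dist(a, b) for a, b in zip(joints, joints[1:]))


def avatar_scale(tracked_chain, avatar_rest_chain):
    """Uniform scale factor that matches the avatar's rest-pose chain to the tracked user's chain."""
    rest = chain_length(avatar_rest_chain)
    if rest == 0.0:
        raise ValueError("avatar chain must have non-zero length")
    return chain_length(tracked_chain) / rest


if __name__ == "__main__":
    # Hypothetical ankle -> knee -> hip -> neck -> head chains (metres).
    user = [(0.0, 0.08, 0.0), (0.0, 0.50, 0.02), (0.0, 0.95, 0.0), (0.0, 1.50, 0.0), (0.0, 1.70, 0.0)]
    avatar = [(0.0, 0.07, 0.0), (0.0, 0.45, 0.0), (0.0, 0.85, 0.0), (0.0, 1.35, 0.0), (0.0, 1.52, 0.0)]
    print(f"scale avatar by {avatar_scale(user, avatar):.3f}")
```

Per-bone scaling would match limb proportions more closely; a single factor keeps the avatar's own proportions while matching overall size.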

To this end, the present invention may attach the hand information obtained from the depth sensor 400 to the wrist joint position in the body information generated by the external camera 500. In addition, when the hand points toward the external camera 500, occlusion makes the wrist and elbow joints tracked by the external camera 500 significantly unstable. To overcome this, the target position used to adjust the forearm tracked by the external camera 500 can be replaced with the palm position tracked by the depth sensor 400.
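The two fusion rules can be sketched as follows, assuming both skeletons have already been transformed into the same coordinate frame through calibration. The function names, the index of the wrist joint in the hand skeleton, and the way `wrist_unstable` is decided (for example from a tracking-confidence value) are all assumptions.

```python
def attach_hand_to_wrist(body_wrist, hand_joints, hand_wrist_index=0):
    """Translate depth-sensor hand joints so that the hand's wrist joint lands on the
    wrist joint of the body skeleton. Both inputs must already be in the same frame."""
    hx, hy, hz = hand_joints[hand_wrist_index]
    dx, dy, dz = body_wrist[0] - hx, body_wrist[1] - hy, body_wrist[2] - hz
    return [(x + dx, y + dy, z + dz) for x, y, z in hand_joints]


def forearm_target(body_wrist, palm, wrist_unstable):
    """Choose the position the avatar's forearm is adjusted toward.

    When the body tracker's wrist is unreliable (for example, the hand points at the
    external camera and is self-occluded), use the palm from the depth sensor instead.
    """
    return palm if wrist_unstable else body_wrist


if __name__ == "__main__":
    body_wrist = (0.30, 1.10, 0.40)                       # hypothetical, from the external camera
    hand = [(0.32, 1.12, 0.41), (0.33, 1.15, 0.46)]       # hypothetical wrist and palm, from the depth sensor
    fused = attach_hand_to_wrist(body_wrist, hand)
    print(fused, forearm_target(body_wrist, fused[1], wrist_unstable=True))
```

A pure translation keeps the finger articulation from the depth sensor intact while placing the hand on the body tracker's arm.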

The calibration unit 150 corrects the distortion of the images obtained from the devices and calibrates the images from the different devices so that they match. Through the HMD 200, the user views a virtual stereoscopic image overlaid on a real-world background image captured by the stereo camera 300. These real-world images are originally obtained through the fisheye lenses of the stereo camera 300 and therefore must be undistorted. One of the undistorted images is used as the input image of the local tracker.
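The patent does not name a lens model. As an illustration only, the sketch below assumes an ideal equidistant fisheye projection (image radius r_d = f·θ for a ray at angle θ from the optical axis) and maps a distorted pixel to where a pinhole camera with the same focal length would image it (r_u = f·tan θ). Real lenses need additional distortion coefficients estimated during calibration, as in OpenCV's fisheye camera model, and a full image is usually undistorted by computing, for each output pixel, its source position in the fisheye image, which is the inverse of the mapping shown here.

```python
import math


def undistort_equidistant(u, v, f, cx, cy):
    """Map pixel (u, v) of an ideal equidistant fisheye image to pinhole coordinates.

    f: focal length in pixels; (cx, cy): principal point, shared by both models.
    """
    dx, dy = u - cx, v - cy
    r_d = math.hypot(dx, dy)
    if r_d == 0.0:
        return u, v  # the principal point is unchanged
    theta = r_d / f  # angle of the incoming ray from the optical axis
    if theta >= math.pi / 2:
        raise ValueError("rays at or beyond 90 degrees cannot be represented in a pinhole image")
    scale = f * math.tan(theta) / r_d
    return cx + dx * scale, cy + dy * scale


if __name__ == "__main__":
    # Hypothetical intrinsics: f = 300 px, principal point (320, 240).
    for r in (50.0, 150.0, 250.0):
        print(r, undistort_equidistant(320.0 + r, 240.0, 300.0, 320.0, 240.0))
```

Points near the image center barely move, while points near the edge are pushed outward, which straightens lines that the fisheye lens had curved.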

In an initial implementation of the present invention, the left camera image of the stereo camera 300 may be used as the input image of the local tracker. In this case, the base transformation of the calibration step may be denoted Trgb_L.

Virtual stereoscopic images are rendered by the control computer 100 from the virtual left (Trgb_L) and virtual right (Trgb_R) camera poses, which are child transformations of the HMD 200 pose THMD. The present invention also sets the base transformation of the depth camera as Tdepth in order to manage information from the depth sensor 400, such as articulated hand tracking and real-world depth. In addition, the present invention may use a tracker inside the HMD 200 to track THMD, that is, the position and orientation of the HMD 200.
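Treating the left and right virtual cameras as child transformations of THMD means each eye pose is the HMD pose multiplied by a fixed offset. The sketch assumes, for illustration only, that the offsets are pure translations of half a hypothetical interpupillary distance along the HMD's local x-axis; the real offsets, which may also include rotation, would come from calibration. Tdepth could be derived from THMD in the same way.

```python
import math


def translation(x, y, z):
    """4x4 pure translation as nested lists."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rot_y(angle_rad, x=0.0, y=0.0, z=0.0):
    """4x4 rigid transform: rotation about the vertical (y) axis plus a translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s, x], [0.0, 1.0, 0.0, y], [-s, 0.0, c, z], [0.0, 0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def eye_camera_poses(t_hmd, ipd_m=0.064):
    """Left/right virtual camera poses (Trgb_L, Trgb_R) as children of the HMD pose T_HMD.

    Assumes each eye camera is offset from the HMD origin by half the
    interpupillary distance along the HMD's local x-axis (hypothetical values).
    """
    t_rgb_l = mat_mul(t_hmd, translation(-ipd_m / 2.0, 0.0, 0.0))
    t_rgb_r = mat_mul(t_hmd, translation(ipd_m / 2.0, 0.0, 0.0))
    return t_rgb_l, t_rgb_r


if __name__ == "__main__":
    left, right = eye_camera_poses(rot_y(math.pi / 2, 0.0, 1.6, 0.0))
    print([round(left[i][3], 3) for i in range(3)], [round(right[i][3], 3) for i in range(3)])
```

Because the offsets are applied in the HMD's local frame, the eye cameras follow the head's rotation as well as its position.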

To prepare the collaboration system of the present application for use, a two-step calibration process is required to obtain the intrinsic and extrinsic parameters of the cameras: 1) calibration within the same module and 2) calibration between different modules.

In the first step, the present invention can calibrate the stereo camera 300 using the provided tools and a chessboard. According to other embodiments of the present invention, some devices, for example the depth sensor 400, may be calibrated automatically. In addition, as described above, when two or more devices are used as the depth sensor 400 for hand tracking, those devices within the same module are calibrated with respect to each other.

In the second step, the present invention calibrates devices belonging to different modules, here mainly the cameras. After the first step, it is assumed that the two or more cameras within each module are properly calibrated. To provide a seamless HMD 200 image, the images of the stereo camera 300, the depth sensor 400, and the external camera 500 may be calibrated to match one another.
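The patent does not specify how the inter-module calibration is computed. One common approach, assumed here for illustration, is to let two cameras from different modules observe the same chessboard at the same time: if each camera estimates the board pose, the relative pose follows as T_A_B = T_A_board · (T_B_board)^-1. The sketch uses a single noise-free view and hypothetical names; in practice many views are combined and refined.

```python
import math


def rot_y(angle_rad, x=0.0, y=0.0, z=0.0):
    """4x4 rigid transform: rotation about the vertical (y) axis plus a translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s, x], [0.0, 1.0, 0.0, y], [-s, 0.0, c, z], [0.0, 0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t):
    """Inverse of a rigid transform [R | p], which is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[0][3], t[1][3], t[2][3]]
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [q[0]], r_t[1] + [q[1]], r_t[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def extrinsic_from_shared_board(t_a_board, t_b_board):
    """Pose of camera B in camera A's frame from one simultaneous chessboard observation.

    Since T_A_board = T_A_B * T_B_board, it follows that T_A_B = T_A_board * inverse(T_B_board).
    """
    return mat_mul(t_a_board, rigid_inverse(t_b_board))


if __name__ == "__main__":
    true_a_b = rot_y(0.3, 0.10, 0.0, -0.05)    # hypothetical ground-truth relative pose
    board_in_b = rot_y(-0.2, 0.0, -0.1, -0.8)  # board pose as seen by camera B
    board_in_a = mat_mul(true_a_b, board_in_b)  # board pose as seen by camera A
    est = extrinsic_from_shared_board(board_in_a, board_in_b)
    print(max(abs(est[i][j] - true_a_b[i][j]) for i in range(4) for j in range(4)))
```

Once T_A_B is known, points measured by camera B can be expressed in camera A's frame, which is what lets the images of different modules line up in the HMD 200 view.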

The output image generator 160 combines the left image and the right image of the real world with the left image and the right image of the virtual world including the mask mesh, respectively, to generate a left image and a right image to be displayed on the HMD 200.
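Per-eye compositing with the mask mesh can be sketched per pixel. The sketch assumes the renderer provides, for each eye, the virtual color and depth plus the depth of the hand mask mesh at every pixel. A fully transparent mask then acts as a depth-only occluder that lets the camera image of the real hand through, while a translucent mask blends the hidden virtual color into the hand. The `see_through` weight and the function names are hypothetical; the same function would be called once for the left and once for the right image.

```python
import math


def composite_pixel(real_rgb, virtual_rgb, virtual_depth, mask_depth, see_through=0.0):
    """Combine one real (camera) pixel with one rendered virtual pixel.

    virtual_rgb / virtual_depth: None / math.inf where no virtual object covers the pixel.
    mask_depth: depth of the hand mask mesh at this pixel, math.inf where there is no hand.
    see_through: 0.0 models a fully transparent mask shader (the real hand hides the
        virtual object); values in (0, 1) model a translucent shader.
    """
    if virtual_rgb is None:
        return real_rgb
    if virtual_depth < mask_depth:
        return virtual_rgb  # no part of the hand lies in front of the virtual object
    # The hand is in front: show the real hand, optionally blended with the virtual color.
    return tuple(round(r * (1.0 - see_through) + v * see_through) for r, v in zip(real_rgb, virtual_rgb))


def composite_eye(real, virtual, virtual_depth, mask_depth, see_through=0.0):
    """Apply composite_pixel to every pixel of one eye's images (row-major nested lists)."""
    return [
        [composite_pixel(real[y][x], virtual[y][x], virtual_depth[y][x], mask_depth[y][x], see_through)
         for x in range(len(real[y]))]
        for y in range(len(real))
    ]


if __name__ == "__main__":
    inf = math.inf
    real = [[(200, 180, 160), (200, 180, 160)]]   # camera pixels (the right one shows the hand)
    virtual = [[(0, 0, 255), (0, 0, 255)]]        # a virtual object covers both pixels
    print(composite_eye(real, virtual, [[0.8, 0.8]], [[inf, 0.4]]))
```

With `see_through=0.0` the hand pixel keeps its camera color, matching the fully transparent shader preferred in the description.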

FIG. 3 is a diagram for describing how the output image generating unit 160 generates an output image of the HMD 200 according to an embodiment of the present invention.

Referring to FIG. 3(a), the real-world left and right images generated by the stereo camera 300 are combined with the virtual-world left and right images, which include virtual objects such as the mask mesh and the avatar. As described above, the calibration by the calibration unit 150 may be applied when combining the images.

FIG. 3(b) is a photographic example showing a simulation result according to an embodiment of the present invention. Referring to FIG. 3(b), the real-world left image L and the real-world right image R are obtained from the stereo camera 300. As shown to the right of the arrow in FIG. 3(b), the real-world left image L and right image R are combined with the virtual-world left and right images to produce the output image (OUTPUT) displayed on the HMD 200.

Referring to (OUTPUT) in FIG. 3(b), the avatar, the chessboard, and the sky-blue translucent common space are displayed over the real-world left image L and right image R. As described above, virtual objects such as the avatar, the chessboard, and the common space result from combining images of the virtual world with images of the real world. In this embodiment, the shader is set to fully transparent so that the user's hand appears completely opaque; in another embodiment of the present invention, the mask mesh is rendered visibly so that the user's hand appears see-through.

Referring to FIG. 4, first, a common space in which a user in a local area and a user in a remote area can collaborate using a common object is created in an image of a virtual world. (S1)

Next, an image of a real world including a common space is obtained from a stereo camera. (S2)

Next, the position of the user's hand in the image of the real world is determined from the hand tracking information obtained from the depth sensor. (S3)

Next, by using the hand tracking information, a mask mesh existing at a position corresponding to the position of the user's hand and displayed on an image of the virtual world is generated. (S4)

Next, an avatar to be displayed in the image of the virtual world is generated using the HMD tracking information and the hand tracking information of the user in the remote area. (S5)
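A minimal sketch of updating the remote user's avatar from the two kinds of tracking information follows. It assumes the remote site transmits the HMD position and hand positions already expressed in common-space coordinates. Two further assumptions are illustrative heuristics, not part of the description: the torso is placed a fixed distance below the head, and a hand that is not tracked in a frame keeps its last known position.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RemoteAvatar:
    """Hypothetical avatar state for the remote user, in common-space coordinates."""
    torso_drop: float = 0.25  # assumed head-to-torso offset in metres (illustrative)
    head: Optional[Vec3] = None
    torso: Optional[Vec3] = None
    left_hand: Optional[Vec3] = None
    right_hand: Optional[Vec3] = None

    def update(self, hmd_pos: Vec3, left_hand: Optional[Vec3],
               right_hand: Optional[Vec3]) -> None:
        # The head follows the remote HMD tracking information directly.
        self.head = hmd_pos
        # Without body tracking, place the torso a fixed distance below the head.
        self.torso = (hmd_pos[0], hmd_pos[1] - self.torso_drop, hmd_pos[2])
        # Hands follow the remote hand tracking information; a hand reported as None
        # (not tracked this frame) keeps its last known position.
        if left_hand is not None:
            self.left_hand = left_hand
        if right_hand is not None:
            self.right_hand = right_hand
```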

Next, the images of the real world are combined with the respective images of the virtual world to generate an output image in which the common space, the mask mesh, and the avatar are displayed in the image of the real world. (S6)

Finally, the output image is displayed on the HMD. (S7)

The particular implementations described in the present invention are embodiments and do not limit the scope of the present invention in any way. For brevity, descriptions of conventional electronic configurations, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example. In an actual device, they may be represented by various replaceable or additional functional connections, physical connections, or circuit connections. Furthermore, unless a component is specifically described with a term such as "essential" or "important", it may not be a necessary component for the application of the present invention.

In the specification (particularly in the claims) of the present invention, the term “said” and similar referring terms may refer to both the singular and the plural. When a range is described in the present invention, it includes the invention to which the individual values belonging to the range are applied (unless stated otherwise), as if each individual value constituting the range were described in the detailed description of the invention. Finally, unless an order is explicitly stated or stated to the contrary, the steps constituting the method according to the invention may be performed in any suitable order. The present invention is not necessarily limited to the order in which the steps are described. The use of any examples or exemplary terms (e.g., "etc.") in the present invention is merely intended to describe the present invention in detail, and the scope of the present invention is not limited by these examples or exemplary terms unless limited by the claims. In addition, those of ordinary skill in the art will appreciate that various modifications, combinations, and changes can be made according to design conditions and factors within the scope of the appended claims or their equivalents.

The embodiments according to the present invention described above can be implemented in the form of program instructions that can be executed by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described with reference to specific matters such as particular components, limited embodiments, and drawings, these are provided only to aid a general understanding of the present invention, and the present invention is not limited to the above embodiments. Those of ordinary skill in the art may make various modifications and changes based on this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above. Not only the claims below but also all ranges equivalent to, or equivalently modified from, the claims fall within the scope of the spirit of the present invention.

Claims

A collaboration method using a head mounted display (HMD) device, the method comprising:

generating, in an image of a virtual world, a common space in which a user in a local area and a user in a remote area can collaborate using a common object;

obtaining an image of a real world including the common space from a stereo camera;

determining a position of a user's hand in the image of the real world from hand tracking information obtained from a depth sensor;

generating, using the hand tracking information, a mask mesh that is located at a position corresponding to the position of the user's hand and is displayed in the image of the virtual world;

generating an avatar to be displayed in the image of the virtual world using HMD tracking information and hand tracking information of the user in the remote area;

combining the images of the real world with the respective images of the virtual world to generate an output image in which the common space, the mask mesh, and the avatar are displayed in the image of the real world; and

displaying the output image on the HMD.
The method of claim 1,

wherein the generating of the avatar comprises generating a body motion of the avatar using body tracking information obtained from an external camera.
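One common way to turn body tracking information into avatar body motion is to compute joint bend angles from triples of tracked joint positions and apply them to the avatar's bones. The sketch below assumes this approach; the claim does not specify it. The joint names are hypothetical, since a real body tracker uses its own naming scheme. The angle at joint b is the arccosine of the normalized dot product of the two bone vectors.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def joint_angle(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Angle in degrees at joint b between the bones b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("degenerate bone: coincident joint positions")
    cos_t = sum(u[i] * v[i] for i in range(3)) / (nu * nv)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_t))


# Hypothetical joint names: (parent joint, bending joint, child joint).
LIMBS = {
    "left_elbow": ("left_shoulder", "left_elbow", "left_wrist"),
    "right_elbow": ("right_shoulder", "right_elbow", "right_wrist"),
}


def avatar_joint_angles(skeleton: Dict[str, Vec3]) -> Dict[str, float]:
    """Bend angle of each limb whose three joints were all tracked in this frame."""
    angles = {}
    for name, (a, b, c) in LIMBS.items():
        if a in skeleton and b in skeleton and c in skeleton:
            angles[name] = joint_angle(skeleton[a], skeleton[b], skeleton[c])
    return angles
```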
The method of claim 1,

wherein the generating of the common space comprises generating the common space using a global tracker and a local tracker,

and wherein the local tracker is used only in an initial setup phase of the common space.
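One possible reading of this claim is that the local tracker registers the common space once and the global tracker alone is used afterwards. The local tracker would observe where the common space is relative to the HMD, and that observation would be chained with the HMD's global pose. The sketch below is hypothetical. It assumes planar poses (x and z on the floor plane plus a heading yaw), and the class names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Pose2D:
    """Planar rigid pose: position (x, z) on the floor plane and heading yaw in radians."""
    x: float
    z: float
    yaw: float

    def compose(self, other: "Pose2D") -> "Pose2D":
        """Express `other` (given in this pose's frame) in the parent frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return Pose2D(self.x + c * other.x - s * other.z,
                      self.z + s * other.x + c * other.z,
                      self.yaw + other.yaw)

    def to_local(self, px: float, pz: float) -> Tuple[float, float]:
        """Express a parent-frame point in this pose's frame (inverse transform)."""
        dx, dz = px - self.x, pz - self.z
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (c * dx + s * dz, -s * dx + c * dz)


class CommonSpaceRegistration:
    """Hypothetical: local tracker used once at setup, global tracker used afterwards."""

    def __init__(self) -> None:
        self.anchor_global: Optional[Pose2D] = None

    def setup(self, hmd_global: Pose2D, anchor_in_hmd: Pose2D) -> None:
        # Initial setup phase only: the local tracker reports the common-space anchor
        # relative to the HMD; chaining it with the global HMD pose fixes the anchor
        # in global-tracker coordinates.
        self.anchor_global = hmd_global.compose(anchor_in_hmd)

    def to_common(self, px: float, pz: float) -> Tuple[float, float]:
        # After setup, globally tracked points map into the common space without
        # consulting the local tracker again.
        if self.anchor_global is None:
            raise RuntimeError("setup() must be called during the initial setup phase")
        return self.anchor_global.to_local(px, pz)
```

Using the local tracker only at setup means later tracking failures of that tracker (for example, an occluded marker) would not disturb the already registered common space.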