WO2015199502A1

WO2015199502A1 - Apparatus and method for providing augmented reality interaction service

Info

Publication number: WO2015199502A1
Application number: PCT/KR2015/006591
Authority: WO
Inventors: 우운택; 하태진
Original assignee: 한국과학기술원
Priority date: 2014-06-26
Filing date: 2015-06-26
Publication date: 2015-12-30

Abstract

The present invention comprises the steps of: generating reference coordinates on the basis of a three-dimensional image including depth information acquired through a camera; dividing an area corresponding to a preconfigured object on the basis of depth information and a color space conversion of the preconfigured object from a three-dimensional image including the depth information acquired through the camera; separating a sub-object having a motion component from an object of the divided area and modeling the separated sub-object and a palm area connected to the sub-object on the basis of a preconfigured algorithm to detect a feature point; and controlling a three-dimensional object for using an augmented reality interaction service by estimating a posture of the sub-object on the basis of joint information of an object provided through a predetermined user interface.

Description

Apparatus and method for providing augmented reality interaction service

The present invention relates to technology that supports hand interaction with augmented virtual objects in an HMD-based wearable environment equipped with an RGB-D camera, based on geometry-recognition-based registration coordinate system correction using the RGB-D camera for wearable augmented reality authoring.

In order to create an augmented reality space for a real space that has not been modeled in advance, a user must use a camera to acquire image feature and camera pose information about the real space, generate local reference coordinates (or registration coordinates), and then match the coordinate system of the virtual space to them. However, since the registration coordinate system is generated at an arbitrary position, a process of manually correcting the pose of the coordinate system is necessary.

In addition, a three-dimensional virtual object modeled in units of the real space, for example, meters (m), may be augmented accurately in the augmented reality space through a correction process that accurately matches the scale between the real space and the augmented reality space.

Among conventional registration methods, the GPS/compass sensor-based method has very low registration accuracy because of errors in the sensor information. The 2D object-based method requires a pre-learned image, and it is not suitable for registering an arbitrary three-dimensional space because the registration target is limited to a simple two-dimensional plane.

3D space-based registration generates the registration coordinate system for augmentation at an arbitrary position, so the user must manually correct the pose of the coordinate system. Performing this correction requires expertise in computer vision, graphics, and related fields, and an incorrect input by the user may cause a registration error.

In addition, as an example of a conventional augmented reality system, Korean Patent Publication No. 10-0980202 relates to a mobile augmented reality system and method capable of interacting with a three-dimensional virtual object. It includes a camera attached to a terminal, an image processing unit that generates a three-dimensional virtual object on the hand using the camera of the terminal, a display unit that outputs an image of the three-dimensional virtual object and the hand, and an interaction unit that controls the three-dimensional virtual object in response to the movement of the hand, so that users can access 3D virtual content anytime, anywhere using a mobile device. As described, this is a technique for accessing 3D virtual content using a mobile device, and it does not include automatically generating and correcting a registration coordinate system for matching the virtual space.

Thus, there is a need for a method that can automatically generate and correct a registration coordinate system for matching virtual space.

On the other hand, research on immersive content and interaction technology, which combines digital technology and cultural art, is receiving much attention. In particular, with the development of augmented reality technology based on technologies such as computer graphics and computer vision, attempts have been made to combine virtual digital content into the real world. Also, as cameras and HMDs become lighter and smaller, wearable computing technologies are accelerating. Among the many user interface technologies currently being studied, the hand is attracting attention as a natural technology for wearable computing technology.

Conventionally, various interface technologies exist for obtaining digital information about objects, spaces, and situations of interest to users. Devices for such interfaces include desktop-based devices such as mice, keyboards, and remote controls. These interface technologies can be used to handle digital content shown on a 2D screen. However, because they are intended for 2D displays, they are limited in terms of space.

The real space we live in is a 3D space. Using an interface designed for a 2D display in this real space is limiting because the dimensionality of the space is reduced by one.

Therefore, 3D interface technology is required to deal with virtual digital content combined with 3D space.

The HMD with a camera provides the user with a first-person view, unlike displays in traditional desktop environments.

However, in such a camera environment, the study of estimating the finger posture of the bare hand has the following problems.

First, the hand is a high-dimensional object with 26 parameters (palm: 6 DOF; five fingers: 4 × 5 = 20 DOF). Estimating the posture of the fingers with this many dimensions requires a large amount of computation.

Secondly, the hand is an object without a texture. This means that a feature-based object detection / tracking algorithm from color information cannot be applied to finger posture estimation. As described above, the task of detecting / tracking a hand and estimating the posture of a finger based on a camera has a challenging condition.

The WearTrack system is a wearable system using a magnetic tracker and an HMD equipped with a posture estimation sensor. Systems such as virtual touch screen systems, AR memo, and SixthSense are characterized by 2D interaction based on a 2D image coordinate system. Because the interaction is 2D-based, however, these systems cannot support interaction in 3D space.

Tinmith and FingARtips attach additional markers on the glove to estimate hand posture. However, since the size of the separate sensor is very large, it is not suitable for the wearable environment from the user's point of view.

A feature point-based approach has also been developed. This method estimates finger motion by recognizing a pattern through prior learning. The system fixes an RGB-D camera, such as a Kinect, facing forward and estimates the movement of the hand of a user wearing a glove with a specific pattern. Its disadvantage is that an additional glove is required to recognize the user's finger posture.

The Digits system demonstrates fingertip tracking for wearable devices. A time-of-flight (TOF) depth sensor is worn on the wrist and arranged so that the fingers do not occlude one another. It uses a simple carving technique to classify the fingers and estimates finger posture using the relationship between finger joints. However, this method has the disadvantage that the sensor must be attached to an additional body part, such as the wrist, in addition to the HMD.

Therefore, in order to solve such a problem, the present invention estimates the finger posture of the bare hand, and aims to estimate the posture of the finger when the finger is bent toward the camera.

In addition, an object of the present invention is to provide a geometric recognition-based matching coordinate system correction method and apparatus for wearable augmented reality authoring that can automatically generate / correct the matching coordinate system for matching the virtual space based on the actual measurement.

According to an aspect of the present invention, a method includes: generating reference coordinates based on a three-dimensional image including depth information obtained through a camera; dividing a region corresponding to a predetermined object from the three-dimensional image including the depth information obtained through the camera, based on depth information and a color space transformation of the predetermined object; separating a sub-object having a motion component from the object of the divided region, and modeling the separated sub-object and a palm region associated with the sub-object based on a predetermined algorithm to detect a feature point; and estimating a posture of the sub-object based on joint information of the object provided through a predetermined user interface to control a 3D object for using an augmented reality service.

According to another aspect of the present invention, an apparatus includes: a registration coordinate system correction unit that generates reference coordinates based on a three-dimensional image including depth information obtained through a camera; an object separation unit that divides a region corresponding to a predetermined object from the three-dimensional image including the depth information obtained through the camera, based on depth information and a color space transformation of the predetermined object; an object processor that separates a sub-object having a motion component from the object of the divided region and detects a feature point by modeling the separated sub-object and the palm region associated with the sub-object based on a predetermined algorithm; and a controller that estimates a posture of the sub-object based on skeleton information of the object provided through a predetermined user interface to control a 3D object for using the augmented reality service.

In the present invention, since there is no occlusion between the fingers and the fingertips are always visible from the camera, a large self occlusion does not occur, and thus the posture of a finger having a high degree of complexity may be estimated in real time.

In addition, according to the present invention, the matching coordinate system for matching the virtual space is automatically generated / corrected based on the actual measurement, so that the matching coordinate system can be automatically generated and corrected without a correction operation by the user.

As described above, the present invention can be used as an underlying technology required for authoring augmented reality content in various fields such as augmented reality-based art galleries / museums, classrooms, industries, interior design, etc., because the matching coordinate system can be automatically corrected.

FIG. 1 is a flowchart illustrating a method for providing augmented reality interaction service according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating a screen to which a user's visual distance perception improvement method is applied when interacting with a bare hand in a head wearable display-based augmented reality environment according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating an operation of correcting a registration coordinate system in the augmented reality interaction service providing method according to an exemplary embodiment.

FIG. 4 is a detailed block diagram of an operation algorithm for estimating a hand posture in a method for providing augmented reality interaction service according to an exemplary embodiment of the present invention.

FIG. 5 is a view illustrating a screen related to visual feedback for improving depth perception in the augmented reality interaction service providing method according to an exemplary embodiment.

FIG. 6 is a view illustrating a screen related to a semi-transparent gray shadow and guideline in the augmented reality interaction service providing method according to an embodiment of the present invention.

FIG. 7 is a view illustrating a finger joint related position vector in the augmented reality interaction service providing method according to an exemplary embodiment of the present invention.

FIG. 8 is a diagram illustrating a screen for an overall operation to which a method for improving visual perception of a user is applied in the augmented reality interaction service providing method according to an exemplary embodiment.

FIG. 9 is a diagram illustrating a registration coordinate system correction method in the augmented reality interaction service providing method according to an embodiment of the present invention.

FIG. 10 is an example of candidates of a registration coordinate system in 3D space in the method of providing augmented reality interaction service according to an embodiment of the present invention.

FIG. 11 is an example of setting a rotation axis of a registration coordinate system in the augmented reality interaction service providing method according to an embodiment of the present invention.

FIG. 12 is an example of a scale correction using a distance ratio between a SLAM-based registration coordinate system and a depth camera-based registration coordinate system in a method for providing augmented reality interaction service according to an embodiment of the present invention.

FIG. 13 is an example of a position correction in the augmented reality interaction service providing method according to an embodiment of the present invention.

FIG. 14 is a view illustrating a rotation correction in a method for providing augmented reality interaction service according to an embodiment of the present invention.

FIG. 15 is a block diagram of an apparatus for providing augmented reality interaction service according to an exemplary embodiment.

FIG. 16 is a block diagram of a registration coordinate system correcting unit in the apparatus for providing augmented reality interaction services according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, specific details such as specific components are shown; these are provided to help a more general understanding of the present invention, and it will be apparent to those of ordinary skill in the art that these specific details may be modified or changed within the scope of the present invention.

The present invention relates to providing an augmented reality interaction service. More particularly, in authoring wearable augmented reality, a registration coordinate system for matching the virtual space is automatically generated and corrected on the basis of actual measurement, using information obtained by an RGB-D camera. In addition, to estimate the pose of an object for interaction with a virtual object in augmented reality, a hand object is divided from a three-dimensional image including depth information on the basis of the depth information and a color space transformation of the predetermined object. A finger having a motion component and the palm region associated with the finger are then modeled through a predetermined algorithm to detect feature points, and the posture of the sub-object is estimated on the basis of skeleton information of the object provided through a predetermined user interface to control a 3D object for using the augmented reality service. The invention thereby makes a variety of 3D content available to users and provides developers with an interface that effectively controls objects in three-dimensional space.

In addition, because the present invention automatically generates and corrects the registration coordinate system for matching the virtual space on the basis of actual measurement, no correction work by the user is required. Furthermore, it provides a foundation technology for authoring augmented reality content in various fields such as augmented reality-based galleries and museums, classrooms, industry, and interior design.

Hereinafter, a method for providing augmented reality interaction service according to an exemplary embodiment of the present invention will be described in detail with reference to FIGS. 1 to 8.

First, FIG. 1 is a flowchart illustrating an augmented reality interaction service providing method according to an exemplary embodiment.

Referring to FIG. 1, in step 110, reference coordinates are generated based on a 3D image including depth information obtained through a camera.

The operation of step 110 analyzes the geometry of the real space using depth image information captured of the real space and generates a registration coordinate system for the real space. This is an interface that supports geometry-recognition-based registration coordinate system correction for wearable (e.g., head mounted display) augmented reality authoring, so that the object pose estimation for interaction with a virtual object in augmented reality, described below, can be performed more robustly.

Subsequently, in step 112, depth information and a color space transformation of a predetermined object are obtained from the three-dimensional image including the depth information acquired through the camera, and, based on this, the region corresponding to the predetermined object is divided in step 114.

Herein, the predetermined object refers to a hand object, and according to an embodiment of the present invention, the hand object is divided through an operation of steps 112 to 114 from an RGB image and a depth image.

More specifically, for robust skin region division, the RGB image is converted from the RGB color space to the HSV color space, and the skin color region is obtained by applying a double threshold to the saturation (S) and value (V) elements.

In addition, in the depth image, the distance that the hand can reach from the camera attached to the HMD (arm's length) is set as a threshold.

From the intersection of the result of depth segmentation and RGB segmentation, it is possible to easily and robustly divide the area of the hand. For example, the threshold is set to 60 cm, and the segmented depth image and the color image are aligned using a known calibration.
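The segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `segment_hand` and the S/V ranges are assumptions chosen for the example, and only the 60 cm depth threshold comes from the text. The image is assumed to be already aligned with the depth map.

```python
import colorsys

def segment_hand(rgb, depth_cm, s_range=(0.2, 0.7), v_range=(0.35, 1.0),
                 max_depth_cm=60.0):
    """Return a binary mask that is 1 where a pixel passes both the
    S/V double threshold (skin) and the depth threshold (within reach).

    rgb      : 2D list of (r, g, b) tuples with values in 0..255
    depth_cm : 2D list of depths in centimetres, 0 meaning "no measurement"
    s_range, v_range : hypothetical skin thresholds, not taken from the source
    """
    mask = []
    for rgb_row, depth_row in zip(rgb, depth_cm):
        row = []
        for (r, g, b), d in zip(rgb_row, depth_row):
            _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            skin = s_range[0] <= s <= s_range[1] and v_range[0] <= v <= v_range[1]
            near = 0.0 < d <= max_depth_cm
            # Intersection of the color result and the depth result.
            row.append(1 if (skin and near) else 0)
        mask.append(row)
    return mask
```

In practice the S and V ranges would be tuned for the camera and lighting; the intersection step is what removes skin-colored background and nearby non-skin objects.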

In step 116, the sub-object having the motion component is separated from the object in the divided region, and in step 118, the sub-object and the palm region associated with the sub-object are modeled based on a predetermined algorithm to generate a feature point. Detect. In this case, the feature point includes a finger reference point and an end point of the depth information-based hand, and the end point of the hand is extracted using template matching from a pre-modeled depth template.

This operation is performed because the palm and the finger must be separated from each other in order to estimate the posture of the finger, which corresponds to the sub-object, from the image of the divided hand. In this operation, the fingers and the palm are separated by applying a morphological operation to the hand image.

The morphological operation separates the fingers and the palm using erosion and dilation. Erosion is an operation that erodes the image from the outside, and dilation, in contrast to erosion, is an operation that expands the image. As shown in FIG. 2, when erosion is performed repeatedly, the area of the fingers gradually disappears. After that, the palm area can be modeled by expanding it through dilation. The center point of the palm is computed through a distance transform, and the calculated center point is the basis for the search for the reference point of the finger.
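The erosion, dilation, and distance-transform steps can be illustrated with a small pure-Python sketch. The 3x3 square structuring element, the iteration count, the brute-force distance transform, and all function names are assumptions made for the example; a real system would more likely use an image-processing library.

```python
import math

def _at(mask, y, x):
    """Read a pixel, treating everything outside the image as background."""
    return mask[y][x] if 0 <= y < len(mask) and 0 <= x < len(mask[0]) else 0

def _morph(mask, combine):
    """3x3 morphology: combine=all gives erosion, combine=any gives dilation."""
    return [[1 if combine(_at(mask, y + dy, x + dx)
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def split_palm_fingers(hand, iterations=1):
    """Erode until thin fingers vanish, dilate back to get the palm,
    then subtract the palm from the hand to get the fingers."""
    core = hand
    for _ in range(iterations):
        core = _morph(core, all)
    palm = core
    for _ in range(iterations):
        palm = _morph(palm, any)
    palm = [[p & h for p, h in zip(pr, hr)] for pr, hr in zip(palm, hand)]
    fingers = [[h & (1 - p) for p, h in zip(pr, hr)] for pr, hr in zip(palm, hand)]
    return palm, fingers

def palm_center(palm):
    """Brute-force distance transform: return (row, col, radius) of the palm
    pixel farthest from any background pixel."""
    background = [(y, x) for y, row in enumerate(palm)
                  for x, v in enumerate(row) if not v]
    best = (0, 0, 0.0)
    for y, row in enumerate(palm):
        for x, v in enumerate(row):
            if v:
                r = min(math.hypot(y - by, x - bx) for by, bx in background)
                if r > best[2]:
                    best = (y, x, r)
    return best
```

The number of erosion iterations should be roughly half the finger width in pixels, so that fingers disappear while the palm survives.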

In addition, the finger is modeled together with the palm through the operation of step 116; the finger is modeled by ellipse fitting. As shown in Equation 1 below, the point of the modeled ellipse (finger) that has the smallest distance to the center point of the palm is estimated as the reference point of the finger. This makes it possible to find the reference point of the finger even when the finger is bent to some extent.

Equation 1
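As an illustration of the selection rule described for Equation 1, the sketch below samples points on a fitted ellipse and picks the one closest to the palm center. The ellipse parameterization, the sample count, and the function names are assumptions for the example; the equation itself is not reproduced in this text.

```python
import math

def ellipse_points(center, axes, angle, n=72):
    """Sample n points on an ellipse with center (x, y), semi-axes (a, b)
    and rotation `angle` in radians."""
    cx, cy = center
    a, b = axes
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        x, y = a * math.cos(t), b * math.sin(t)
        pts.append((cx + x * math.cos(angle) - y * math.sin(angle),
                    cy + x * math.sin(angle) + y * math.cos(angle)))
    return pts

def finger_base(points, palm_center):
    """Return the ellipse point with the smallest distance to the palm center."""
    return min(points, key=lambda p: math.hypot(p[0] - palm_center[0],
                                                p[1] - palm_center[1]))
```

For a finger ellipse centered at (0, 10) with its long axis pointing away from a palm center at the origin, `finger_base` returns the end of the ellipse nearest the palm, approximately (0, 5).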

On the other hand, using the procedure as shown in Figure 5, when the finger is bent inward to the palm side, the reference point of the finger is detected, but the end point of the finger is not detected. This is because the points of the elliptic fitting model do not include the fingertips on the image. In other words, when the finger is bent, the point at the end of the finger is detected instead of the fingertip. For this reason, when applying inverse kinematics, a large error occurs in estimating a parameter of a finger joint.

Thus, in one embodiment of the present invention, the end point of the hand is detected using depth information rather than only from the ellipse fitted in 2D. To this end, the present invention uses Zero-mean Normalized Cross Correlation (ZNCC), a technique well known in image processing applications, to extract the end point of the hand.

As shown in FIG. 6, an end point of a hand may be extracted using template matching from a depth-template previously modeled. The red portion of the correlation map of FIG. 6 is the portion that most closely matches the depth template. This approach shows that the fingertip position can be detected even when the finger is bent. The position of the detected fingertip and finger reference point is input to the inverse kinematics algorithm in a later module.
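A minimal sketch of ZNCC-based template matching on a depth map follows. The exhaustive sliding-window search and the function names are assumptions for illustration; the depth template itself would be modeled in advance as the text describes.

```python
import math

def zncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists.
    Returns a value in [-1, 1]; flat patches are given a score of 0."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def match_template(depth, template):
    """Slide the template over the depth map and return
    (best_score, (row, col)) of the top-left corner of the best match."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))
    for y in range(len(depth) - th + 1):
        for x in range(len(depth[0]) - tw + 1):
            patch = [row[x:x + tw] for row in depth[y:y + th]]
            score = zncc(patch, template)
            if score > best[0]:
                best = (score, (y, x))
    return best
```

Because the mean is subtracted and the result is normalized, the score is insensitive to a constant depth offset, which is why the same fingertip template can match at different distances from the camera.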

Finally, in step 120, the posture of the sub-object is estimated based on joint information of the object provided through a predetermined user interface to control the 3D object for using augmented reality service.

More specifically, the present invention applies the above-mentioned inverse kinematics to finger posture estimation. Inverse kinematics estimates the parameters of the joints when a reference coordinate system and the position of an end point are given. Here, the base point obtained from the camera is set as the origin of the reference coordinate system, and the position of the fingertip is set as the end point.

Then, the rotation matrix of joints is estimated by applying inverse kinematics. Since there are a total of four parameters for moving the finger, there are a total of four parameters to be estimated for each finger.

Here, the inverse kinematics algorithm is an inverse-kinematics algorithm based on the damped least-square-method.

This algorithm estimates the amount that each joint should change using the difference between the target point (the position of the fingertip obtained from the camera) and the current point (the position of the fingertip of the current model).

Referring to FIG. 7, one vector is the position vector of the current finger end point, and the other is the position vector of the end point of the finger obtained through image processing (the origin of the reference coordinates of both vectors is the reference point of the finger). θ is a parameter of the rotation matrix of the finger joint, and λ is a damping ratio parameter. L1, L2, and L3 are the lengths of the segments of the finger. The optimization problem of the inverse kinematics algorithm may be defined as in Equation 2 below. The higher the damping parameter is set (for example, to 1000), the higher the stability of the inverse kinematics algorithm.

Equation 2
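The damped least-squares idea can be sketched for a simplified planar finger with three flexion joints. This is an illustrative assumption, not the formulation of Equation 2: the patent estimates four parameters per finger, whereas the sketch uses three joint angles in 2D. It applies the standard damped least-squares update Δθ = Jᵀ(JJᵀ + λ²I)⁻¹e; the step clamp, the segment lengths, and the damping value are chosen for the example's units.

```python
import math

def fingertip(theta, lengths):
    """Forward kinematics of a planar chain rooted at the finger base point."""
    x = y = angle = 0.0
    for t, seg in zip(theta, lengths):
        angle += t
        x += seg * math.cos(angle)
        y += seg * math.sin(angle)
    return x, y

def jacobian(theta, lengths):
    """Return one (dx, dy) column per joint."""
    angles, a = [], 0.0
    for t in theta:
        a += t
        angles.append(a)
    n = len(theta)
    return [(-sum(lengths[k] * math.sin(angles[k]) for k in range(j, n)),
             sum(lengths[k] * math.cos(angles[k]) for k in range(j, n)))
            for j in range(n)]

def solve_ik(target, lengths, theta, damping=0.1, iterations=200, max_step=0.5):
    """Iterate the damped least-squares update toward the target fingertip."""
    theta = list(theta)
    for _ in range(iterations):
        px, py = fingertip(theta, lengths)
        ex, ey = target[0] - px, target[1] - py
        norm = math.hypot(ex, ey)
        if norm > max_step:  # assumption: clamp large errors for stability
            ex, ey = ex * max_step / norm, ey * max_step / norm
        cols = jacobian(theta, lengths)
        # A = J J^T + damping^2 I is a 2x2 positive definite matrix.
        a11 = sum(c[0] * c[0] for c in cols) + damping ** 2
        a12 = sum(c[0] * c[1] for c in cols)
        a22 = sum(c[1] * c[1] for c in cols) + damping ** 2
        det = a11 * a22 - a12 * a12
        wx = (a22 * ex - a12 * ey) / det
        wy = (a11 * ey - a12 * ex) / det
        # delta_theta = J^T w
        theta = [t + c[0] * wx + c[1] * wy for t, c in zip(theta, cols)]
    return theta
```

A larger damping value makes each update smaller and more stable near singular poses (for example a fully stretched finger) at the cost of slower convergence, which is consistent with the stability remark in the text.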

Subsequently, the 3D object is manipulated by the operation 120. The virtual object manipulation according to the present invention is performed according to the posture of a finger which can be widely used by a user. The posture of the finger being targeted here is a posture mapped from the number of fingers.

As shown in FIG. 8, a tong-shaped (pinching) hand posture determines the position of the globe. The size of the globe is then manipulated by pinching and spreading five fingers. Through this interaction, a user wearing an HMD with an RGB-D camera can obtain virtual digital information by adjusting the position and size of the virtual globe, which is an augmented virtual object.

The operation algorithm for estimating the posture of the hand in the method of providing augmented reality interaction service according to the embodiment of the present invention described above is shown in FIG. 4. The operation of each block of FIG. 4 is as follows.

- Hand Segmentation (410)

The hand object is divided from the RGB image and the depth image (401, 402).

First, to reduce the influence of illumination, the RGB image (401) is converted from the RGB color space to the HSV color space for robust skin region division. The skin color region is obtained by applying a double threshold to the S and V elements.

For the depth image (402), the distance that the hand can reach from the camera attached to the HMD is set as a threshold, and the contour is detected.

From the intersection of the depth segmentation and RGB segmentation results obtained from 401 and 402, it is possible to easily and robustly segment the area of the hand.

- Palm and Finger Modeling (411, 412)

From the image of the segmented hand, the palm and the fingers must be separated to estimate the pose of the finger. In this step, morphological operations (erosion, dilation) are applied to the hand image, and the dilated result is then subtracted from the hand image, which separates the fingers and the palm (palm image, finger image).

A distance transform and center and radius extraction are applied to the palm image to compute the palm center position.

- Finger Feature Extraction (414)

Contour detection from finger image,

Ellipse fitting,

Direction and ordering refinement with fingertip, finger base, orientation detection

- History Management (416)

Palm center position, radius, finger position, direction and length

Lost finger detection

- Internal Fingertip Detection (418)

Area Search, Template Matching, Fingertip Extraction

- Inverse Kinematics (420)

Joint angle determination, stabilization

- Augmentation and Interaction (422)

Camera tracking and virtual hand registration,

Collision Detection and Gesture Analysis

Meanwhile, in authoring augmented reality content, the method for providing augmented reality interaction service according to an exemplary embodiment of the present invention automatically corrects the pose of the coordinate system through geometry-recognition-based registration coordinate system correction. This is described in detail below with reference to the flowchart of FIG. 3.

FIG. 3 is a flowchart illustrating a method for correcting a geometry-based registration coordinate system in a method for providing augmented reality interaction service according to an exemplary embodiment of the present invention.

Referring to FIG. 3, the method according to the present invention receives depth image information from a depth camera, for example, an RGB-D camera, and receives a region of interest set by a user input (S310 and S320).

In this case, the depth image information is information captured and generated by the depth camera, and may include captured image features, pose information of the camera, a distance map image based on depth information, color, and the like. The region of interest may be received after being set by the user input.

When the ROI is received, the geometry of the ROI is analyzed using the received depth image information, and the first matched coordinate system based on the geometry is generated using the analyzed geometry (S330 and S340).

Here, step S330 may perform a geometry analysis that predicts a plane, a tangent line, a tangent surface, an intersection point, and the like for the ROI received from the depth camera, and step S340 may generate a registration coordinate system for the real space through the geometry analysis of the real space or the ROI. Specifically, 1) at least one of the plane, the tangent line, the tangent surface, and the intersection point of the ROI may be predicted by analyzing the geometry of the ROI, and the first registration coordinate system may be generated from at least one of the predicted plane, tangent line, tangent surface, and intersection point; or 2) the origin and the direction may be calculated through the geometry analysis of the ROI, one of the front, the side, and the floor may be defined, and the first registration coordinate system may be generated by correcting the sign of the calculated direction to match the predetermined left-handed coordinate system of the virtual space.

When the first registration coordinate system based on the geometry is generated, the SLAM-based second registration coordinate system is corrected using the generated first registration coordinate system, and a 3D object is then created based on the corrected second registration coordinate system (S350, S360).

Here, in operation S350, the second registration coordinate system may be corrected on the basis of actual measurement by calculating the ratio between the distance measured by the depth camera that generates the depth image information and the distance of the SLAM-based camera.
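The distance-ratio idea can be illustrated with a very small sketch. It assumes that the same physical distance is available both in metres from the depth camera and in arbitrary units from SLAM, and that the scale is the former divided by the latter; the function names and that convention are assumptions, not details given in the text.

```python
def scale_factor(depth_distance_m, slam_distance):
    """Ratio that converts SLAM units into metres (assumed convention)."""
    if slam_distance <= 0:
        raise ValueError("SLAM distance must be positive")
    return depth_distance_m / slam_distance

def to_metric(point_slam, scale):
    """Rescale a point expressed in the SLAM coordinate system."""
    return tuple(scale * c for c in point_slam)
```

For example, if the depth camera measures 1.5 m for a distance that SLAM reports as 0.5 units, the scale is 3.0 and the SLAM point (1.0, 2.0, -0.5) becomes (3.0, 6.0, -1.5) in metres.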

This invention will be described with reference to FIGS. 4 to 8, and the generation of a registration coordinate system through depth-based geometric analysis and correction of the SLAM-based initial registration coordinate system will be described.

First, the generation of registration coordinate system through depth-based geometric analysis will be described.

An example of generating new local coordinates using depth information will be described. As shown in the example illustrated in FIG. 10, the origin position and the direction are calculated in order to create registration coordinates under various shape conditions having one, two, or three planes.

In an interactive manner, a user uses a mobile input device to determine the center location of an ROI in the image from the RGB-D camera. A circular cursor with a radius of 50 pixels, placed at the determined center position of the ROI, controls the area of the depth map image that is used. The 3D point group is reconstructed from the depth map, and a local reference coordinate system, i.e., a first registration coordinate system, is generated.
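Reconstructing the 3D point group inside the circular cursor can be sketched with a standard pinhole back-projection. The intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function name are assumptions for illustration; the text does not specify the camera model.

```python
def backproject_roi(depth_m, center, radius_px, fx, fy, cx, cy):
    """Back-project depth pixels inside a circular cursor into camera-space
    3D points using the pinhole model.

    depth_m   : 2D list of depths in metres, 0 meaning "no measurement"
    center    : (u, v) pixel position of the cursor center
    radius_px : cursor radius in pixels (50 in the text)
    """
    u0, v0 = center
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            inside = (u - u0) ** 2 + (v - v0) ** 2 <= radius_px ** 2
            if inside and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

The resulting point list is the input to the plane estimation step that follows.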

To set the location of local reference coordinates, the planes are predicted from a 3D point cloud of the region of interest.

In this case, the plane estimation may be defined as an optimization problem that predicts the variables a, b, c, and d of the plane equation, where a, b, and c form the normal vector, as shown in Equation 3 below. These variables can be estimated through the random sample consensus (RANSAC) algorithm.

Equation 3
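A minimal RANSAC plane fit is sketched below, assuming the plane is written as ax + by + cz + d = 0 with a unit normal. The inlier threshold, iteration count, fixed seed, and function name are assumptions for the example.

```python
import math
import random

def fit_plane_ransac(points, threshold=0.01, iterations=200, seed=0):
    """Estimate (a, b, c, d) of ax + by + cz + d = 0 with RANSAC.
    Returns the plane with the most inliers and the inlier list."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        p0, p1, p2 = rng.sample(points, 3)
        u = [p1[i] - p0[i] for i in range(3)]
        w = [p2[i] - p0[i] for i in range(3)]
        # Normal of the plane through the three sampled points.
        n = (u[1] * w[2] - u[2] * w[1],
             u[2] * w[0] - u[0] * w[2],
             u[0] * w[1] - u[1] * w[0])
        norm = math.sqrt(sum(c * c for c in n))
        if norm < 1e-12:  # collinear sample, no unique plane
            continue
        a, b, c = (k / norm for k in n)
        d = -(a * p0[0] + b * p0[1] + c * p0[2])
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (a, b, c, d), inliers
    return best_plane, best_inliers
```

To find a second or third plane, the same routine can be run again on the points that were not inliers of the planes already found.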

1) If there is one plane, the three-dimensional coordinates obtained by back-projecting the 2D cursor point of the depth map are set as the position of the local reference coordinates. However, since there is no information such as intersection lines or edges, there is not enough information to determine a rotation with three degrees of freedom. Therefore, only one degree of freedom of the rotation of the coordinate system can be set from the normal vector n_plane of the plane (wall or floor). The remaining rotation is set by assigning the X-axis direction vector of the camera (V_{camera's x axis}) to the X-axis direction vector of the local reference coordinate system. The unknown rotation parameter V_{unknown rotation} may then be set through the cross product of the normal vector and the X axis, as shown in Equation 4 below.

Equation 4
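The single-plane case can be sketched as building a frame from the plane normal and the camera's X axis. Two details are assumptions added for the example: the camera X axis is first projected onto the plane so that the axes are orthogonal, and the cross product is taken in the order normal × X. The function names are hypothetical.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def frame_from_single_plane(plane_normal, camera_x):
    """Return (x_axis, normal_axis, third_axis) for the local reference frame.
    Undefined if camera_x is parallel to the plane normal."""
    n = _normalize(plane_normal)
    along = sum(c * k for c, k in zip(camera_x, n))
    # Assumption: remove the component of the camera X axis along the normal.
    x_axis = _normalize(tuple(c - along * k for c, k in zip(camera_x, n)))
    third = _cross(n, x_axis)  # the otherwise unknown rotation axis
    return x_axis, n, third
```

With a floor normal of (0, 2, 0) and a camera X axis of (1, 0.5, 0), the frame is X = (1, 0, 0), normal = (0, 1, 0), and third axis = (0, 0, -1).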

2) When two planes intersect, the three-degree-of-freedom position of the local reference coordinates is determined by calculating the 3D coordinates on the intersection line closest to the point v_o selected by the user in the selection area. As shown in Equation 5 below, the point (v*) that minimizes the distance between v_o and v_i, a point in the 3D point group, is set as the reference position of the coordinate system.

Equation 5

At the same time, the sum of the two plane equations is minimized, where v ₁ and v ₂ are points on planes π ₁ and π ₂ , respectively; the sum is minimized when v _i lies on the intersection line.

This equation is expanded using Lagrange multipliers, and the resulting matrix system is solved through QR decomposition. For the rotation of the coordinate system, the two normal vectors of the estimated planes, e.g., the vertical plane and the ground plane, are used to determine the directions of the coordinate system. The direction vector of the intersection line may be set by the cross product of the two normal vectors, as represented by Equation 6 below.

Equation 6
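For illustration, the two-plane case can be sketched with a closed-form projection onto the intersection line. This stands in for the Lagrange-multiplier and QR-decomposition solve described above, and it reaches the same point when the planes are exact. Planes are assumed to be given as (a, b, c, d) with a·x + b·y + c·z + d = 0, and all names are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def closest_point_on_intersection(plane1, plane2, v_o):
    """Return (v_star, direction): the point on the intersection line nearest to v_o,
    and the unit direction of that line (cross product of the two normals)."""
    n1, h1 = plane1[:3], -plane1[3]   # n1 . p = h1
    n2, h2 = plane2[:3], -plane2[3]   # n2 . p = h2
    u = cross(n1, n2)
    uu = dot(u, u)
    if uu < 1e-12:
        raise ValueError("planes are parallel; no unique intersection line")
    a, b = cross(n2, u), cross(u, n1)
    p0 = tuple((h1 * a[i] + h2 * b[i]) / uu for i in range(3))  # one point on the line
    direction = tuple(c / uu ** 0.5 for c in u)
    t = dot(tuple(v_o[i] - p0[i] for i in range(3)), direction)
    v_star = tuple(p0[i] + t * direction[i] for i in range(3))
    return v_star, direction

# Wall x = 1 and floor z = 0; the user-selected point lies near their shared edge.
wall = (1.0, 0.0, 0.0, -1.0)
floor = (0.0, 0.0, 1.0, 0.0)
v_star, direction = closest_point_on_intersection(wall, floor, (1.3, 2.0, 0.4))
```

Here v_star lies on both planes at the height of the selected point along the edge, and the direction is parallel to the y axis.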

3) When three planes intersect, the origin of the coordinate system is the point where the three planes meet, and the equation πv = d is set up. Here, π is a matrix consisting of the coefficients of the planes, and as shown in Equation 7 below, the point that minimizes πv - d is set as the reference position of the coordinate system.

Equation 7

At this time, a least-squares solution based on singular value decomposition (SVD), which is an optimization technique, can be used to calculate the intersection point through the pseudo-inverse matrix, and the rotation can be set from the normal vectors of the three planes.
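The three-plane case can be sketched as a small least-squares solve of πv = d. To stay within the standard library, this sketch solves the normal equations (πᵀπ)v = πᵀd with Cramer's rule instead of the SVD-based pseudo-inverse mentioned above. For three well-conditioned planes both give the same point. The function names are hypothetical.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def intersect_planes(pi, d):
    """Least-squares point v minimising |pi v - d|, where each row of pi is a plane normal."""
    ata = [[sum(row[i] * row[j] for row in pi) for j in range(3)] for i in range(3)]
    atd = [sum(row[i] * di for row, di in zip(pi, d)) for i in range(3)]
    det = det3(ata)
    if abs(det) < 1e-12:
        raise ValueError("plane normals are not linearly independent")
    solution = []
    for k in range(3):
        # Cramer's rule: replace column k of the normal matrix with the right-hand side.
        m = [[atd[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(m) / det)
    return tuple(solution)

# Two walls (x = 1, y = 2) and a slanted third plane x + y + z = 6 meet at (1, 2, 3).
origin = intersect_planes([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)],
                          [1.0, 2.0, 6.0])
```

With noisy plane estimates an SVD-based solver is numerically preferable, since forming the normal equations squares the condition number.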

In the rotation estimated in the previous step, the directions of the x, y, and z axes are not known exactly, because the order and sign of the estimated normal vectors may vary. In the present invention, the order of the normal vectors follows the number of points in each point cloud. Resolving the order and sign is important for graphical rendering in a left-handed or right-handed rendering system.

Therefore, as a post-process, the rotation of the coordinate system is aligned in consideration of the rotation information of the RGB-D camera. After the angle difference between each plane normal vector and each direction vector of the camera (front, side, and ground vectors) is calculated through the cross product, the normal vector having the minimum angle difference with respect to each camera direction vector is found. That normal vector is then assigned to the axis of the corresponding camera direction vector. For example, if the i-th normal vector N _i has the minimum angle difference from the forward camera vector C _Front , N _i may be set as the z axis. In the same way, the other normal vectors can be assigned to the x and y axes, and the direction sign of the coordinates can be corrected. That is, the direction vector of the camera may be determined by Equation 8 below.

Equation 8

Here, C _Side and C _Ground denote the side camera vector and the ground (bottom) camera vector, respectively.
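The axis assignment described above can be sketched as follows. This is an illustration under stated assumptions: the angle difference is measured with the absolute dot product (a common way to compare directions, used here in place of the cross product named in the text), the front, side, and ground vectors are mapped to the z, x, and y axes respectively, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = sum(c * c for c in v) ** 0.5
    return tuple(c / norm for c in v)

def align_axes(normals, c_front, c_side, c_ground):
    """Assign each plane normal to the camera direction it is most parallel to,
    flipping its sign so that it points the same way as that camera vector."""
    remaining = [unit(n) for n in normals]
    axes = {}
    for name, cam in (("z", c_front), ("x", c_side), ("y", c_ground)):
        cam = unit(cam)
        # Largest |cos(angle)| corresponds to the smallest angle difference.
        best = max(remaining, key=lambda n: abs(dot(n, cam)))
        remaining.remove(best)
        sign = 1.0 if dot(best, cam) >= 0 else -1.0
        axes[name] = tuple(sign * c for c in best)
    return axes

# Three estimated normals in arbitrary order and with arbitrary signs.
normals = [(0.0, -1.0, 0.0), (0.05, 0.0, -1.0), (1.0, 0.0, 0.02)]
axes = align_axes(normals, c_front=(0.0, 0.0, 1.0), c_side=(1.0, 0.0, 0.0),
                  c_ground=(0.0, -1.0, 0.0))
```

In this example the second normal is flipped and becomes the z axis, because it is nearly antiparallel to the forward camera vector.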

Through this process, as shown in FIG. 11, the rotation axis of the registration coordinate system may be set.

Next, the correction of the SLAM-based initial registration coordinate system will be described.

As described above, in order to align the SLAM-based initial local reference coordinate system with the local reference coordinate system based on the depth camera coordinate system, the scale must be taken into account, because the size of the virtual model is determined arbitrarily during SLAM initialization.

Therefore, in order to convert the scale of the SLAM-based coordinate system into a unit of scale consistent with the real space, the distance from the origin of the SLAM-based coordinate system to the RGB camera is calculated. This is the magnitude of the position vector in the RGB camera pose matrix and is expressed in virtual scale units.

Next, the depth distance from the depth camera is calculated. This is the value of the depth map and is expressed on a metric scale.

Finally, as shown in Equation 9, the scale ratio λ is calculated, and through this process the real-world scale unit can be applied when augmenting the virtual object in the SLAM-based virtual space, as shown in Equation 10. Therefore, the present invention does not require manual scale correction, and the scale correction is performed automatically.

Equation 9

Equation 10
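The scale correction can be illustrated with a short sketch. It assumes that the scale ratio λ is the depth-camera distance divided by the magnitude of the camera position vector in the SLAM-based coordinate system, and that positions are rescaled by multiplying with λ. The exact form of Equations 9 and 10 is not reproduced here, and the function names are hypothetical.

```python
import math

def scale_ratio(slam_camera_position, depth_distance):
    """lambda = real-world distance (depth map) / virtual distance (SLAM units)."""
    slam_distance = math.sqrt(sum(c * c for c in slam_camera_position))
    if slam_distance == 0.0:
        raise ValueError("camera coincides with the SLAM origin; ratio is undefined")
    return depth_distance / slam_distance

def to_real_scale(point_slam, lam):
    """Express a SLAM-space position in real-world units."""
    return tuple(lam * c for c in point_slam)

# Camera is 5 virtual units from the SLAM origin; the depth map reports 1.5 m.
lam = scale_ratio((3.0, 0.0, 4.0), 1.5)
p_real = to_real_scale((10.0, 0.0, 0.0), lam)
```

Both distances should refer to the same physical point, here the origin of the initial registration coordinate system, for the ratio to be meaningful.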

That is, as in the example shown in FIG. 12, the scale of the SLAM coordinate system is corrected in consideration of the ratio between the scale of the SLAM coordinate system and the scale in reality.

After the scale is corrected, the position of the SLAM-based initial local reference coordinate system is calculated in mm (λP _SLAM ), and an offset translation matrix T _P is calculated to move the SLAM-based coordinate system position to the new position (P _Depth ) obtained from the depth-based geometry analysis. The offset translation matrix may be calculated as shown in Equation 11 below, and T _P may be used to move RT _CtoW to RT _{Refine_trans} , as shown in FIG. 13 and represented by Equation 12 below.

Equation 11

Equation 12

Here, RT _CtoW refers to a matrix for converting a camera coordinate system into a virtual space coordinate system in a SLAM-based virtual space, and RT _{Refine_trans} means a corrected local reference coordinate system.
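A minimal sketch of the offset translation follows. It assumes that T _P is a pure translation by P _Depth minus λP _SLAM and that it is applied by left-multiplying RT _CtoW. The multiplication order and the 4x4 homogeneous layout are assumptions, since Equations 11 and 12 are not reproduced here, and the function names are hypothetical.

```python
def translation_matrix(offset):
    """4x4 homogeneous matrix that translates by the given (x, y, z) offset."""
    return [[1.0, 0.0, 0.0, offset[0]],
            [0.0, 1.0, 0.0, offset[1]],
            [0.0, 0.0, 1.0, offset[2]],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def refine_translation(rt_ctow, p_slam, p_depth, lam):
    """Return (T_P, RT_refine_trans) moving the scaled SLAM origin onto P_Depth."""
    offset = tuple(pd - lam * ps for pd, ps in zip(p_depth, p_slam))
    t_p = translation_matrix(offset)
    return t_p, matmul(t_p, rt_ctow)   # assumption: T_P is applied on the left

# Scaled SLAM position (0.5, 1.0, 1.5) should land on the depth-based position (2, 2, 2).
rt_ctow = translation_matrix((0.5, 1.0, 1.5))   # identity rotation, for illustration only
t_p, rt_refined = refine_translation(rt_ctow, p_slam=(1.0, 2.0, 3.0),
                                     p_depth=(2.0, 2.0, 2.0), lam=0.5)
```

After the correction, the translation column of the refined matrix equals the depth-based position.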

Therefore, the virtual object may be augmented based on the coordinate system aligned on the real space scale.

In addition, as in the example shown in FIG. 14, rotation correction may be performed. To this end, as shown in Equation 13, the difference matrix (R _Diff ) of the rotation of the current local coordinate system (R _Curr ) relative to the rotation of the initial local coordinate system (R _Init ) is computed. The calculated difference matrix R _Diff may then be applied to correct RT _{Refine_trans} , as in Equation 14 below.

Equation 13

Equation 14

As can be seen from Equation 14, RT _{Refine_trans} is corrected as follows. To cancel the current camera rotation, R ^-1 _Curr is multiplied by RT _{Refine_trans} , and to apply the corrected rotation, R _Depth obtained from the depth-based geometry estimation of the coordinate system is multiplied. In addition, the difference matrix R _Diff is multiplied to reflect the camera rotation tracking information relative to the initial camera rotation, which completes the rotation correction.

As described above, the present invention uses an RGB-D camera to model, in real time, an arbitrary space that has not been modeled in advance and to analyze its geometric structure, and automatically generates a registration coordinate system based on actual measurement for wearable augmented reality authoring. This allows the user to perform augmented reality authoring easily and precisely, without additional work to correct the registration coordinate system.

In the above, the method for providing augmented reality interaction service according to an embodiment of the present invention has been described.

Hereinafter, an apparatus for providing augmented reality interaction service according to an exemplary embodiment of the present invention will be described with reference to FIGS. 15 to 16.

FIG. 15 is a block diagram of an apparatus for providing augmented reality interaction service according to an exemplary embodiment.

Referring to FIG. 15, the apparatus includes a registration coordinate system corrector 152, an object separator 154, a controller 156, and an object processor 158.

The registration coordinate system corrector 152 generates reference coordinates based on a 3D image including depth information obtained through a camera.

Under the control of the controller 156, the object separator 154 segments the area corresponding to a predetermined object from a three-dimensional image including depth information obtained through the camera, based on the depth information and a color space transformation of the predetermined object.

In this case, the object separator 154 converts the hand image area corresponding to the predetermined object from the RGB color space of the RGB image to the HSV color space, and performs segmentation based on the skin color space obtained by applying a double threshold to saturation and value in the converted HSV color space.

In addition, the object separator 154 sets the distance between the hand and the camera, taken from the depth image, as a threshold value, and segments the hand based on the intersection of the depth segmentation and RGB segmentation results obtained from the respective images.
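The combination of color-based and depth-based segmentation can be sketched as follows. The saturation and value ranges and the depth threshold are hypothetical values chosen for the example and are not taken from the invention. Images are represented as nested lists so that the sketch needs only the standard library.

```python
import colorsys

def segment_hand(rgb_image, depth_image, s_range=(0.2, 0.7), v_range=(0.35, 1.0),
                 max_depth=0.6):
    """Return a boolean mask: skin-coloured in HSV AND closer than the depth threshold."""
    mask = []
    for rgb_row, depth_row in zip(rgb_image, depth_image):
        row = []
        for (r, g, b), z in zip(rgb_row, depth_row):
            _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Double threshold: lower and upper bounds on both saturation and value.
            skin = s_range[0] <= s <= s_range[1] and v_range[0] <= v <= v_range[1]
            near = 0.0 < z <= max_depth          # z == 0 is treated as "no depth reading"
            row.append(skin and near)            # intersection of the two segmentations
        mask.append(row)
    return mask

# One row of three pixels: skin tone near the camera, skin tone far away, blue object near.
rgb = [[(224, 172, 105), (224, 172, 105), (20, 30, 200)]]
depth = [[0.4, 1.5, 0.4]]   # metres
mask = segment_hand(rgb, depth)
```

Only the first pixel passes both tests: the second fails the depth threshold and the third fails the saturation bound.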

Under the control of the controller 156, the object processor 158 separates a sub-object having a motion component from the object in the area segmented by the object separator 154, and models the separated sub-object and the palm region connected to the sub-object based on a predetermined algorithm to detect feature points.

To estimate the posture of the finger corresponding to the sub-object from the hand image corresponding to the object, the object processor 158 performs palm region modeling by using a morphological operation to separate the palm, which corresponds to the palm region connected to the sub-object, from the fingers.

The controller 156 controls the overall operation of the apparatus 150 for providing augmented reality interaction service, and controls a 3D object for using the augmented reality service by estimating the posture of the sub-object based on skeleton information of the object provided through a predetermined user interface.

Meanwhile, as shown in FIG. 16, the registration coordinate system corrector includes a receiver 160, a generator 162, an augmentation unit 164, an analyzer 166, and a corrector 168.

The receiver 160 receives depth image information from a depth camera, or receives information set or entered through user input.

In this case, the receiver 160 may receive a depth image from a depth camera, for example an RGB-D camera, attached to a glasses-type display device such as a head-worn display (HWD) worn on the user's head, and may receive a region of interest (ROI) in the real space set through user input. Here, the ROI may be set by user input using a mobile input device.

Depth image information according to the present invention is information captured and generated by the depth camera, and may include captured image features, camera pose information, a distance map image based on depth information, and color.

The analyzer 166 analyzes the geometry of the real space or the ROI by using the depth image information received by the receiver 160.

In this case, the analyzer 166 may perform geometric analysis to estimate a plane, a tangent line, a tangent plane, an intersection point, and the like, of the ROI received from the depth camera.

The generator 162 generates a registration coordinate system for the real space through the geometric structure analysis of the real space or the ROI performed by the analyzer 166.

In this case, the generator 162 may estimate at least one of a plane, a tangent line, a tangent plane, and an intersection point of the ROI through geometric analysis of the ROI, and may generate the first registration coordinate system from at least one of the estimated plane, tangent line, tangent plane, and intersection point.

In this case, the generator 162 may calculate the origin and direction through geometric analysis of the ROI, define each estimated plane as one of the front, the side, and the floor in consideration of its relationship to the pose of the depth camera, and generate the first registration coordinate system by correcting the sign of the calculated direction to match the predetermined left-handed coordinate system of the virtual space.

The corrector 168 corrects a previously generated registration coordinate system, for example a second registration coordinate system, on the basis of actual measurement so as to match the virtual space, using the registration coordinate system for the real space or the ROI generated by the generator 162.

In this case, the second registration coordinate system may be a registration coordinate system generated from a SLAM (Simultaneous Localization and Mapping) algorithm, and the corrector 168 may correct the second registration coordinate system on the basis of actual measurement by calculating the ratio between the distance of the SLAM-based camera and the distance of the depth camera that generates the depth image information.

The augmentation unit 164 augments the virtual object based on the corrected registration coordinate system and places the augmented virtual object in the space.

In this case, the augmentation unit 164 may arrange the virtual object in the space by using a user input through the mobile input device.

The apparatus according to the present invention acquires depth image information using the RGB-D camera shown in FIG. 9A, and the user interactively selects, using the mobile input device, the point cloud at the place where the coordinate system is to be positioned. As shown in FIG. 9B, geometric analysis is performed on the region selected by the user, that is, the region of interest, in the distance map image included in the depth image information to estimate a plane, a tangent line, a tangent plane, an intersection point, and the like, and a registration coordinate system for the augmented reality space is generated.

Specifically, when one, two, or three planes are estimated in the space, the origin and direction are calculated by estimating intersection points, tangent lines, and the like through a predetermined optimization method. In addition, in consideration of the relationship to the pose of the camera, for example its front, side, and up directions, each plane is defined as the front, the side, or the floor, and the sign of the direction is corrected to match the left-handed coordinate system of the virtual space.

Next, as shown in FIG. 9C, the initial registration coordinate system generated from the Simultaneous Localization and Mapping (SLAM) algorithm, that is, the second registration coordinate system described above, is corrected with the previously calculated registration coordinate system, and the camera pose is then obtained to augment virtual objects in the real space. At this time, in order to correct the virtual scale of the SLAM-based coordinate system to the scale of the real space, the ratio between the depth-camera-based distance and the SLAM-based camera distance, both measured with respect to the initial registration coordinate system, is calculated.

As such, when the distance ratio is applied when augmenting the virtual object, as illustrated in FIG. 9D, the virtual object may be augmented based on the registration coordinate system by reflecting the unit scale of the real space. For example, the user may arrange the virtual object in space using the mobile input device based on the corrected coordinate system.

Of course, this is used in the method according to the present invention, and it is obvious that the method according to the present invention described above can also be used in the apparatus according to the present invention.

As described above, the operations of the method and apparatus for providing augmented reality interaction service according to the present invention can be carried out. Meanwhile, although specific embodiments have been described in the above description of the present invention, various modifications can be implemented without departing from the scope of the present invention. Therefore, the scope of the present invention should not be defined by the described embodiments, but by the claims and equivalents of the claims.

Claims

Generating reference coordinates based on a 3D image including depth information obtained through a camera;

Dividing a region corresponding to the preset object based on depth information and color space transformation of the preset object from a three-dimensional image including depth information acquired through the camera;

Separating a sub object having a motion component from an object of the divided region, and detecting a feature point by modeling the separated sub object and a palm region associated with the sub object based on a predetermined algorithm;

And estimating a posture of the sub-object based on joint information of an object provided through a predetermined user interface to control a 3D object for using augmented reality service.
The method of claim 1, wherein the dividing of the area corresponding to the preset object comprises:

Converting the hand image region corresponding to the preset object from the RGB color space of the RGB image to the HSV color space, and performing the dividing based on a skin color space obtained by applying a double threshold to saturation and value in the converted HSV color space.
The method of claim 2,

Wherein a distance corresponding to the distance between the hand and the camera in the depth image is set as a threshold, and the hand is segmented based on the intersection of the depth segmentation and RGB segmentation results obtained from the respective images.
The method of claim 1, wherein the generating of the reference coordinates comprises:

Analyzing the geometry of the real space using depth image information photographed for the real space;

Generating a first registration coordinate system for the real space by using the analyzed geometric structure;

And correcting, on the basis of actual measurement, a pre-generated second registration coordinate system so as to match the virtual space, using the generated first registration coordinate system for the real space.
The method of claim 1,

Further comprising performing palm region modeling by separating, using a morphological operation, the palm corresponding to the palm region associated with the sub-object from the fingers, in order to estimate a posture of a finger corresponding to the sub-object from a hand image corresponding to the object.
The method of claim 5,

Wherein the center point of the palm is computed through a distance transform, and

The finger is modeled by ellipse fitting based on the following equation, with the point of the modeled ellipse at the minimum distance from the computed palm center point taken as the reference point.
The method of claim 1, wherein the feature point comprises:

A finger reference point corresponding to the sub-object and a depth-information-based hand end point,

Wherein the hand end point is extracted by template matching against a pre-modeled depth template.
The method of claim 1, wherein the estimating the posture of the sub object comprises:

Performing inverse kinematics that estimates parameters of the finger joints corresponding to the sub-object based on the generated reference coordinate system and the position of the end point, wherein the amount of change of each joint is estimated using the difference between a target point, corresponding to the end position of the hand corresponding to the object obtained from the camera, and a current point, corresponding to the end position of the hand captured for the current space.
The method of claim 4, wherein the analyzing of the geometry comprises:

Analyzing the geometry of the real space using depth image information of the real space captured with a depth camera.
The method of claim 9, further comprising receiving a region of interest in the real space set through a user input,

Wherein the analyzing of the geometry comprises analyzing the geometry of the ROI by using the depth image information, and

The generating of the first registration coordinate system comprises estimating at least one of a plane, a tangent line, a tangent plane, and an intersection point of the ROI by analyzing the geometry of the ROI, and generating the first registration coordinate system from at least one of the estimated plane, tangent line, tangent plane, and intersection point.
The method of claim 10, wherein the generating of the first registration coordinate system comprises:

Calculating the origin and direction through the geometry analysis of the region of interest, defining each estimated plane as one of the front, the side, and the floor in consideration of its relationship to the pose of the depth camera, and generating the first registration coordinate system by correcting the sign of the calculated direction so as to match a predetermined coordinate system direction.
The method of claim 4, wherein the second registration coordinate system is a registration coordinate system generated from a Simultaneous Localization and Mapping (SLAM) algorithm, and

The correcting on the basis of actual measurement comprises correcting the second registration coordinate system by calculating the ratio between the distance of a SLAM-based camera and the distance of a depth camera that generates the depth image information.
A registration coordinate system corrector for generating reference coordinates based on a 3D image including depth information obtained through a camera;

An object separation unit for dividing an area corresponding to the predetermined object based on depth information of a predetermined object and color space transformation from a three-dimensional image including depth information obtained through the camera;

An object processor for separating a sub-object having a motion component from the object of the divided region, and modeling the separated sub-object and a palm region associated with the sub-object based on a predetermined algorithm to detect a feature point;

And a controller configured to control a 3D object for augmented reality service by estimating a posture of the sub-object based on skeletal information of the object provided through a predetermined user interface.
The apparatus of claim 13, wherein the object separation unit

Converts the hand image area corresponding to the predetermined object from the RGB color space of the RGB image to the HSV color space, and performs segmentation based on a skin color space obtained by applying a double threshold to saturation and value in the converted HSV color space.
The apparatus of claim 14, wherein the object separation unit

Sets a distance corresponding to the distance between the hand and the camera in the depth image as a threshold, and segments the hand based on the intersection of the depth segmentation and RGB segmentation results obtained from the respective images.
The apparatus of claim 13, wherein the registration coordinate system correction unit comprises:

An analysis unit for analyzing the geometry of the real space using depth image information captured for the real space;

A generator configured to generate a first registration coordinate system for the real space using the analyzed geometry;

And a correction unit configured to correct, on the basis of actual measurement, a pre-generated second registration coordinate system so as to match the virtual space, using the generated first registration coordinate system for the real space.
The apparatus of claim 13, wherein the object processing unit

Performs palm region modeling by separating, using a morphological operation, the palm corresponding to the palm region associated with the sub-object from the fingers, in order to estimate a posture of a finger corresponding to the sub-object from a hand image corresponding to the object.
The apparatus of claim 13, wherein the control unit

Estimates the posture of the sub-object through inverse kinematics that estimates parameters of the finger joints corresponding to the sub-object based on the generated reference coordinate system and the position of the end point, the amount of change of each joint being estimated using the difference between a target point, corresponding to the end position of the hand corresponding to the object obtained from the camera, and a current point, corresponding to the end position of the hand captured for the current space.