WO2021137349A1 - Environment-based method and system for combining three-dimensional spatial recognition and two-dimensional cognitive region segmentation for real-virtual matching object arrangement - Google Patents

Environment-based method and system for combining three-dimensional spatial recognition and two-dimensional cognitive region segmentation for real-virtual matching object arrangement

Info

Publication number
WO2021137349A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
spatial
information
virtual object
scenario
Prior art date
Application number
PCT/KR2020/000616
Other languages
French (fr)
Korean (ko)
Inventor
한상준
Original Assignee
엔센스코리아주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 엔센스코리아주식회사
Publication of WO2021137349A1 publication Critical patent/WO2021137349A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration

Definitions

  • The present invention relates to a method and system based on real-virtual matching technology capable of synthesizing virtual objects in real space. To achieve real-virtual object matching with a higher sense of presence, it performs cognitive image segmentation on a camera image, fuses the two-dimensional segmentation information thus obtained with three-dimensional spatial information obtained through SLAM, and augments the pre-prepared virtual objects associated with the real-space information identified by the segmentation, providing an improved user experience.
  • Augmented reality technology is a field derived from virtual reality technology in which virtual objects are synthesized and superimposed on real space. It can increase the sense of presence of a virtual object by creating the illusion that the object actually exists in the real space.
  • In one prior method, the position of a target object in virtual space is calculated from a depth image obtained with a depth camera and compared against a reference position database to generate an event execution signal.
  • The present invention tracks the position and orientation of a portable terminal based on Simultaneous Localization And Mapping (SLAM) using a camera and an inertial sensor, and applies a cognitive region segmentation method that infers the real-space image obtained from the camera at the pixel level.
  • SLAM (Simultaneous Localization And Mapping)
  • By combining the two-dimensional region image obtained through the segmentation method with the three-dimensional spatial information of the real space by three-dimensional projection, the components of each space can be recognized cognitively. The purpose is to implement augmented reality with a higher sense of presence and to provide interaction with virtual objects according to the real space.
  • Continuous images are acquired through a camera, feature points are extracted from the t-1 frame and the t frame using a feature point extraction algorithm such as SIFT, SURF, or ORB, and a descriptor is generated for each feature point.
  • The Euclidean distance between feature descriptors is calculated to form nearest-neighbor feature point pairs, and the translation and rotation of the portable terminal in three-dimensional space are computed from the resulting matching information between the t-1 frame and the t frame.
  • A polygon approximating the outermost contour of each class in the two-dimensional cognitive region segmentation image is derived, and each corner point of the polygon is projected from its two-dimensional coordinates onto the map of three-dimensional spatial information; the average of the coordinates of at least three three-dimensional feature points nearest the point of closest intersection is taken, converting the two-dimensional polygon into surface information in three-dimensional space.
  • The present invention can recognize the shape, size, and class of a spatial object, making it possible to select virtual content that creates an interaction between the user and the real space, or to present content according to a pre-planned scenario.
  • The present invention as described above can provide a higher sense of presence by implementing interaction between the real space and virtual objects for the user; in providing augmented reality content, scenarios suited to the real space can be configured and virtual objects can be selectively augmented, enabling expansion into a wider range of businesses.
  • FIG. 2 is a flowchart specifically illustrating the object augmentation processing unit, which performs the associated operations obtained from the virtual object DB and the scenario DB using the spatial object information generated by the spatial recognition processing unit.
  • The spatial map is initialized by moving the camera pose in 3D space, based on the matched feature point pairs between the previous frame and the current frame, to find the camera position that best fits the spatial map and thereby estimate its initial position.
  • A moving average is obtained by integrating the successive inertial sensor values acquired through the inertial sensor of FIG. 1, yielding the rotation and translation of the portable terminal.
  • The spatial information processing unit of FIG. 1 compares the rotation and translation values obtained from the image processing unit with those obtained from the inertial sensor processing unit; when the difference exceeds a threshold, a reliability index is derived for each by comparing the magnitude of the inertial sensor bias with the magnitude of the tracking error of the image feature points obtained with an optical tracking algorithm such as optical flow.
  • The reliability indices are normalized so that they sum to 1, each rotation and translation value is multiplied by its normalized reliability index, and the results are summed as a correction, so that the camera image and inertial sensor values are combined to construct more reliable spatial information and enable camera tracking.
  • The region segmentation inference unit of FIG. 1 performs region segmentation by running inference with a pre-trained model capable of pixel-level cognitive region segmentation, such as Mask R-CNN, on the t frame of the camera image acquired from the image processing unit; the result is a two-dimensional cognitive region segmentation image.
  • The region for each class is first separated, its outline is approximated by a polygon, and the corner coordinates of the polygon are derived.
  • For each corner coordinate, the spatial object generation unit forms the line segment through the camera coordinate and the two-dimensional coordinate and projects it into the three-dimensional spatial information; the Euclidean distance between this line segment and each feature point in the three-dimensional spatial information is computed, at least three feature points that are encountered first from the camera and lie within the distance threshold are selected, and their values are averaged. Using this average, the distance of the two-dimensional coordinate from the camera in three-dimensional space is obtained, and the corner is projected to that depth to convert it into a three-dimensional coordinate.
  • The direction orthogonal to gravity is detected using the gyro sensor, the height of each corner point along the Earth's gravity direction is obtained, and the heights are compared to check whether they are level within a threshold; corner coordinates that exceed the threshold are removed and the polygon is regenerated, producing a polygon with at least three corners, which is created as a spatial object.
  • The following operations are performed to augment a virtual object on the spatial object generated by the spatial recognition processing unit.
  • The spatial object generated by the spatial recognition processing unit is passed to the object augmentation processing unit, which queries the scenario DB with the class of each spatial object.
  • The scenario DB holds the spatial object class, a list of virtual objects requiring interaction, the operation type of each virtual object, the virtual object ID, the virtual object size, and the relative spatial coordinates of the virtual object.
  • The object augmentation processing unit acquires at least one scenario record from the scenario DB, queries the virtual object DB with the virtual object ID from that record to obtain the virtual object's content or physical file, scales the object to the size obtained from the scenario DB, and modifies the three-dimensional spatial coordinates at which the virtual object is to be augmented by adding the relative spatial coordinates from the scenario DB to either a user-specified location selected with the touch screen of FIG. 2 or the center coordinate of the spatial object.
  • The virtual object produced through the above process is composited with the t-frame image obtained from the camera and output on the display, thereby realizing augmented reality and implementing the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

According to the present invention, the position and orientation of a portable terminal are tracked using simultaneous localization and mapping (SLAM), a spatial recognition technique based on a camera and an inertial sensor, and three-dimensional spatial information of the real space is generated. A spatial object is generated from the camera image using cognitive region segmentation and projected onto the three-dimensional space, so that a floor and a table can be distinguished from each other, which the plane recognition of SLAM alone cannot do and which is impossible in the prior art. Virtual content that allows a user to interact with an object in the real space according to a pre-planned scenario DB can therefore be provided, offering a high sense of presence and interaction.

Description

[Correction of 23.03.2020 under Rule 26] Environment-based method and system for combining three-dimensional spatial recognition and two-dimensional cognitive region segmentation for real-virtual matching object arrangement
The present invention relates to a method and system based on real-virtual matching technology capable of synthesizing virtual objects in real space. To achieve real-virtual object matching with a higher sense of presence, cognitive image segmentation is performed on the camera image, the two-dimensional segmentation information thus obtained is fused with three-dimensional spatial information obtained through SLAM, and, using the information obtained from the cognitive image segmentation, virtual objects associated with the real-space information are selected from a set of pre-prepared virtual objects and augmented, providing an improved user experience.
In general, augmented reality (AR) technology is a field derived from virtual reality (VR) technology in which virtual objects are synthesized and superimposed on real space. Compared with virtual reality, it can increase the sense of presence by creating the illusion that a virtual object actually exists in the real space.
As a first example of the prior art, a 3D point cloud map is generated from a depth image obtained with a depth camera, the real-space object onto which augmented content is to be projected is tracked using the 3D point cloud map, and the virtual object is projected directly onto the real space with a display device such as a projector, allowing the virtual content to be superimposed on the real object and to interact with the user.
As a second example, to generate an event by recognizing a user's motion in three-dimensional real space, the position of a target object in virtual space is calculated from a depth image obtained with a depth camera and compared against a reference position database to generate an event execution signal.
However, since the above methods rely only on three-dimensional spatial information obtained from a camera or a sensor, they cannot obtain cognitive information such as which space the user is in or whether the object in front is a table or a floor.
In addition, to augment a virtual object in real space, the user generally places a selected object in the space and adjusts its size or moves it manually, so augmented reality is realized only through the user's explicit intent.
These limitations mean that, when creating content or scenarios for augmented reality, interaction with the real space is ignored or objects can only be presented according to a pre-written script, which is a significant obstacle to providing a user experience with a high sense of presence.
The present invention tracks the position and orientation of a portable terminal based on Simultaneous Localization And Mapping (SLAM) using a camera and an inertial sensor, and combines the two-dimensional region image obtained by a cognitive region segmentation method, which infers the real-space image from the camera at the pixel level, with the three-dimensional spatial information of the real space by three-dimensional projection. Its purpose is to recognize the components of each space cognitively, to implement augmented reality with a higher sense of presence, and to provide interaction with virtual objects according to the real space.
To this end, continuous images are acquired through the camera, feature points are extracted from the t-1 frame and the t frame using a feature point extraction algorithm such as SIFT, SURF, or ORB, and a descriptor is generated for each feature point. The Euclidean distance between descriptors is computed to form nearest-neighbor feature point pairs, and the translation and rotation of the portable terminal in three-dimensional space are obtained from the resulting matching information between the t-1 frame and the t frame. In addition, the inertial sensor values from the acquisition of the t-1 frame to the acquisition of the t frame are accumulated to obtain the terminal's translation and rotation; a confidence score is assigned to each of the two estimates for mutual correction, the translation and rotation values are multiplied by their confidence scores, and the weighted average of the two is used to construct the three-dimensional spatial information.
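For illustration, the following is a minimal sketch of the feature extraction and nearest-neighbor matching step, assuming OpenCV (cv2) and NumPy are available; SIFT stands in for the SIFT/SURF/ORB options named above, and descriptors are paired by Euclidean (L2) distance. It is a sketch of the idea, not the patented implementation.

```python
# Minimal sketch: SIFT feature extraction and nearest-neighbor matching
# between the t-1 and t frames, assuming OpenCV and NumPy.
import cv2
import numpy as np

def match_features(frame_prev, frame_curr):
    """Extract SIFT keypoints in two consecutive grayscale frames and pair
    them by nearest-neighbor Euclidean distance between descriptors."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(frame_prev, None)
    kp2, des2 = sift.detectAndCompute(frame_curr, None)

    # Brute-force matcher with the L2 (Euclidean) norm; crossCheck keeps
    # only mutually nearest pairs, a simple stand-in for the pairing rule.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts_curr = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts_prev, pts_curr
```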
The image of the t frame acquired at this time is also passed through a pre-trained artificial neural network to perform pixel-level cognitive region segmentation. In particular, an artificial neural network model such as Mask R-CNN is trained to distinguish the floor, walls, tables, people, cups, and so on in the image, from which a two-dimensional cognitive region segmentation image is obtained.
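As a rough illustration of the pixel-level segmentation step, the sketch below runs a Mask R-CNN model with PyTorch/torchvision. The patent describes a model trained on classes such as floor, wall, table, person, and cup; a COCO-pretrained model is used here only as a stand-in, so its class set differs from the one described.

```python
# Minimal sketch: Mask R-CNN inference for pixel-level instance masks,
# assuming PyTorch and torchvision are available.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def segment_frame(frame_rgb_float):
    """frame_rgb_float: torch.Tensor of shape (3, H, W), values in [0, 1].
    Returns per-instance binary masks, class labels, and scores."""
    with torch.no_grad():
        out = model([frame_rgb_float])[0]
    masks = out["masks"] > 0.5   # (N, 1, H, W) boolean masks per instance
    return masks, out["labels"], out["scores"]
```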
To combine these two results, the present invention derives a polygon approximating the outermost contour of each class in the two-dimensional cognitive region segmentation image, and projects each corner point of the polygon from its two-dimensional coordinates toward the map of three-dimensional spatial information. The average of the coordinates of at least three three-dimensional feature points nearest the point of closest intersection is then taken, converting the two-dimensional polygon into surface information in three-dimensional space.
Through the above method, the present invention goes beyond the limitations of conventional plane detection: a detected plane can be assigned a meaningful class such as floor, wall, or table and recognized in three dimensions as a spatial object, which was not possible with prior inventions.
In this way, the present invention can recognize the shape, size, and class of a spatial object, making it possible to select virtual content that creates an interaction between the user and the real space, or to present content according to a pre-planned scenario.
As described above, the present invention can provide a higher sense of presence by implementing interaction between the real space and virtual objects for the user. In providing augmented reality content, scenarios suited to the real space can be configured and virtual objects can be selectively augmented, enabling expansion into a wider range of businesses.
FIG. 1 is a flowchart specifically illustrating the spatial recognition processing unit, showing a method and system that generates spatial information by combining a camera and an inertial sensor, infers cognitive region segmentation, and combines the two to create spatial objects.
FIG. 2 is a flowchart specifically illustrating the object augmentation processing unit, which performs the associated operations obtained from the virtual object DB and the scenario DB using the spatial object information generated by the spatial recognition processing unit.
The present invention is described in detail with reference to the accompanying drawings as follows.
Referring to FIG. 1, a portable terminal having a camera and an inertial sensor extracts feature points from consecutive images, the t-1 frame and the t frame, acquired through the camera using an algorithm such as SIFT, SURF, or ORB, and determines whether the spatial map has been initialized. If the spatial map has not been initialized, the camera pose and origin are estimated using the feature points extracted from the acquired images. Two consecutive images are required for this estimation: to compare the similarity of the feature points extracted from each image, the Euclidean distance between descriptors is computed, and the two feature points with the shortest distance are paired as matching feature points. The relative position of the t frame with respect to the t-1 frame can then be obtained from the geometric relationship of the matched feature point pairs. In this step, the spatial map is initialized by moving the camera pose in three-dimensional space, based on the matched feature point pairs between the previous and current frames, to find the camera position that best fits the spatial map and thereby estimate its initial position.
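The following sketch, under the same assumptions as the earlier matching sketch, shows one common way to recover the relative camera motion between the t-1 and t frames from the matched points via the essential matrix; the intrinsic matrix K is a placeholder, and this is not necessarily the exact estimation scheme used in the invention.

```python
# Minimal sketch: relative camera pose from matched 2D points, assuming
# OpenCV and the match_features() helper sketched earlier.
import cv2
import numpy as np

K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])  # assumed pinhole intrinsics, not calibrated values

def relative_pose(pts_prev, pts_curr):
    """Recover rotation R and unit-scale translation t of the camera
    between two frames from matched image points."""
    E, inliers = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inliers)
    return R, t  # t is known only up to scale in a monocular setting
```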
Next, the successive inertial sensor values acquired through the inertial sensor of FIG. 1 are integrated and a moving average is taken to obtain the rotation and translation of the portable terminal.
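A heavily simplified sketch of accumulating inertial samples between frames is given below, assuming NumPy; bias compensation, gravity removal, and the moving-average smoothing mentioned above are omitted, so it only illustrates the integration idea.

```python
# Minimal sketch: integrate gyroscope and accelerometer samples taken
# between two camera frames into rotation and translation increments.
import numpy as np

def integrate_imu(gyro_samples, accel_samples, dt):
    """gyro_samples, accel_samples: arrays of shape (N, 3); dt: sample period."""
    rotation = np.zeros(3)    # accumulated angle around each axis (rad)
    velocity = np.zeros(3)
    translation = np.zeros(3)
    for w, a in zip(gyro_samples, accel_samples):
        rotation += w * dt            # Euler integration of angular rate
        velocity += a * dt            # gravity assumed already removed
        translation += velocity * dt
    return rotation, translation
```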
Next, the spatial information processing unit of FIG. 1 compares the rotation and translation values obtained from the image processing unit with those obtained from the inertial sensor processing unit. When the difference exceeds a threshold, a reliability index is derived for each by comparing the magnitude of the inertial sensor bias with the magnitude of the tracking error of the image feature points obtained with an optical tracking algorithm such as optical flow. The reliability indices are normalized so that they sum to 1, each rotation and translation value is multiplied by its normalized reliability index, and the results are summed as a correction, so that the camera image and inertial sensor values are combined to construct more reliable spatial information and enable camera tracking.
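The confidence-weighted correction can be illustrated with the minimal sketch below (NumPy assumed); the reliability scores are placeholders computed elsewhere, and linearly blending rotation values is a simplification that holds only for small rotations.

```python
# Minimal sketch: fuse camera-based and IMU-based motion estimates with
# reliability scores normalized to sum to 1.
import numpy as np

def fuse_estimates(rot_cam, trans_cam, rot_imu, trans_imu,
                   score_cam, score_imu):
    """Weighted combination of two rotation/translation estimates."""
    total = score_cam + score_imu
    w_cam, w_imu = score_cam / total, score_imu / total
    rotation = w_cam * np.asarray(rot_cam) + w_imu * np.asarray(rot_imu)
    translation = w_cam * np.asarray(trans_cam) + w_imu * np.asarray(trans_imu)
    return rotation, translation
```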
Next, the region segmentation inference unit of FIG. 1 performs region segmentation by running inference with a pre-trained model capable of pixel-level cognitive region segmentation, such as Mask R-CNN, on the t frame of the camera image acquired from the image processing unit. The result is a two-dimensional cognitive region segmentation image. The spatial object generation unit of FIG. 1 then separates the region for each class, approximates it with a polygon to form an outline, and derives the corner coordinates of the polygon. For each corner coordinate, the spatial object generation unit forms the line segment through the camera coordinate and the two-dimensional coordinate and projects it into the three-dimensional spatial information. The Euclidean distance between this line segment and each feature point in the three-dimensional spatial information is computed; among the points within a threshold, at least three feature points that are encountered first from the camera and lie close to the segment are selected and their values averaged. Using this average, the distance of the two-dimensional coordinate from the camera in three-dimensional space is obtained, and the corner is projected to that depth to convert it into a three-dimensional coordinate.
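The sketch below illustrates, under assumed thresholds and helper names, how a class mask can be approximated by a polygon with OpenCV and how each corner can be lifted into 3D by casting a ray through the pixel and averaging the depth of the nearest map points; map_points and K follow the earlier sketches and are assumptions, not the invention's exact data structures.

```python
# Minimal sketch: class mask -> polygon corners -> 3D corner coordinates.
import cv2
import numpy as np

def mask_to_corners(class_mask, epsilon=5.0):
    """Approximate the outer contour of a binary class mask by a polygon
    and return its corner pixel coordinates as an (N, 2) array."""
    mask_u8 = class_mask.astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    poly = cv2.approxPolyDP(largest, epsilon, closed=True)
    return poly.reshape(-1, 2).astype(np.float64)

def lift_corner_to_3d(corner_px, map_points, K, dist_thresh=0.05, k=3):
    """Cast a ray through the pixel, find the k map points nearest the ray
    (within dist_thresh), and place the corner at their average depth."""
    ray = np.linalg.inv(K) @ np.array([corner_px[0], corner_px[1], 1.0])
    ray /= np.linalg.norm(ray)
    proj = map_points @ ray                        # depth of each point along the ray
    perp = np.linalg.norm(map_points - np.outer(proj, ray), axis=1)
    near = np.where((perp < dist_thresh) & (proj > 0))[0]
    if near.size < k:
        return None                                # not enough support in the map
    chosen = near[np.argsort(proj[near])[:k]]      # the k points nearest the camera
    depth = proj[chosen].mean()
    return ray * depth                             # 3D corner in camera coordinates
```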
After all corner points have been converted to three-dimensional coordinates by the method described above, the direction orthogonal to gravity is detected using the gyro sensor, the height of each corner point along the Earth's gravity direction is obtained, and the heights are compared to check whether they are level within a threshold. Corner coordinates that exceed the threshold are removed and the polygon is regenerated, producing a polygon with at least three corners, which is created as a spatial object.
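A minimal sketch of the gravity-based corner filtering, assuming NumPy: the height of each 3D corner is its projection onto the gravity direction, and corners that deviate from the median height beyond a threshold are dropped (the patent compares heights against a threshold; using the median as the reference is an assumption here).

```python
# Minimal sketch: keep only corners whose heights are level within a threshold.
import numpy as np

def filter_corners_by_height(corners_3d, gravity_dir, height_thresh=0.05):
    """corners_3d: (N, 3) array; gravity_dir: unit vector (3,).
    Returns the level corners, or None if fewer than three remain."""
    heights = corners_3d @ gravity_dir
    level = np.abs(heights - np.median(heights)) < height_thresh
    kept = corners_3d[level]
    return kept if len(kept) >= 3 else None
```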
Next, in the present invention, the following operations are performed to augment a virtual object on the spatial object generated by the spatial recognition processing unit, as shown in FIG. 2.
First, the spatial object generated by the spatial recognition processing unit is passed to the object augmentation processing unit, which queries the scenario DB with the class of each spatial object. If a virtual object is currently being augmented, it is passed along as well so that virtual objects can be selected sequentially. The scenario DB holds the spatial object class, a list of virtual objects requiring interaction, the operation type of each virtual object, the virtual object ID, the virtual object size, and the relative spatial coordinates of the virtual object. The object augmentation processing unit acquires at least one scenario record from the scenario DB, queries the virtual object DB with the virtual object ID from that record to obtain the virtual object's content or physical file, scales the object to the size obtained from the scenario DB, and modifies the three-dimensional spatial coordinates at which the virtual object is to be augmented by adding the relative spatial coordinates from the scenario DB to either a user-specified location selected with the touch screen of FIG. 2 or the center coordinate of the spatial object.
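The scenario lookup and placement step can be sketched with plain Python data structures as below; the dictionary schemas and field names are illustrative assumptions, not the actual scenario DB or virtual object DB layout.

```python
# Minimal sketch: look up a scenario for a spatial-object class and compute
# the placement of the corresponding virtual object.
import numpy as np

SCENARIO_DB = {  # hypothetical contents, keyed by spatial-object class
    "table": [{"object_id": "cup_01", "action": "place",
               "size": 0.12, "offset": np.array([0.0, 0.0, 0.05])}],
}
VIRTUAL_OBJECT_DB = {  # hypothetical asset registry
    "cup_01": {"mesh_file": "cup.glb"},
}

def place_virtual_object(space_class, anchor_xyz):
    """Anchor is either the user's touch point or the spatial object's
    center; the scenario's relative offset is added to it."""
    scenarios = SCENARIO_DB.get(space_class, [])
    if not scenarios:
        return None
    scenario = scenarios[0]
    asset = VIRTUAL_OBJECT_DB[scenario["object_id"]]
    position = np.asarray(anchor_xyz) + scenario["offset"]
    return {"asset": asset["mesh_file"],
            "scale": scenario["size"],
            "position": position}
```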
The virtual object produced through the above process is composited with the t-frame image obtained from the camera and output on the display, thereby realizing augmented reality and implementing the present invention.
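Finally, a minimal compositing sketch: the placed object's 3D anchor is projected into the t frame with the intrinsics K from the earlier sketches and marked with a placeholder; a real implementation would render the virtual object's mesh at that pose instead.

```python
# Minimal sketch: project a 3D anchor point into the camera frame and
# draw a placeholder marker there.
import cv2
import numpy as np

def composite(frame_bgr, position_cam, K):
    """position_cam: 3D point in camera coordinates; returns the frame
    with a marker drawn at its image projection."""
    if position_cam[2] <= 0:
        return frame_bgr                     # behind the camera, nothing to draw
    uvw = K @ np.asarray(position_cam)
    u, v = int(uvw[0] / uvw[2]), int(uvw[1] / uvw[2])
    out = frame_bgr.copy()
    cv2.circle(out, (u, v), 12, (0, 255, 0), thickness=-1)
    return out
```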

Claims (5)

  1. A method and system for creating spatial objects in a portable terminal equipped with a camera and an inertial sensor, comprising: acquiring continuous color images for recognizing a three-dimensional space and constructing spatial information; acquiring and recording continuous inertial sensor data; extracting and matching features from the continuous color images; correcting the matching information using the continuous inertial sensor data; performing region segmentation inference on the color image acquired from the camera; and converting the segmented two-dimensional regions into three-dimensional spatial objects to create spatial objects.
  2. The method of claim 1, further comprising: obtaining a polygon that approximates the segmented two-dimensional region on which the region segmentation inference was performed; projecting the corners of the polygon onto the three-dimensional spatial map to obtain the nearest three-dimensional coordinates and distances; comparing with one another the heights of the corners in three-dimensional space along the Earth's gravity direction obtained from the gyro sensor values, and regenerating the polygon by excluding corners that deviate beyond a threshold; and creating a spatial object from the resulting polygon having at least three corners.
  3. A method comprising: querying the scenario DB with the class of the spatial object created in claim 2 to acquire object information; loading an object from the virtual object DB using the acquired object information; setting the size of the loaded virtual object to the object size in the scenario DB; and calculating three-dimensional coordinates for the position of the loaded virtual object by adding or subtracting the object position correction value of the scenario DB with respect to the touch coordinates of the touch screen.
  4. A method comprising: querying the scenario DB with the class of the spatial object created in claim 2 to acquire object information; loading an object from the virtual object DB using the acquired object information; setting the size of the loaded virtual object to the object size in the scenario DB; and calculating three-dimensional coordinates for the position of the loaded virtual object by adding or subtracting the object position correction value of the scenario DB with respect to the center coordinate of the spatial object created in claim 2.
  5. A method and system comprising: transforming the virtual object determined in claims 3 or 4 using the calculated size and position values; compositing the virtual object onto the color image acquired in claim 1; and displaying the composited image on a display.
PCT/KR2020/000616 2019-12-31 2020-01-13 Environment-based method and system for combining three-dimensional spatial recognition and two-dimensional cognitive region segmentation for real-virtual matching object arrangement WO2021137349A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20190180137 2019-12-31
KR10-2019-0180137 2019-12-31

Publications (1)

Publication Number Publication Date
WO2021137349A1 (en) 2021-07-08

Family

ID=76686595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/000616 WO2021137349A1 (en) 2019-12-31 2020-01-13 Environment-based method and system for combining three-dimensional spatial recognition and two-dimensional cognitive region segmentation for real-virtual matching object arrangement

Country Status (1)

Country Link
WO (1) WO2021137349A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130136569A (en) * 2011-03-29 2013-12-12 퀄컴 인코포레이티드 System for the rendering of shared digital interfaces relative to each user's point of view
KR20160048874A (en) * 2013-08-30 2016-05-04 퀄컴 인코포레이티드 Method and apparatus for representing physical scene
US20160189432A1 (en) * 2010-11-18 2016-06-30 Microsoft Technology Licensing, Llc Automatic focus improvement for augmented reality displays
US20180045963A1 (en) * 2016-08-11 2018-02-15 Magic Leap, Inc. Automatic placement of a virtual object in a three-dimensional space
KR101989969B1 (en) * 2018-10-11 2019-06-19 대한민국 Contents experience system of architectural sites based augmented reality


Similar Documents

Publication Publication Date Title
KR102317247B1 (en) The bare hand interaction apparatus and method for augmented rearity using rgb-d images
US11462028B2 (en) Information processing device and information processing method to generate a virtual object image based on change in state of object in real space
US20230071839A1 (en) Visual-Inertial Positional Awareness for Autonomous and Non-Autonomous Tracking
CN108406731B (en) Positioning device, method and robot based on depth vision
US9928656B2 (en) Markerless multi-user, multi-object augmented reality on mobile devices
US9058661B2 (en) Method for the real-time-capable, computer-assisted analysis of an image sequence containing a variable pose
US8842162B2 (en) Method and system for improving surveillance of PTZ cameras
CN106896925A (en) The device that a kind of virtual reality is merged with real scene
CN106997618A (en) A kind of method that virtual reality is merged with real scene
US9767611B2 (en) Information processing apparatus and method for estimating depth values using an approximate plane
JP2018522348A (en) Method and system for estimating the three-dimensional posture of a sensor
US10636190B2 (en) Methods and systems for exploiting per-pixel motion conflicts to extract primary and secondary motions in augmented reality systems
KR20160098560A (en) Apparatus and methdo for analayzing motion
US11727637B2 (en) Method for generating 3D skeleton using joint-based calibration acquired from multi-view camera
CN110941996A (en) Target and track augmented reality method and system based on generation of countermeasure network
CN107016730A (en) The device that a kind of virtual reality is merged with real scene
CN106981100A (en) The device that a kind of virtual reality is merged with real scene
US20200211275A1 (en) Information processing device, information processing method, and recording medium
Shinmura et al. Estimation of Human Orientation using Coaxial RGB-Depth Images.
KR101350387B1 (en) Method for detecting hand using depth information and apparatus thereof
Li et al. A hybrid pose tracking approach for handheld augmented reality
WO2021137349A1 (en) Environment-based method and system for combining three-dimensional spatial recognition and two-dimensional cognitive region segmentation for real-virtual matching object arrangement
KR102299902B1 (en) Apparatus for providing augmented reality and method therefor
JP2017182564A (en) Positioning device, positioning method, and positioning computer program
KR20210048798A (en) Method for determining pose of camera provided in user equipment and location calculation server performing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910650

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20910650

Country of ref document: EP

Kind code of ref document: A1