KR20210091033A

KR20210091033A - Electronic device for estimating object information and generating virtual object and method for operating the same

Info

Publication number: KR20210091033A
Application number: KR1020200165002A
Authority: KR
Inventors: 차오 장; 웨이밍 리; 치앙 왕; 홍성훈; 김우식
Original assignee: 삼성전자주식회사
Priority date: 2020-01-13
Filing date: 2020-11-30
Publication date: 2021-07-21
Also published as: CN113191462A

Abstract

Disclosed are an electronic device for estimating object information and generating a virtual object, and an operating method of the electronic device. The disclosed operating method includes: acquiring an image; acquiring a class feature, a posture feature, and a relationship feature of an object included in the image; correcting each of the class feature, the posture feature, and the relationship feature by using the class feature, the posture feature, and the relationship feature of the object; and acquiring class information, posture information, and relationship information of the object based on the corrected class feature, the corrected posture feature, and the corrected relationship feature, respectively.

Description

An electronic device for estimating object information and generating a virtual object, and an operating method of the electronic device {ELECTRONIC DEVICE FOR ESTIMATING OBJECT INFORMATION AND GENERATING VIRTUAL OBJECT AND METHOD FOR OPERATING THE SAME}

The following embodiments relate to an electronic device for estimating object information and generating a virtual object, and to an operating method of the electronic device.

Object detection may be a technique for recognizing various objects in an input image. As part of various efforts to improve the accuracy of object recognition, research is under way on detecting an object included in an image by using information from the entire image.

According to an embodiment, a method of operating an electronic device includes: acquiring an image; acquiring a class feature, a posture feature, and a relationship feature of an object included in the image; correcting each of the class feature, the posture feature, and the relationship feature by using the class feature, the posture feature, and the relationship feature of the object; and acquiring class information, posture information, and relationship information of the object based on the corrected class feature, the corrected posture feature, and the corrected relationship feature, respectively.

In the method of operating an electronic device according to an embodiment, the correcting may include correcting any one of the class feature, the posture feature, and the relationship feature by applying predetermined weights to the class feature, the posture feature, and the relationship feature of the object.

In the method of operating an electronic device according to an embodiment, the acquiring of the class feature, the posture feature, and the relationship feature may include acquiring each of the class feature, the posture feature, and the relationship feature from an intermediate layer of a corresponding sub-network.

In the method of operating an electronic device according to an embodiment, intermediate layers of a plurality of sub-networks may be connected to each other so that the class feature, the posture feature, and the relationship feature are shared with the other sub-networks.

In the method of operating an electronic device according to an embodiment, the acquiring of the class information, the posture information, and the relationship information of the object may include acquiring the class information, the posture information, and the relationship information from an output layer of each corresponding sub-network, in response to each of the corrected class feature, the corrected posture feature, and the corrected relationship feature being input to the layer that follows the intermediate layer of the corresponding sub-network.

In the method of operating an electronic device according to an embodiment, the class information may include information on what kind of object the object detected in the image is, the posture information may include information indicating a rotation angle of the object detected in the image, and the relationship feature may include action information of the object detected in the image and/or connection information with another object.

The method of operating an electronic device according to an embodiment may further include: determining position information, posture information, and motion information of a virtual object to be generated in the image, based on the class information, the posture information, and the relationship information of the object; and adding the virtual object to the image based on the position information, the posture information, and the motion information of the virtual object.

In the method of operating an electronic device according to an embodiment, the adding of the virtual object may include, when at least one of the position information, the posture information, and the motion information determined for the virtual object is plural, adding the virtual object to the image based on the information selected by a user from among the plural pieces of information.

In the method of operating an electronic device according to an embodiment, the position information of the virtual object may include information indicating a position in the image at which the virtual object can be rendered, the posture information may include information indicating a rotation angle of the virtual object, and the motion information may include information indicating a motion of the virtual object.

In the method of operating an electronic device according to an embodiment, the image may be an RGB-D image.

An electronic device according to an embodiment includes one or more processors, wherein the one or more processors acquire an image; acquire a class feature, a posture feature, and a relationship feature of an object included in the image; correct each of the class feature, the posture feature, and the relationship feature by using the class feature, the posture feature, and the relationship feature of the object; and acquire class information, posture information, and relationship information of the object based on the corrected class feature, the corrected posture feature, and the corrected relationship feature, respectively.

FIG. 1 is a diagram illustrating an electronic device according to an embodiment.
FIGS. 2 to 4 are diagrams illustrating a process of recognizing an object based on an entire image according to an embodiment.
FIG. 5 is a diagram illustrating a process in which an electronic device estimates information on an object included in an image according to an embodiment.
FIG. 6 is a diagram illustrating mutual correction based on sub-networks according to an embodiment.
FIG. 7 is a diagram illustrating a process of acquiring object information according to an embodiment.
FIGS. 8 and 9 are diagrams illustrating examples of object information acquired according to an embodiment.
FIG. 10 is a diagram illustrating a process of generating a virtual object based on acquired object information according to an embodiment.
FIG. 11 is a diagram illustrating a process in which an electronic device generates a virtual object in an image according to an embodiment.
FIGS. 12 to 14 are diagrams illustrating a process of generating a virtual object by estimating information on an object included in an image according to an embodiment.
FIG. 15 is a diagram illustrating an electronic device according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for purposes of illustration only, and the embodiments may be modified and implemented in various forms. Accordingly, an actual implementation is not limited to the specific embodiments disclosed, and the scope of the present specification includes modifications, equivalents, and substitutes that fall within the technical idea described through the embodiments.

Although terms such as "first" and "second" may be used to describe various components, these terms should be interpreted only as distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that another component may also be present in between.

A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that the described feature, number, step, operation, component, part, or combination thereof exists, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this specification.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the figure number, and redundant descriptions thereof are omitted.

FIG. 1 is a diagram illustrating an electronic device according to an embodiment.

Referring to FIG. 1, the electronic device 100 may detect an object included in an image and estimate object information, or may generate a virtual object in the image based on the estimated object information. The image captures one or more objects included in a single scene, and may be an RGB-D (Red-Green-Blue Depth) image including a color image and a depth image. For example, the image may be captured by a camera built into the electronic device 100, or may be received from an external camera device. The object information estimated by the electronic device 100 may include class information, posture information, and information on relationships with other objects for each object included in the image. The relationship information may be expressed as a scene graph. The scene graph may be expressed as an N×N matrix, where N is the number of recognized objects, each row and each column of the matrix corresponds to an object, and each element of the matrix corresponds to a relationship between objects. In addition, the electronic device 100 may generate a virtual object suited to the one or more objects included in the image, based on at least one of the class information, the posture information, and the relationship information of those objects. A detailed description follows with reference to the drawings.
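As a rough illustration of the N×N scene-graph matrix described above, the following Python sketch stores one relationship label per ordered pair of recognized objects. The function name `build_scene_graph`, the (subject, predicate, object) triple format, and the convention that row i, column j holds the relationship from object i to object j are assumptions made for this example; the specification only states that rows and columns correspond to objects and elements to relationships.

```python
def build_scene_graph(objects, relations):
    """Return an N x N matrix; entry [i][j] holds the relationship label from
    objects[i] to objects[j], or None when no relationship was recognized.
    The row-to-column direction is an assumption of this sketch."""
    index = {name: i for i, name in enumerate(objects)}
    n = len(objects)
    graph = [[None] * n for _ in range(n)]
    for subject, predicate, target in relations:
        graph[index[subject]][index[target]] = predicate
    return graph


# Hypothetical recognition results, echoing the examples used in the description.
objects = ["person", "book", "wall", "picture"]
relations = [
    ("person", "reading", "book"),
    ("picture", "hanging on", "wall"),
]
scene_graph = build_scene_graph(objects, relations)
```

A dense matrix is the simplest match for the description; for scenes with many objects and few relationships, a sparse mapping keyed by object pairs would serve the same purpose.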

FIGS. 2 to 4 are diagrams illustrating a process of recognizing an object based on an entire image according to an embodiment.

Referring to FIG. 2, an object 210 located in a local region of an image is shown as an example. Because different types of objects may have similar shapes, or an object may be occluded by another object, it can be difficult to estimate information about a specific object accurately from a local region alone. With only the local region shown in FIG. 2, it is hard to tell clearly whether the detected object 210 is a TV or a picture frame. If the entire image shown in FIG. 3 can be considered, a more accurate judgment about the detected object 310 becomes possible by analyzing the relationships between the objects included in the entire image. For example, using the information that the object 310 to be estimated hangs on a wall and is behind a sofa, it can easily be determined that the object 310 is a picture frame rather than a TV.

Referring to FIG. 4, another example of estimating a detected object 410 in consideration of the entire image is shown. Considering that the object 410 is located to the left of a bed and below a lamp, it can easily be identified as a bedside table. Taking the relationships with other objects in the entire image into account in this way can improve the accuracy of object detection.

FIG. 5 is a diagram illustrating a process in which an electronic device estimates information on an object included in an image according to an embodiment.

Referring to FIG. 5, a method of estimating object information performed by a processor of the electronic device is shown. The electronic device may recognize the class, posture, and relationship of one or more objects included in an image by using a plurality of sub-networks. Each of the plurality of sub-networks may include one or more convolution layers and a fully connected layer.

In operation 510, the electronic device may acquire an image. The image may be a color image that includes depth information (for example, an RGB-D image). For example, the electronic device may acquire the image through a built-in camera module, or may receive an image captured by an external camera device. The image may also be acquired from an image collection device for an augmented reality (AR) device.

In operation 520, the electronic device may acquire a class feature, a posture feature, and a relationship feature of one or more objects included in the image. The class feature is a feature used to determine class information, which indicates specifically what kind of object the detected object is. The posture feature is a feature used to determine posture information, which indicates a rotation angle of the object in a target map. The relationship feature is a feature used to determine relationship information, which includes action information of the detected object or connection information with another object. For example, in "a person reading a book", "reading" indicates the relationship information between the book and the person, and in "a picture hanging on a wall", "hanging" indicates the relationship information between the wall and the picture.

The electronic device may input the image to a neural network that includes a plurality of sub-networks. The plurality of sub-networks may include a class recognition network for recognizing class information, a posture recognition network for recognizing posture information, and a relationship recognition network for recognizing relationship information. The class feature, the posture feature, and the relationship feature may each be output from an intermediate layer of the corresponding sub-network. The intermediate layer may be one of the one or more hidden layers included in the sub-network.

In operation 530, the electronic device may correct each of the class feature, the posture feature, and the relationship feature by using the class feature, the posture feature, and the relationship feature. The plurality of sub-networks are joined to one another, so that the features output from the intermediate layer of each sub-network can be exchanged. That is, the electronic device may correct the class feature by using the class feature, the posture feature, and the relationship feature. The electronic device may likewise correct the posture feature by using the class feature, the posture feature, and the relationship feature, and may correct the relationship feature by using the class feature, the posture feature, and the relationship feature.

In operation 540, the electronic device may acquire the class information of the object based on the corrected class feature. The electronic device may also determine the posture information of the object based on the corrected posture feature, and may determine the relationship information of the object based on the corrected relationship feature. Because each piece of object information is determined with the help of the other types of features, the class information, posture information, and relationship information of an object can be recognized with high accuracy in consideration of the entire image.

In this specification, for convenience of description, the class feature and the class information may also be referred to as the category feature and the category information, respectively, and the class information and the posture information may be collectively referred to as attribute information.

FIG. 6 is a diagram illustrating mutual correction based on sub-networks according to an embodiment.

Referring to FIG. 6, a process is shown in which highly accurate object information is determined by correcting each feature through data exchange between a plurality of sub-networks 621, 622, and 623 included in a neural network 620.

An image 610 may be input to the neural network 620, where recognition is performed. The neural network 620 may include a class recognition network 621, a posture recognition network 622, and a relationship recognition network 623. The class feature, the posture feature, and the relationship feature may be output from the intermediate layers of the sub-networks 621, 622, and 623, respectively. The sub-networks 621, 622, and 623 may exchange data with one another, and each feature may be corrected through this exchange.

For example, the class feature output from the intermediate layer of the class recognition network 621 may be corrected by using the posture feature output from the intermediate layer of the posture recognition network 622 and the relationship feature output from the relationship recognition network 623.

Describing the class feature correction in more detail, the corrected class feature may be obtained based on the class feature, the posture feature, the relationship feature, and a preset first weight coefficient array. The first weight coefficient array may include a weight coefficient for the class feature, a weight coefficient for the posture feature, and a weight coefficient for the relationship feature, all of which are used in correcting the class feature.

If the first weight coefficient array is expressed as $[a_{11}, a_{12}, a_{13}]$, the corrected class feature may be expressed as follows.

[Equation 1]

$$\hat{A}_1 = a_{11} A_1 + a_{12} A_2 + a_{13} A_3$$

In Equation 1 above, $A_1$ denotes the class feature, $A_2$ denotes the posture feature, and $A_3$ denotes the relationship feature. In addition, $a_{11}$ denotes the weight coefficient of the class feature applied in the correction, $a_{12}$ denotes the weight coefficient of the posture feature applied in the correction, and $a_{13}$ denotes the weight coefficient of the relationship feature applied in the correction. Also, $\hat{A}_1$ denotes the corrected class feature.

Similarly, the corrected posture feature may be determined based on the class feature, the posture feature, the relationship feature, and a second weight coefficient array, and the corrected relationship feature may be determined based on the class feature, the posture feature, the relationship feature, and a third weight coefficient array.

The first, second, and third weight coefficient arrays may be determined according to how important the class feature, the posture feature, and the relationship feature each are in correcting the respective feature.
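The mutual correction with three weight coefficient arrays can be sketched in Python as follows. This is only an illustration under stated assumptions: each corrected feature is taken to be a weighted sum of the three features, the three features are assumed to be vectors of equal length, and the function name `correct_features` and all numeric values are hypothetical.

```python
def correct_features(features, weight_arrays):
    """Mutually correct [A1, A2, A3] (class, posture, relationship features).

    weight_arrays[i] is the i-th weight coefficient array [a_i1, a_i2, a_i3].
    Assumption of this sketch: corrected feature i is the weighted sum
    a_i1*A1 + a_i2*A2 + a_i3*A3, with all features having the same length.
    """
    dim = len(features[0])
    corrected = []
    for row in weight_arrays:
        corrected.append(
            [sum(w * f[k] for w, f in zip(row, features)) for k in range(dim)]
        )
    return corrected


# Hypothetical 2-dimensional features and weight coefficient arrays.
A1 = [1.0, 0.0]  # class feature
A2 = [0.0, 2.0]  # posture feature
A3 = [4.0, 4.0]  # relationship feature
W = [
    [0.50, 0.25, 0.25],  # first array: used to correct the class feature
    [0.25, 0.50, 0.25],  # second array: used to correct the posture feature
    [0.25, 0.25, 0.50],  # third array: used to correct the relationship feature
]
corrected_class, corrected_posture, corrected_relation = correct_features([A1, A2, A3], W)
print(corrected_class)     # [1.5, 1.5]
print(corrected_posture)   # [1.25, 2.0]
print(corrected_relation)  # [2.25, 2.5]
```

Giving each array its largest coefficient on the feature being corrected reflects the idea that the coefficients encode relative importance; the actual values would be preset or learned.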

By obtaining the corrected class feature, the corrected posture feature, and the corrected relationship feature through mutual correction among the class feature, the posture feature, and the relationship feature, the accuracy of the class information, posture information, and relationship information that the neural network produces for an object can be improved. Even when mutual correction between the different features is performed, the parameters of each sub-network are not changed.

The corrected class feature is input to the layer that follows the intermediate layer from which the class feature was output in the class recognition network 621, is processed through the remaining layers of the class recognition network 621, and the class information may be output from the output layer of the class recognition network 621. Likewise, the corrected posture feature is input to the layer that follows the intermediate layer of the posture recognition network 622, is processed through the remaining layers of the posture recognition network 622, and the posture information may be output from the output layer of the posture recognition network 622. The corrected relationship feature is input to the layer that follows the intermediate layer of the relationship recognition network 623, is processed through the remaining layers of the relationship recognition network 623, and the relationship information may be output from the output layer of the relationship recognition network 623.
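The data flow described above (run each sub-network up to its intermediate layer, correct the features, then feed each corrected feature to the next layer) can be sketched in Python. The toy "layers" here are plain functions standing in for real network layers, the correction is assumed to be a weighted sum, and `forward_with_correction`, `split`, and every numeric value are hypothetical.

```python
def run_layers(layers, x):
    """Apply a sequence of layer functions in order."""
    for layer in layers:
        x = layer(x)
    return x


def forward_with_correction(subnets, split, weight_arrays, image):
    """subnets: one list of layer functions per sub-network.
    split: number of layers up to and including the intermediate layer."""
    # 1) Each sub-network produces its intermediate feature.
    feats = [run_layers(layers[:split], image) for layers in subnets]
    # 2) Mutual correction (assumed here to be a weighted sum of the features).
    dim = len(feats[0])
    corrected = [
        [sum(w * f[k] for w, f in zip(row, feats)) for k in range(dim)]
        for row in weight_arrays
    ]
    # 3) Each corrected feature enters the layer after the intermediate layer
    #    and passes through the remaining layers to the output layer.
    return [run_layers(layers[split:], c) for layers, c in zip(subnets, corrected)]


def scale(s):
    """Toy stand-in for the front part of a sub-network."""
    return lambda v: [s * x for x in v]


# Toy sub-networks: one front layer followed by one output layer (a sum).
subnets = [[scale(1.0), sum], [scale(2.0), sum], [scale(3.0), sum]]
weights = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs = forward_with_correction(subnets, 1, weights, [1.0, 2.0])
print(outputs)  # [4.5, 6.0, 9.0]
```

Note that the layer functions themselves are left untouched by the correction step, in line with the statement that sub-network parameters do not change when features are corrected.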

Each of the class recognition network 621, the posture recognition network 622, and the relationship recognition network 623 may be implemented as a CNN (Convolutional Neural Network), Faster R-CNN (Faster Region-based Convolutional Neural Network), YOLO (You Only Look Once: Unified, Real-Time Object Detection), or the like, but the network type is not limited to these.

According to an embodiment, the neural network 620 may be trained based on a plurality of sample images. The plurality of sample images are training data used to train the neural network 620, and may include ground-truth class information, ground-truth posture information, and ground-truth relationship information for one or more objects included in each image.

Each of the sample images, for which the ground-truth class information, ground-truth posture information, and ground-truth relationship information are set, may be input to the neural network 620 to obtain inferred class information, inferred posture information, and inferred relationship information from the neural network 620. The parameters of the neural network 620 may be adjusted based on the loss between the inferred class information and the ground-truth class information, the loss between the inferred posture information and the ground-truth posture information, and the loss between the inferred relationship information and the ground-truth relationship information. During training, the weight parameters applied to the data exchange between the sub-networks 621, 622, and 623 may also be adjusted. The trained neural network 620 may be obtained by adjusting the parameters of the neural network 620 until each loss falls below a preset threshold.
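The stopping rule described above (keep adjusting parameters until each of the three losses is below a preset threshold) can be sketched as a small Python loop. The parameter update itself is abstracted behind a hypothetical `step_fn` callback that performs one update and returns the three current losses; the step cap, the toy loss values, and all names are assumptions for illustration only.

```python
def all_below_threshold(losses, threshold):
    """True when the class, posture, and relationship losses are all below the threshold."""
    return all(value < threshold for value in losses.values())


def train(step_fn, threshold, max_steps):
    """Call step_fn repeatedly until every loss is below the threshold
    or max_steps updates have been performed.
    Returns (number of steps taken, last losses)."""
    losses = {}
    steps_taken = 0
    for step in range(1, max_steps + 1):
        losses = step_fn()  # one parameter update; returns the three current losses
        steps_taken = step
        if all_below_threshold(losses, threshold):
            break
    return steps_taken, losses


def make_toy_step():
    """Toy stand-in for a real update: each call simply halves the three losses."""
    state = {"class": 1.0, "posture": 0.5, "relation": 2.0}

    def step():
        for key in state:
            state[key] /= 2.0
        return dict(state)

    return step


steps, final_losses = train(make_toy_step(), threshold=0.1, max_steps=100)
print(steps)         # 5
print(final_losses)  # {'class': 0.03125, 'posture': 0.015625, 'relation': 0.0625}
```

In a real setup `step_fn` would run a forward pass on sample images, compute the three losses against the ground truth, and update both the sub-network parameters and the exchange weights.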

The trained neural network 620 may also be obtained by training the neural network 620 a preset number of times; however, the training method of the neural network 620 is not limited thereto.

Jointly training the sub-networks 621, 622, and 623 for the three tasks of object class recognition, posture recognition, and relationship recognition can effectively improve the accuracy of the recognized information. A gated message passing system may be applied to exchange data among the sub-networks 621, 622, and 623, and recognition may be performed on features refined on that basis (feature refinement).
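As a rough illustration of gated message passing, the sketch below refines one feature vector with messages from the other two. The gate form (a sigmoid of a dot product between a learned weight vector and the source feature), the additive update, and all names are assumptions; the description does not specify the gating function.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def refine(target: Vector, sources: List[Vector], gate_weights: List[Vector]) -> Vector:
    """Add each source feature to the target, scaled by a scalar gate in (0, 1)."""
    refined = list(target)
    for source, w in zip(sources, gate_weights):
        gate = sigmoid(dot(w, source))  # how much of this message to let through
        refined = [r + gate * s for r, s in zip(refined, source)]
    return refined


# Hypothetical 2-dimensional features for one object.
class_feature = [1.0, 2.0]
posture_feature = [2.0, 0.0]
relationship_feature = [0.0, 4.0]

# Zero gate weights give a gate of exactly 0.5 for every message.
zero_gates = [[0.0, 0.0], [0.0, 0.0]]
refined_class = refine(class_feature, [posture_feature, relationship_feature], zero_gates)
```

The same function would be applied symmetrically to refine the posture feature from the class and relationship features, and the relationship feature from the other two. The gate weights play the role of the weight parameters that are adjusted during training.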

FIG. 7 is a diagram illustrating a process of acquiring object information according to an embodiment.

Referring to FIG. 7, a process of extracting object information from an input image is illustrated.

In operation 710, an image to be recognized is acquired, and shared features may be obtained by extracting image features with a VGG16 network. The VGG16 network may be a convolutional network including 16 convolutional layers and fully connected layers. Using the VGG16 network may simplify the structure of the neural network. Based on a Faster R-CNN network, objects may be recognized in the shared features and feature regions may be obtained. The Faster R-CNN network may be a neural network including convolutional layers, a region proposal network (RPN), region of interest (ROI) pooling, and classification and regression networks. However, the types of networks used to extract image features and to recognize objects in the shared features are not limited thereto.

Based on the feature regions, a candidate object region, a region surrounding the candidate object, and a region in which a pair of related objects is located may each be cropped. The candidate object region indicates the part of the feature regions in which an object is located, the region surrounding the candidate object indicates the part of the feature regions around the object, and the region in which a pair of related objects is located indicates the part of the feature regions in which the related object pair is located.
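The three crops can be illustrated with simple box arithmetic. This is a sketch under assumptions: the surrounding region is taken to be the object box enlarged about its center by a hypothetical scale factor, and the pair region is taken to be the smallest box enclosing both objects. The description does not state how these regions are computed.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def surrounding_box(box: Box, scale: float, width: float, height: float) -> Box:
    """Enlarge a box about its center by `scale` and clip it to the feature-map bounds."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * scale / 2
    half_h = (box[3] - box[1]) * scale / 2
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(width, cx + half_w),
        min(height, cy + half_h),
    )


def pair_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both objects of a related pair."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


# Hypothetical candidate boxes on a 100 x 100 feature map.
person = (10.0, 10.0, 30.0, 30.0)
kite = (20.0, 5.0, 50.0, 25.0)

person_context = surrounding_box(person, scale=2.0, width=100.0, height=100.0)
person_kite_region = pair_box(person, kite)
```

Each resulting box would then be used to crop the shared feature map before the crop is passed to the corresponding recognition sub-network.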

In operation 720, the cropped candidate object region, the region surrounding the candidate object, and the region in which the related object pair is located are input to the class recognition network to obtain a class feature of the object, to the posture recognition network to obtain a posture feature of the object, and to the relationship recognition network to obtain a relationship feature of the object. For convenience of description, the relationship feature may also be referred to herein as a scene graph feature.

In operation 730, each of the class feature, the posture feature, and the relationship feature may be corrected through data exchange among the class recognition network, the posture recognition network, and the relationship recognition network.

In operation 740, class information of the objects, for example, information indicating a person, a hat, or a kite, may be output from the class recognition network. Posture information of the objects may be output from the posture recognition network. Relationship information between objects may be output from the relationship recognition network in the form of a scene graph, for example, a person wearing a hat, a person flying a kite, or a person standing on grass.

FIGS. 8 and 9 are diagrams illustrating examples of object information acquired according to an embodiment.

Because the class recognition network, the posture recognition network, and the relationship recognition network are joined and correct one another's features during recognition, object information in the input image can be recognized more accurately. Understanding a 3D scene based on object information, which covers object detection, posture estimation, and recognition of relationships between objects, makes it possible to acquire highly accurate information by making full use of the whole scene and the relationships between objects. The recognized object information may be used not only in augmented reality systems but also in various fields such as smart homes, autonomous driving, and security.

The object information may also be provided as input required by other applications. For example, as shown in FIG. 8, a smart home may recognize an event such as 'person 810 - fallen - floor' based on the recognized object information and may sound an alarm to notify the user.
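One way an application might consume such relationship information is to treat the scene graph as a list of (subject, predicate, object) triples and match it against alert patterns. The `Relation` type, the predicate strings, and the exact-match rule below are hypothetical and are not taken from the description.

```python
from typing import List, NamedTuple, Set


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


def find_events(scene_graph: List[Relation], alert_patterns: Set[Relation]) -> List[Relation]:
    """Return the relations in the scene graph that match an alert pattern."""
    return [relation for relation in scene_graph if relation in alert_patterns]


# Hypothetical scene graph produced by the relationship recognition network.
scene_graph = [
    Relation("person", "wearing", "hat"),
    Relation("person", "fallen on", "floor"),
]
alert_patterns = {Relation("person", "fallen on", "floor")}

events = find_events(scene_graph, alert_patterns)
if events:
    print("ALERT:", ", ".join(" - ".join(event) for event in events))
```

A real system would likely match on object identities and confidence scores rather than on plain strings, but the triple structure is the same.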

In addition, when a specific object is occluded by another object, the class and posture of the occluded object can be recognized better by using information about the surrounding objects. Chair 2 on the right side of FIG. 9 is largely occluded by the table in front of it and by chair 1 on the left. With the object recognition technique described above, the class information and 3D posture information of the occluded object can be recognized more accurately.

FIG. 10 is a diagram illustrating a process of generating a virtual object based on acquired object information according to an embodiment.

Based on the classes, postures, and relationships of the real objects in a scene, the possible position and posture of a virtual object to be added to the scene, and its relationships with surrounding objects, may be predicted. A virtual object added in this way can interact realistically and naturally with its surroundings.

For example, when there is a bookshelf next to a chair in a real scene and an augmented reality system adds a virtual character to that scene, a virtual character sitting on the chair and reading a book may be generated for natural interaction with the real scene. Alternatively, if the chair in the real scene faces a table on which a laptop is placed, the virtual character may sit on the chair and use the computer on the table. If the chair has its back to the table and faces a TV, the virtual character may sit on the chair and watch TV. In this way, the possible position, posture, and motion of a virtual object are estimated according to the classes, postures, and relationships of the real objects in the real scene, and natural interaction between the virtual and the real may be implemented based on the estimated result.

When a virtual object 1020 is added to the real scene 1010 shown in FIG. 10, a virtual object 1020 standing on the sofa, as in the first case 1030, may look unnatural next to the real objects included in the real scene 1010. If the object information recognized from the real scene 1010 as described above (for example, a sofa next to a bookshelf) is used, a virtual object 1030 sitting on the sofa and reading a book may be added, as in the second case 1040, producing a scene that is more realistic and blends naturally with the real objects. Virtual object generation is described in detail below with reference to the drawings.

FIG. 11 is a diagram illustrating a process in which an electronic device generates a virtual object in an image according to an embodiment.

Referring to FIG. 11, a method of generating a virtual object, performed by a processor included in an electronic device, is illustrated.

In operation 1110, the electronic device may determine position information, posture information, and motion information of a virtual object to be generated in the image, based on object information that includes class information, posture information, and relationship information of the objects included in the image. For example, the electronic device may input the object information to a rendering prediction network to obtain position information, posture information, and motion information of a virtual object that can be rendered in the image. For convenience of description, the position information, posture information, and motion information of the virtual object may be referred to herein as virtual object information.

The position information of the virtual object indicates a position in the image at which the virtual object can be rendered, the posture information indicates a rotation angle of the virtual object, and the motion information indicates a motion of the virtual object. The virtual object may include a virtual character or a virtual item. Using the predicted position information, posture information, and motion information when rendering the virtual object in the image yields a realistic and natural scene.

The rendering prediction network may include three sub-networks that respectively predict the position information, posture information, and motion information of the virtual object. The three sub-networks may include a position regression network, a posture prediction network, and a motion candidate network.

The position regression network takes object features as input and may predict an appropriate position for the virtual object through convolutional layers, pooling layers, and fully connected layers. The posture prediction network may be a regression network used to estimate the 3D posture of the virtual object in the scene. The motion candidate network may predict relationships between the virtual object and surrounding objects and output a scene graph that includes the virtual object and the real objects.

In operation 1120, the electronic device may add the virtual object to the image based on the position information, posture information, and motion information of the virtual object.

The position information of the virtual object obtained from the rendering prediction network may include at least one position, the posture information may include different postures of the virtual object at each position, and the motion information may include at least one motion of the virtual object. If several positions, postures, and motions are predicted, the user may select one position, posture, and motion from among them, and the virtual object may be rendered in the image according to the selection.
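The selection step can be sketched as enumerating every predicted combination and letting the user pick one by index. The layout of the `predictions` dictionary (position mapped to candidate rotation angles and motions) is an assumption made for illustration; the description does not define an output format for the rendering prediction network.

```python
from itertools import product
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]
Option = Tuple[Position, float, str]  # (position, rotation in degrees, motion)


def list_options(predictions: Dict[Position, Dict[str, list]]) -> List[Option]:
    """Enumerate every (position, rotation, motion) combination that was predicted."""
    options: List[Option] = []
    for position, candidates in predictions.items():
        for rotation, motion in product(candidates["rotations_deg"], candidates["motions"]):
            options.append((position, rotation, motion))
    return options


# Hypothetical prediction: one position with two postures and two motions.
predictions = {
    (1.0, 0.0, 2.0): {"rotations_deg": [0.0, 90.0], "motions": ["sit", "read"]},
}

options = list_options(predictions)
user_choice = 1                 # index chosen by the user in some UI
selected = options[user_choice]
```

The selected tuple would then be handed to the renderer that places the virtual object in the image.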

In an embodiment, in response to the previously recognized class information, posture information, and relationship information of the objects being input to the rendering prediction network, position information, posture information, and motion information of a virtual object that can be rendered in the image are obtained. A virtual object that interacts naturally with the real objects can thus be generated according to the classes, postures, and relationships of the real objects in the image.

The three sub-networks in the rendering prediction network may also be coupled to one another. In the process of predicting the position information, posture information, and motion information of the virtual object, each piece of information may be exchanged and used to correct the others, which yields a virtual object that interacts more naturally with the real objects.

FIGS. 12 to 14 are diagrams illustrating a process of generating a virtual object by estimating information about objects included in an image according to an embodiment.

Referring to FIG. 12, a process of extracting object information from an input image and adding a virtual object to the image based on the object information is illustrated. The electronic device may acquire an RGB-D image 1210, which is a color image that includes a depth image. The electronic device may input the RGB-D image 1210 to a joint estimation module 1220 to estimate object information 1230 included in the image. For example, the joint estimation module 1220 outputs the object classification result, the 3D posture of each object, and a scene graph as estimated scene information 1230, and object information 1240 including class information, posture information, and relationship information of the objects may be obtained on that basis. The object information 1240 may be input to a virtual object prediction module 1250 to obtain position information, posture information, and motion information of a virtual object 1260 that can be rendered in the RGB-D image 1210. The virtual object prediction module 1250 may also be referred to as a rendering prediction network.

A technique for training the rendering prediction network is described below. From each of a plurality of sample images used to train the rendering prediction network, the remainder of the scene excluding a preset object may be obtained. The rendering prediction network may be trained so that, when the class information, posture information, and relationship information of the objects in the remainder of the scene are input, the position information, posture information, and motion information of the preset object are output.

For example, if a sample image includes a person sitting on a chair, the chair and the person are separated, attribute information of the chair and relationship information between the chair and the floor are obtained, and position information, posture information, and motion information of the person are obtained. The rendering prediction network may then be trained so that the person's position information, posture information, and motion information are output when the attribute information of the chair and the relationship information between the chair and the floor are input.

To implement this, training data may be generated as follows. First, images that include a person are selected from an existing image set, and the class information, posture information, and relationship information of the preset object (that is, the person) are extracted from each selected image through the joint estimation module. The object information of the person is separated from the other information and used as ground-truth training data, and the information about the other objects is used as input training data.
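The separation of ground-truth data from input data can be sketched as a simple partition of the estimated object information. The record layout (dictionaries with `id` and `class` keys, relations as id triples) and the function name are hypothetical; only the idea of splitting off the preset object comes from the description.

```python
from typing import Any, Dict, List, Tuple

Relation = Tuple[int, str, int]  # (subject id, predicate, object id)


def split_sample(
    objects: List[Dict[str, Any]],
    relations: List[Relation],
    target_class: str = "person",
) -> Dict[str, Dict[str, list]]:
    """Separate the preset object (label) from the rest of the scene (input)."""
    target = [o for o in objects if o["class"] == target_class]
    context = [o for o in objects if o["class"] != target_class]
    target_ids = {o["id"] for o in target}
    # A relation belongs to the label if it touches the preset object at all.
    label_relations = [r for r in relations if r[0] in target_ids or r[2] in target_ids]
    input_relations = [r for r in relations if r not in label_relations]
    return {
        "input": {"objects": context, "relations": input_relations},
        "label": {"objects": target, "relations": label_relations},
    }


# Hypothetical object information for an image of a person sitting on a chair.
objects = [
    {"id": 0, "class": "person", "yaw_deg": 90.0},
    {"id": 1, "class": "chair", "yaw_deg": 90.0},
    {"id": 2, "class": "floor", "yaw_deg": 0.0},
]
relations = [(0, "sitting on", 1), (1, "on", 2)]

sample = split_sample(objects, relations)
```

Here the chair, the floor, and the chair-floor relation become the network input, while the person and the 'sitting on' relation become the training target.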

Referring to FIG. 13, an example of rendering a virtual object in an image is illustrated. Class information, posture information, and relationship information of the real objects included in an image 1310 may be obtained. When the electronic device adds a virtual object 1330 to the image 1310, it may obtain information about the objects in the image 1310 through the joint estimation module. Using this object information, the electronic device may predict a position and posture at which the virtual object 1330 can be rendered, together with its relationships 1320 with surrounding real objects. Based on the prediction result, an image 1340 in which the virtual object is rendered naturally may be obtained.

Referring to FIG. 14, an AR device 1410 may capture an RGB-D image. A joint estimation module 1420 may estimate class information, posture information, and relationship information for each of one or more objects in the image obtained from the AR device 1410. When the AR device 1410 receives a rendering command (that is, a control command) for a virtual object from a user or a designer, the class information, posture information, and relationship information of the objects are input to a virtual object prediction module 1430, which outputs position information, posture information, and motion information of the virtual object to be rendered in the image. A CG engine 1440 may render the virtual object in the image based on the position information, posture information, and motion information of the virtual object.
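The data flow of FIG. 14 can be outlined as a small pipeline in which each module is passed in as a function. The function names, the stub implementations, and the dictionary layouts below are placeholders invented for illustration; they stand in for the joint estimation module, the virtual object prediction module, and the CG engine, whose actual interfaces the description does not specify.

```python
from typing import Any, Callable, Dict

Frame = Dict[str, Any]
Info = Dict[str, Any]


def process_frame(
    frame: Frame,
    render_requested: bool,
    estimate_objects: Callable[[Frame], Info],
    predict_virtual_object: Callable[[Info], Info],
    render: Callable[[Frame, Info], Frame],
) -> Frame:
    """Estimate object information, then predict and render only on request."""
    object_info = estimate_objects(frame)
    if not render_requested:
        return frame
    virtual_info = predict_virtual_object(object_info)
    return render(frame, virtual_info)


# Stub modules with fixed outputs, used only to exercise the flow.
def fake_estimate(frame: Frame) -> Info:
    return {"objects": [{"class": "chair", "yaw_deg": 90.0}],
            "relations": [("chair", "on", "floor")]}


def fake_predict(object_info: Info) -> Info:
    return {"position": (1.0, 0.0, 2.0), "yaw_deg": 90.0, "motion": "sit"}


def fake_render(frame: Frame, virtual_info: Info) -> Frame:
    rendered = dict(frame)  # copy so the captured frame is left untouched
    rendered["virtual_objects"] = [virtual_info]
    return rendered


frame = {"rgb": None, "depth": None}
result = process_frame(frame, True, fake_estimate, fake_predict, fake_render)
unchanged = process_frame(frame, False, fake_estimate, fake_predict, fake_render)
```

Passing the modules as callables keeps the flow independent of any particular network or rendering engine.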

FIG. 15 is a diagram illustrating an electronic device according to an embodiment.

Referring to FIG. 15, an electronic device 1500 includes a processor 1510 and a memory 1520. Optionally, the electronic device 1500 may further include a transceiver 1530. The processor 1510, the memory 1520, and the transceiver 1530 may be connected to one another through a bus 1540. The electronic device 1500 may include various computing devices such as smartphones, tablets, laptops, and personal computers; various wearable devices such as smart watches, smart glasses, and smart clothing; various home appliances such as smart speakers, smart TVs, and smart refrigerators; and smart cars, smart kiosks, Internet of Things (IoT) devices, walking assist devices (WADs), drones, robots, and the like.

The processor 1510 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described herein. The processor 1510 may also be a combination that realizes computing functions, such as a combination of one or more microprocessors or a combination of a DSP and a microprocessor.

The memory 1520 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (for example, a compact disc, laser disc, optical disc, digital versatile disc, or Blu-ray disc), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.

The memory 1520 is used to store application program code for performing the operations described above, and execution of the code may be controlled by the processor 1510. The processor 1510 may implement the operations described above by executing the application program code stored in the memory 1520.

The bus 1540 may include a path for transferring information between components. The bus 1540 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 1540 may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 15, but this does not mean that there is only one bus or only one type of bus.

In addition, the electronic device 1500 may process the operations described above.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field-programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in a computer-readable recording medium.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiments or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person having ordinary skill in the art may apply various technical modifications and variations based on them. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

1. A method of operating an electronic device, the method comprising:
acquiring an image;
acquiring a class feature, a posture feature, and a relationship feature of an object included in the image;
correcting each of the class feature, the posture feature, and the relationship feature by using the class feature, the posture feature, and the relationship feature of the object; and
acquiring class information, posture information, and relationship information of the object based on the corrected class feature, the corrected posture feature, and the corrected relationship feature, respectively.

2. The method of claim 1,
wherein the correcting comprises
correcting any one of the class feature, the posture feature, and the relationship feature by applying a predetermined weight to the class feature, the posture feature, and the relationship feature of the object.

3. The method of claim 1,
wherein the acquiring of the class feature, the posture feature, and the relationship feature comprises
acquiring each of the class feature, the posture feature, and the relationship feature from an intermediate layer of a corresponding subnetwork.

4. The method of claim 3,
wherein the intermediate layers of the plurality of subnetworks are connected to each other, and each of the class feature, the posture feature, and the relationship feature is shared with the other subnetworks.

5. The method of claim 1,
wherein the acquiring of the class information, the posture information, and the relationship information of the object comprises
acquiring the class information, the posture information, and the relationship information from an output layer of a corresponding subnetwork in response to each of the corrected class feature, the corrected posture feature, and the corrected relationship feature being input to a layer following the intermediate layer of the corresponding subnetwork.
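As an illustration only, the flow recited in claims 1 to 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `SubNetwork` class, the `estimate` function, the toy `scale` and `total` layers, and the numeric weights are all hypothetical, the three features are assumed to have the same length, and a weighted sum is assumed as one possible way of "applying a predetermined weight".

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


class SubNetwork:
    """Toy subnetwork split at its intermediate layer (hypothetical structure)."""

    def __init__(self, up_to_intermediate: Layer, after_intermediate: Layer) -> None:
        self.up_to_intermediate = up_to_intermediate
        self.after_intermediate = after_intermediate


def estimate(
    subnetworks: Dict[str, SubNetwork],
    weights: Dict[str, Dict[str, float]],
    image_features: Vector,
) -> Dict[str, Vector]:
    # Step 1: each subnetwork yields its feature at the intermediate layer.
    features = {
        name: net.up_to_intermediate(image_features)
        for name, net in subnetworks.items()
    }

    # Step 2: correct each feature with a weighted sum of all shared features
    # (assumed combination rule; all features are assumed to be equally long).
    corrected: Dict[str, Vector] = {}
    for target in subnetworks:
        size = len(features[target])
        corrected[target] = [
            sum(weights[target][source] * features[source][i] for source in features)
            for i in range(size)
        ]

    # Step 3: feed each corrected feature to the layers after the intermediate layer.
    return {
        name: net.after_intermediate(corrected[name])
        for name, net in subnetworks.items()
    }


def scale(factor: float) -> Layer:
    """Toy layer that multiplies every element by a constant."""
    return lambda v: [factor * x for x in v]


def total(v: Vector) -> Vector:
    """Toy output layer that sums its input into a single value."""
    return [sum(v)]


subnetworks = {
    "class": SubNetwork(scale(1.0), total),
    "posture": SubNetwork(scale(2.0), total),
    "relationship": SubNetwork(scale(3.0), total),
}
weights = {
    "class": {"class": 1.0, "posture": 0.0, "relationship": 0.0},
    "posture": {"class": 0.5, "posture": 0.5, "relationship": 0.0},
    "relationship": {"class": 1.0, "posture": 1.0, "relationship": 1.0},
}
outputs = estimate(subnetworks, weights, [1.0, 1.0])
print(outputs)
```

In this sketch the weight table plays the role of the connections between intermediate layers: a zero weight means a subnetwork ignores another subnetwork's feature, and a nonzero weight lets that feature influence the corrected result.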

6. The method of claim 1,
wherein the class information includes information indicating which object the object detected in the image is,
the posture information includes information indicating a rotation angle of the object detected in the image, and
the relationship information includes action information of the object detected in the image and/or connection information with another object.

7. The method of claim 1, further comprising:
determining location information, posture information, and motion information of a virtual object to be generated in the image based on the class information, the posture information, and the relationship information of the object; and
adding the virtual object to the image based on the location information, the posture information, and the motion information of the virtual object.

8. The method of claim 7,
wherein the adding of the virtual object comprises,
when a plurality of pieces of information are determined for at least one of the location information, the posture information, and the motion information of the virtual object, adding the virtual object to the image based on information selected by a user from among the plurality of pieces of information.

9. The method of claim 7,
wherein the location information of the virtual object includes information indicating a location in the image at which the virtual object can be rendered,
the posture information of the virtual object includes information indicating a rotation angle of the virtual object, and
the motion information of the virtual object includes information indicating a motion of the virtual object.
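The selection behavior recited in claims 7 to 9 can be illustrated with a short Python sketch. Everything named here is hypothetical and introduced only for illustration: the `Placement` record, the `resolve` and `place_virtual_object` functions, and the `ask_user` callback, which stands in for whatever user interface an actual device would provide. The sketch assumes that each of the location, posture, and motion information arrives as a list of one or more candidates.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical callback: receives a field name and its candidates,
# and returns the index of the candidate chosen by the user.
AskUser = Callable[[str, Sequence], int]


@dataclass(frozen=True)
class Placement:
    location: Tuple[int, int]  # where the virtual object can be rendered
    rotation_deg: float        # rotation angle of the virtual object
    motion: str                # motion of the virtual object


def resolve(field: str, candidates: Sequence[T], ask_user: AskUser) -> T:
    """Return the only candidate, or the one the user selects when several exist."""
    if not candidates:
        raise ValueError(f"no candidates determined for {field}")
    if len(candidates) == 1:
        return candidates[0]
    index = ask_user(field, candidates)
    if not 0 <= index < len(candidates):
        raise IndexError(f"invalid selection for {field}: {index}")
    return candidates[index]


def place_virtual_object(
    locations: Sequence[Tuple[int, int]],
    rotations: Sequence[float],
    motions: Sequence[str],
    ask_user: AskUser,
) -> Placement:
    # The user is consulted only for fields that have more than one candidate.
    return Placement(
        location=resolve("location", locations, ask_user),
        rotation_deg=resolve("posture", rotations, ask_user),
        motion=resolve("motion", motions, ask_user),
    )


asked = []


def pick_last(field: str, candidates: Sequence) -> int:
    """Stand-in for a user interface: records the question, picks the last option."""
    asked.append(field)
    return len(candidates) - 1


placement = place_virtual_object(
    locations=[(10, 20), (30, 40)],
    rotations=[90.0],
    motions=["sit", "stand"],
    ask_user=pick_last,
)
print(placement, asked)
```

Resolving each field separately is one possible reading of "at least one of" in claim 8; a device could equally present complete candidate placements for the user to choose from.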

10. The method of claim 1,
wherein the image is an RGB-D image.

11. A computer-readable storage medium on which a program for executing the method of any one of claims 1 to 10 is recorded.

12. An electronic device comprising:
one or more processors,
wherein the one or more processors are configured to:
acquire an image;
acquire a class feature, a posture feature, and a relationship feature of an object included in the image;
correct each of the class feature, the posture feature, and the relationship feature by using the class feature, the posture feature, and the relationship feature of the object; and
acquire class information, posture information, and relationship information of the object based on the corrected class feature, the corrected posture feature, and the corrected relationship feature, respectively.

13. The electronic device of claim 12,
wherein the one or more processors are configured to
correct any one of the class feature, the posture feature, and the relationship feature by applying a predetermined weight to the class feature, the posture feature, and the relationship feature of the object.

14. The electronic device of claim 12,
wherein the one or more processors are configured to
acquire each of the class feature, the posture feature, and the relationship feature from an intermediate layer of a corresponding subnetwork.

15. The electronic device of claim 14,
wherein the intermediate layers of the plurality of subnetworks are connected to each other, and each of the class feature, the posture feature, and the relationship feature is shared with the other subnetworks.

16. The electronic device of claim 12,
wherein the one or more processors are configured to
acquire the class information, the posture information, and the relationship information from an output layer of a corresponding subnetwork in response to each of the corrected class feature, the corrected posture feature, and the corrected relationship feature being input to a layer following the intermediate layer of the corresponding subnetwork.

17. The electronic device of claim 12,
wherein the class information includes information indicating which object the object detected in the image is,
the posture information includes information indicating a rotation angle of the object detected in the image, and
the relationship information includes action information of the object detected in the image and/or connection information with another object.

18. The electronic device of claim 12,
wherein the one or more processors are further configured to:
determine location information, posture information, and motion information of a virtual object to be generated in the image based on the class information, the posture information, and the relationship information of the object; and
add the virtual object to the image based on the location information, the posture information, and the motion information of the virtual object.

19. The electronic device of claim 18,
wherein the one or more processors are configured to,
when a plurality of pieces of information are determined for at least one of the location information, the posture information, and the motion information of the virtual object, add the virtual object to the image based on information selected by a user from among the plurality of pieces of information.

20. The electronic device of claim 18,
wherein the location information of the virtual object includes information indicating a location in the image at which the virtual object can be rendered,
the posture information of the virtual object includes information indicating a rotation angle of the virtual object, and
the motion information of the virtual object includes information indicating a motion of the virtual object.