KR102583675B1

KR102583675B1 - Method and system for classifying image

Info

Publication number: KR102583675B1
Application number: KR1020210051388A
Authority: KR
Inventors: 구형일; 김용균
Original assignee: 아주대학교산학협력단
Priority date: 2021-04-20
Filing date: 2021-04-20
Publication date: 2023-09-27
Also published as: KR20220144718A

Abstract

An image classification system according to one aspect of the technical idea of the present disclosure includes an image characteristic estimation module that estimates three-dimensional characteristics of an input image, and an image classification module that classifies the image based on the three-dimensional characteristics estimated by the image characteristic estimation module. The image characteristic estimation module includes a deep learning-based first network trained to estimate the three-dimensional characteristics by reflecting three-dimensional features of the input image and color changes between pixels in the input image.

Description

Image classification method and system {METHOD AND SYSTEM FOR CLASSIFYING IMAGE}

The technical idea of the present disclosure relates to an image classification method and system, and more specifically, to a method and system capable of estimating three-dimensional characteristics of a two-dimensional image and classifying the image based on the estimated characteristics.

Computer vision is a field of artificial intelligence that studies how to reproduce general human visual perception using computers. For a device such as a computer to understand the content of an image, image classification, which extracts meaning from the image using computer vision and machine learning algorithms, must be applied.

A typical image represents an object in two dimensions, but most real objects have a three-dimensional shape, so it is difficult to classify an image accurately from two-dimensional information alone. To improve classification accuracy, there have therefore been increasing attempts to estimate the three-dimensional shape of an object from a two-dimensional image and then classify the image using the estimated result.

One problem to be solved by the present invention is to provide a method that improves image classification accuracy by estimating three-dimensional characteristics of an image.

To achieve the above object, an image classification system according to one aspect of the technical idea of the present disclosure includes an image characteristic estimation module that estimates three-dimensional characteristics of an input image, and an image classification module that classifies the image based on the three-dimensional characteristics estimated by the image characteristic estimation module. The image characteristic estimation module includes a deep learning-based first network trained to estimate the three-dimensional characteristics by reflecting three-dimensional features of the input image and color changes between pixels in the input image.

According to one embodiment, the image characteristic estimation module may estimate the positions of a plurality of three-dimensional points corresponding to the three-dimensional characteristics of the image, and may output values corresponding to the estimated positions as the estimation result of the three-dimensional characteristics.

According to one embodiment, the image classification system may estimate a mesh representing the three-dimensional characteristics of the image, and the plurality of three-dimensional points may correspond to a plurality of vertices included in the mesh.

According to one embodiment, the image classification system may further include a learning module for training the first network, and the learning module may update the first network based on the difference between a rendering result obtained using the estimated three-dimensional characteristics and ground-truth data corresponding to the three-dimensional characteristics of the input image.

According to one embodiment, the learning module may update the first network based on color differences between corresponding pixels of the input image and an image reconstructed based on the estimated three-dimensional characteristics.

According to one embodiment, the image classification module may further include a deep learning-based second network trained to classify the image based on values corresponding to the positions of the plurality of three-dimensional points.

According to one embodiment, the image classification module may further include a deep learning-based second network trained to classify the input image based on the input image and a depth image rendered from the plurality of three-dimensional points.

According to one embodiment, the image classification system may be implemented with at least one computing device, and each of the at least one computing device may include a processor and a memory.

An image classification method according to one aspect of the technical idea of the present disclosure includes estimating three-dimensional characteristics of an input image, and classifying the input image based on the estimated three-dimensional characteristics. The step of estimating the three-dimensional characteristics includes estimating the three-dimensional characteristics of the input image using a deep learning-based first network trained to estimate the three-dimensional characteristics by reflecting three-dimensional features of the input image and color changes between pixels in the input image.

According to an embodiment of the present disclosure, an image can be classified more accurately by using a trained deep learning-based network and the three-dimensional characteristics estimated from the two-dimensional image.

In addition, according to an embodiment of the present disclosure, the network is trained to also take into account color changes of pixels in the two-dimensional image when estimating three-dimensional characteristics, so that the estimation accuracy of the three-dimensional characteristics can be maximized.

The effects of the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

A brief description of each drawing is provided so that the drawings cited in the present disclosure may be more fully understood.
FIG. 1 is a diagram schematically illustrating an image classification operation performed by an image classification system according to an exemplary embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a schematic configuration of a convolutional neural network as an example of the network included in each of the image characteristic estimation module and the image classification module of the image classification system shown in FIG. 1.
FIG. 3 is a flowchart for explaining an image classification method according to an exemplary embodiment of the present disclosure.
FIG. 4 is a diagram for explaining a mesh-based three-dimensional characteristic estimation operation according to an exemplary embodiment of the present disclosure.
FIG. 5 is a diagram showing examples of an operation of classifying an image based on estimated three-dimensional characteristics.
FIG. 6 is a flowchart for explaining an operation in which the learning module of an image classification system according to an exemplary embodiment of the present disclosure trains the network of the image characteristic estimation module.
FIGS. 7 and 8 are exemplary diagrams related to the learning operation of the learning module shown in FIG. 6.
FIG. 9 is a schematic block diagram of a device that performs an image classification method according to an exemplary embodiment of the present disclosure.

Exemplary embodiments according to the technical idea of the present disclosure are provided to explain that technical idea more completely to those of ordinary skill in the art. The embodiments below may be modified into various other forms, and the scope of the technical idea of the present disclosure is not limited to them. Rather, these embodiments are provided to make the present disclosure more thorough and complete and to fully convey the technical idea of the present invention to those skilled in the art.

Although terms such as first and second are used in the present disclosure to describe various members, regions, layers, portions, and/or components, these members, regions, layers, portions, and/or components should not be limited by these terms. The terms do not imply any particular order, rank, or superiority, and are used only to distinguish one member, region, portion, or component from another. Accordingly, a first member, region, portion, or component described below may refer to a second member, region, portion, or component without departing from the teachings of the technical idea of the present disclosure. For example, a first component may be referred to as a second component without departing from the scope of the present disclosure, and similarly, a second component may be referred to as a first component.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by those of ordinary skill in the art to which the concept of the present disclosure pertains. In addition, commonly used terms, such as those defined in dictionaries, should be interpreted as having meanings consistent with their meanings in the context of the relevant technology, and should not be interpreted in an overly formal sense unless explicitly so defined herein.

Where an embodiment can be implemented differently, a specific process order may differ from the order described. For example, two processes described in succession may be performed substantially simultaneously, or may be performed in the reverse of the order described.

In the accompanying drawings, variations from the illustrated shapes may be expected depending on, for example, manufacturing techniques and/or tolerances. Accordingly, embodiments according to the technical idea of the present disclosure should not be construed as limited to the specific shapes of the regions illustrated herein, but should include, for example, changes in shape resulting from the manufacturing process. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of them are omitted.

As used herein, the term 'and/or' includes each of the mentioned elements and every combination of one or more of them.

Hereinafter, embodiments according to the technical idea of the present disclosure are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically illustrating an image classification operation performed by an image classification system according to an exemplary embodiment of the present disclosure. FIG. 2 is a diagram illustrating a schematic configuration of a convolutional neural network as an example of the network included in each of the image characteristic estimation module and the image classification module of the image classification system shown in FIG. 1.

Referring to FIG. 1, when an image of a specific object is input, the image classification system 100 according to an embodiment of the present disclosure may classify the image by recognizing or identifying the specific object included in the input image.

The image classification system 100 may include at least one computing device. For example, each of the at least one computing device may be a hardware-based device including a processor, a memory, a communication interface, an input unit, and/or an output unit. In this case, the components (modules) included in the image classification system 100 may be implemented in hardware, software, or a combination thereof, and may be integrated into or distributed across the at least one computing device.

The image classification system 100 according to an embodiment of the present disclosure may include an image characteristic estimation module 110, an image classification module 120, and a learning module 130, but is not limited thereto and may include additional components.

The image characteristic estimation module 110 may estimate three-dimensional characteristics of an input image. In general, the input image contains only two-dimensional information, and the image characteristic estimation module 110 may estimate the three-dimensional characteristics of the input image by estimating depth information from it. The image characteristic estimation module 110 according to an embodiment of the present disclosure includes a network (artificial neural network) trained using deep learning, and may use the network to estimate three-dimensional characteristics from the input image. For example, when an image of a 'small intestine' is input, the image characteristic estimation module 110 may provide information on a plurality of three-dimensional points representing the three-dimensional characteristics of the input image. The plurality of three-dimensional points are described in more detail later with reference to FIG. 4.

The image classification module 120 may classify the input image based on the three-dimensional characteristics estimated by the image characteristic estimation module 110. The image classification module 120 also includes a network trained using deep learning, and may use that network to produce an image classification result from the three-dimensional characteristic information. For example, the network may output a probability value for each item that can be provided as a classification result, and the image classification module 120 may be implemented to output the item with the highest probability value as the image classification result. For example, when three-dimensional characteristic information for a 'small intestine' is input, the image classification module 120 may classify the image as an image of a 'small intestine' based on the input information.
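The selection of the highest-probability item can be illustrated with a minimal sketch. The label list, the raw scores, and the use of a softmax to turn scores into probabilities are assumptions made for illustration; the disclosure only states that the network outputs a probability value per item and that the item with the highest value is output.

```python
import math


def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(scores, labels):
    """Return the label with the highest probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label set and scores, for illustration only.
labels = ["stomach", "small intestine", "colon"]
label, prob = pick_label([0.3, 2.1, -0.5], labels)
print(label, round(prob, 3))
```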

The networks included in the image characteristic estimation module 110 and the image classification module 120 may be implemented as convolutional neural networks (CNNs), but are not limited thereto and may be implemented as any of various known types of neural network. Referring to FIG. 2, the CNN may be composed of one or more convolutional layers, pooling layers, and fully connected layers. The CNN repeatedly applies convolution and pooling to data having a two-dimensional form, such as an image, to extract features from the data (that is, to obtain an output feature map), and performs classification using the extracted features to output a result.
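To make the convolution and pooling steps concrete, the following is a small pure-Python sketch of one convolution followed by one pooling operation. The 5x5 input, the 2x2 kernel, and the 2x2 max pooling are illustrative assumptions; the disclosure does not specify kernel sizes, pooling type, or layer counts.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Toy image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Kernel that responds to a left-to-right increase in intensity.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool2x2(features)    # 2x2 map after pooling
print(features)
print(pooled)
```

The feature map is non-zero only where the kernel straddles the edge, and pooling keeps that response while halving the resolution.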

Continuing to refer to FIG. 1, the learning module 130 may train the network included in each of the image characteristic estimation module 110 and the image classification module 120. The learning module 130 may include an objective function for training each of these networks.

Regarding training of the network included in the image characteristic estimation module 110, the learning module 130 may render and/or reconstruct an image based on the three-dimensional characteristics estimated by the image characteristic estimation module 110, and may train (update) the network of the image characteristic estimation module 110 by comparing the rendered and/or reconstructed image with ground-truth data (such as a ground-truth image). The objective function may be implemented so that the network is trained in the direction that minimizes the difference between the rendered and/or reconstructed image and the ground-truth data. Training a network may mean updating the network by modifying the weights between the nodes included in the network based on the objective function. The training of the network of the image characteristic estimation module 110 is described in more detail later with reference to FIGS. 6 to 8.

Regarding training of the network included in the image classification module 120, the objective function may be implemented so that the network is trained in the direction that minimizes the cross-entropy loss between the predicted classification result and the ground truth. However, it is not limited thereto and may be implemented to train the network according to any of various known methods.
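As a reminder of what the cross-entropy loss measures, here is a minimal sketch for a single-label classification problem. The function names, the `eps` clamp, and the example probabilities are illustrative assumptions rather than details taken from the disclosure.

```python
import math


def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(max(probs[target_index], eps))


def mean_cross_entropy(batch_probs, batch_targets):
    """Average the per-sample loss over a batch."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probs, batch_targets)]
    return sum(losses) / len(losses)


# The correct class is index 1 in both cases.
confident = cross_entropy([0.05, 0.90, 0.05], 1)
unsure = cross_entropy([0.40, 0.30, 0.30], 1)
print(confident < unsure)
```

The loss shrinks as the probability assigned to the correct class approaches 1, which is why minimizing it pushes the network toward the ground-truth label.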

Meanwhile, 'learning' as used in this specification may have the same meaning as 'training', and performing learning may cover both the network itself learning and the network being trained.

FIG. 3 is a flowchart for explaining an image classification method according to an exemplary embodiment of the present disclosure. FIG. 4 is a diagram for explaining a mesh-based three-dimensional characteristic estimation operation according to an exemplary embodiment of the present disclosure. FIG. 5 is a diagram showing examples of an operation of classifying an image based on estimated three-dimensional characteristics.

Referring to FIG. 3, the image classification method according to an exemplary embodiment of the present disclosure may include, when an image is input (S300), estimating three-dimensional characteristics of the input image (S310).

In this regard, referring to FIG. 4, the image characteristic estimation module 110 of the image classification system 100 may estimate three-dimensional characteristics of the input image and output the estimation result. As described above, the image characteristic estimation module 110 may include a deep learning-based network such as a CNN and may estimate the three-dimensional characteristics through that network.

Meanwhile, according to an embodiment of the present disclosure, the image characteristic estimation module 110 may be implemented to estimate the information on the three-dimensional characteristics in the form of a mesh. The mesh may be composed of a plurality of vertices (402, 412), a plurality of edges (404, 414) each connecting two of the vertices, and faces (406, 416) representing the regions formed by the vertices and edges. The vertices may correspond to three-dimensional points (coordinates) representing the three-dimensional characteristics. In this specification the information on the three-dimensional characteristics is illustrated as a quadrilateral mesh, but the mesh may also be expressed using other polygons, such as triangles.
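One way to hold such a mesh in memory is sketched below. The `Mesh` class, the `make_grid_mesh` helper, and the choice of a flat regular grid as the starting shape are hypothetical; the disclosure describes vertices, edges, and faces but does not prescribe a data layout.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    vertices: list  # [(x, y, z), ...] three-dimensional points
    edges: list     # [(i, j), ...] pairs of vertex indices
    faces: list     # [(i, j, k, l), ...] quadrilaterals as vertex indices


def make_grid_mesh(rows, cols):
    """Build a flat quadrilateral mesh with rows x cols faces in the z = 0 plane."""
    def vid(r, c):
        # Index of the vertex at grid row r, column c.
        return r * (cols + 1) + c

    vertices = [(float(c), float(r), 0.0)
                for r in range(rows + 1) for c in range(cols + 1)]
    edges = []
    for r in range(rows + 1):
        for c in range(cols + 1):
            if c < cols:
                edges.append((vid(r, c), vid(r, c + 1)))  # horizontal edge
            if r < rows:
                edges.append((vid(r, c), vid(r + 1, c)))  # vertical edge
    faces = [(vid(r, c), vid(r, c + 1), vid(r + 1, c + 1), vid(r + 1, c))
             for r in range(rows) for c in range(cols)]
    return Mesh(vertices, edges, faces)


mesh = make_grid_mesh(2, 3)
print(len(mesh.vertices), len(mesh.edges), len(mesh.faces))
```

Storing edges and faces as index tuples means that deforming the mesh only requires changing vertex coordinates; the connectivity stays fixed.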

The image characteristic estimation module 110 may be implemented to estimate the three-dimensional characteristics of the input image by deforming an initial mesh 400 so that it corresponds to those characteristics. Specifically, the image characteristic estimation module 110 may estimate a deformed mesh 410 from the initial mesh 400 by changing the positions of at least some of the vertices 402 of the initial mesh 400 according to three-dimensional features of the input image, such as its contours or depth. In addition, the network of the image characteristic estimation module 110 according to an embodiment of the present disclosure may be trained to estimate the deformed mesh 410 by further reflecting the color variation (color variance) of the input image. Depending on the embodiment, the network may be trained to prevent the faces 416 of the deformed mesh 410 from overlapping. To this end, the network may be trained so that the variation in length among the edges 414 of the deformed mesh 410 is minimized, or so that the difference in area among the faces 416 is minimized.
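The edge-length criterion can be sketched as a regularization term. Using the variance of the edge lengths as the measure of "length deviation" is an assumption made for illustration, as are the function names; the disclosure only says that the deviation is minimized during training.

```python
import math


def edge_lengths(vertices, edges):
    """Euclidean length of every edge, given vertex coordinates."""
    return [math.dist(vertices[i], vertices[j]) for i, j in edges]


def edge_length_variance(vertices, edges):
    """Variance of edge lengths; zero when all edges are equally long."""
    lengths = edge_lengths(vertices, edges)
    mean = sum(lengths) / len(lengths)
    return sum((length - mean) ** 2 for length in lengths) / len(lengths)


# A single quadrilateral: four vertices joined in a loop.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
regular = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
# Same face with one vertex pulled far away, stretching two edges.
stretched = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 3.0, 1.0), (0.0, 1.0, 0.0)]

print(edge_length_variance(regular, edges))
print(edge_length_variance(stretched, edges))
```

Adding such a term to the training objective penalizes deformations that stretch a few edges far more than others, which is the kind of distortion that tends to make faces fold over one another.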

The image characteristic estimation module 110 may provide information on the three-dimensional characteristics of the input image by outputting information on the vertices 412 of the deformed mesh 410. For example, the output information may be provided in the form of position changes (such as coordinate offsets) 412a from each of the vertices 402 of the initial mesh 400, or in the form of position values (such as coordinate values) 412b of each of the vertices 412 of the deformed mesh 410.
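The two output forms carry the same information once the initial mesh is known, as the following sketch shows. The helper names and the sample coordinates are hypothetical.

```python
def apply_offsets(initial_vertices, offsets):
    """Offsets (form 412a) plus initial vertices give absolute positions (form 412b)."""
    return [tuple(p + d for p, d in zip(v, dv))
            for v, dv in zip(initial_vertices, offsets)]


def to_offsets(initial_vertices, deformed_vertices):
    """Absolute positions (form 412b) minus initial vertices give offsets (form 412a)."""
    return [tuple(q - p for p, q in zip(v, w))
            for v, w in zip(initial_vertices, deformed_vertices)]


initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(0.0, 0.0, 0.5), (0.25, 0.0, -0.5), (0.0, 0.0, 0.0)]

deformed = apply_offsets(initial, offsets)
print(deformed)
print(to_offsets(initial, deformed) == offsets)
```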

FIG. 3 is described again below.

The image classification method may include classifying the image based on the estimated three-dimensional characteristics (S320) and providing the classification result (S330).

The image classification module 120 of the image classification system 100 may identify and classify the input image (the object included in the image) based on the three-dimensional characteristic information estimated by the image characteristic estimation module 110.

In this regard, referring to (a) of FIG. 5, and as described above with reference to FIG. 4, the image characteristic estimation module 110 may provide the three-dimensional characteristic information in the form of position changes (such as coordinate offsets) 412a from each of the vertices 402 of the initial mesh 400, or in the form of position values (such as coordinate values) 412b of each of the vertices 412 of the deformed mesh 410.

The image classification module 120 may obtain a classification result for the image by inputting the three-dimensional characteristic information into its network. For example, if the output value (such as a probability value) corresponding to 'small intestine' is the highest among the output values of the network, the image classification module 120 may provide 'small intestine' as the classification result.

Referring to the embodiment of (b) of FIG. 5, the image classification module 120 may also obtain a classification result for the input image based on both the input image and a depth image rendered from the three-dimensional characteristic information. In this case, the network included in the image classification module 120 may be a CNN.
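The disclosure does not state how the input image and the depth image are combined before they reach the network. One common option, shown here purely as an assumption, is to append the depth value to each pixel as a fourth channel; the function name `stack_rgbd` is hypothetical.

```python
def stack_rgbd(rgb, depth):
    """Attach a depth value to every RGB pixel, giving (r, g, b, d) tuples."""
    if len(rgb) != len(depth) or any(len(a) != len(b) for a, b in zip(rgb, depth)):
        raise ValueError("RGB image and depth image must have the same size")
    return [[(*pixel, d) for pixel, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth)]


# Toy 1x2 image and matching depth map.
rgb = [[(255, 0, 0), (0, 255, 0)]]
depth = [[0.5, 1.5]]

rgbd = stack_rgbd(rgb, depth)
print(rgbd)
```

Other designs, such as processing the two images in separate branches and merging their features later, would be equally consistent with the text.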

Through the operations described above, the image classification system 100 can provide a classification result for the input image. To improve the accuracy of the classification result, however, the networks included in the image characteristic estimation module 110 and the image classification module 120 must each be trained.

FIG. 6 is a flowchart illustrating an operation in which the learning module of the image classification system trains the network of the image characteristic estimation module, according to an exemplary embodiment of the present disclosure. FIGS. 7 and 8 are exemplary diagrams related to the training operation of the learning module shown in FIG. 6.

Referring to FIG. 6, the training operation for the network of the image characteristic estimation module 110 may include performing rendering using the three-dimensional characteristics estimated for an image (S600), and training the network included in the image characteristic estimation module 110 based on the difference in depth information between the rendering result and ground-truth data (S602).

Regarding steps S600 to S602 and referring to FIG. 7, when the image characteristic estimation module 110 outputs three-dimensional characteristic information estimated from an image (e.g., a training image), the image classification system 100 (or the learning module 130) may perform rendering based on the output three-dimensional characteristic information. The image classification system 100 may obtain depth data 702 from the rendering result 700. For example, the depth data 702 may correspond to the depth data obtained when the rendering result 700 is viewed from the direction of the camera that captured the image.

In addition, the image classification system 100 may obtain depth data 712 from ground-truth data 710 representing the three-dimensional characteristics of the image. Like the depth data 702, the depth data 712 may correspond to the depth data obtained when the ground-truth data 710 is viewed from the direction of the camera that captured the image.

The learning module 130 may train the network of the image characteristic estimation module 110 based on the difference between the depth data 702 obtained from the rendering result 700 and the depth data 712 obtained from the ground-truth data 710. That is, the objective function for training the network of the image characteristic estimation module 110 may be implemented so that the network is trained in the direction that minimizes the difference between the depth data 702 and 712. Through this training, the accuracy with which the image characteristic estimation module 110 estimates three-dimensional characteristics from an image, in particular characteristics related to depth, can be improved.
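The disclosure states only that the objective minimizes the difference between the two sets of depth data; it does not name a specific distance. As a minimal sketch, assuming both depth maps are equally sized 2D grids and assuming a mean squared difference, the objective could look like this (the function name `depth_loss` is hypothetical):

```python
from typing import List


def depth_loss(rendered: List[List[float]], target: List[List[float]]) -> float:
    """Mean squared difference between rendered depth (702) and ground-truth depth (712)."""
    if len(rendered) != len(target) or any(
        len(a) != len(b) for a, b in zip(rendered, target)
    ):
        raise ValueError("depth maps must have the same size")
    total = 0.0
    count = 0
    for rendered_row, target_row in zip(rendered, target):
        for d_rendered, d_target in zip(rendered_row, target_row):
            total += (d_rendered - d_target) ** 2
            count += 1
    if count == 0:
        raise ValueError("depth maps must not be empty")
    return total / count
```

In an actual training loop this quantity would need to be computed with a differentiable renderer and an automatic differentiation framework so that gradients reach the network; the pure-Python version only illustrates the quantity being minimized.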

In addition, the training operation for the network of the image characteristic estimation module 110 may include reconstructing an image based on the estimated three-dimensional characteristics (S610), and training the network based on the pixel value difference between the reconstructed image and a ground-truth image (S612).

Regarding steps S610 to S612 and referring to FIG. 8, when the image characteristic estimation module 110 outputs three-dimensional characteristic information estimated from an input image (e.g., a training image), the image classification system 100 (or the learning module 130) may reconstruct the image based on the output three-dimensional characteristic information.

Specifically, the image classification system 100 may obtain the deformed mesh 410 described above with reference to FIG. 4 based on the estimation result of the image characteristic estimation module 110. A plurality of pixels may correspond to each face 416 of the deformed mesh 410. The image classification system 100 may generate a reconstructed image 800 by making the pixels that correspond to the same face 416 of the deformed mesh 410 have the same pixel value (color). For example, the pixel value of the pixels corresponding to a given face 416 in the reconstructed image 800 may be the average pixel value of the input-image pixels belonging to that face 416, but is not limited thereto.
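The per-face averaging described above can be sketched as follows. The sketch assumes that the projection of the deformed mesh onto the image plane has already produced a map `face_ids` giving, for every pixel, the index of the face it belongs to; that map and the function name `reconstruct_by_face` are hypothetical and not part of the disclosure.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]


def reconstruct_by_face(
    image: List[List[Pixel]], face_ids: List[List[int]]
) -> List[List[Pixel]]:
    """Give every pixel the mean color of the input pixels lying on the same face."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    # First pass: accumulate per-face color sums and pixel counts.
    for pixel_row, id_row in zip(image, face_ids):
        for pixel, face in zip(pixel_row, id_row):
            acc = sums.setdefault(face, [0.0, 0.0, 0.0])
            for channel in range(3):
                acc[channel] += pixel[channel]
            counts[face] = counts.get(face, 0) + 1
    means: Dict[int, Pixel] = {
        face: (acc[0] / counts[face], acc[1] / counts[face], acc[2] / counts[face])
        for face, acc in sums.items()
    }
    # Second pass: paint each pixel with the mean color of its face.
    return [[means[face] for face in id_row] for id_row in face_ids]
```

Because each face is painted with a single color, a mesh whose faces straddle strong color edges reconstructs the image poorly, which is what ties the reconstruction quality to the estimated geometry.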

The image classification system 100 may train the network of the image characteristic estimation module 110 based on the color difference (pixel value difference) between the reconstructed image 800 and the ground-truth image 810. That is, the objective function for training the network of the image characteristic estimation module 110 may be implemented so that the network is trained in the direction that minimizes the color difference between the reconstructed image 800 and the ground-truth image 810. As a result, the image characteristic estimation module 110 can be trained to take the color variation (color variance) of pixels in the image into account more accurately when estimating three-dimensional characteristics.
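As with the depth objective, the disclosure does not fix a particular measure of color difference. A minimal sketch, assuming equally sized RGB images and assuming a mean absolute per-channel difference, is shown below; the function name `color_loss` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def color_loss(reconstructed: List[List[Pixel]], reference: List[List[Pixel]]) -> float:
    """Mean absolute per-channel difference between corresponding pixels."""
    if len(reconstructed) != len(reference) or any(
        len(a) != len(b) for a, b in zip(reconstructed, reference)
    ):
        raise ValueError("images must have the same size")
    total = 0.0
    count = 0
    for row_a, row_b in zip(reconstructed, reference):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for value_a, value_b in zip(pixel_a, pixel_b):
                total += abs(value_a - value_b)
                count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count
```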

Accordingly, the image characteristic estimation module 110 according to an embodiment of the present disclosure considers not only depth information but also color variation within the image when estimating the three-dimensional characteristics of the image, thereby providing a more accurate estimation result and improving the accuracy of image classification.

Although not shown, depending on the embodiment, the image characteristic estimation module 110 may also be trained in the direction that minimizes the deviation in length among the edges of the deformed mesh 410, or in the direction that minimizes the difference in area among its faces.
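These two regularity terms are described only qualitatively. One way to quantify them, offered as an assumption, is the variance of edge lengths and the variance of face areas. The sketch below also assumes triangular faces given as vertex-index triples; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _variance(values: Sequence[float]) -> float:
    """Population variance of a non-empty sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def edge_length_variance(vertices: List[Vec3], edges: List[Tuple[int, int]]) -> float:
    """Variance of edge lengths; zero when all edges are equally long."""
    return _variance([math.dist(vertices[i], vertices[j]) for i, j in edges])


def triangle_area(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Area of a triangle as half the norm of the cross product of two edge vectors."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def face_area_variance(vertices: List[Vec3], faces: List[Tuple[int, int, int]]) -> float:
    """Variance of triangle areas; zero when all faces have the same area."""
    return _variance(
        [triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    )
```

Either term could be added to the depth and color objectives with a weighting factor; the disclosure does not say whether or how they are weighted.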

FIG. 9 is a schematic block diagram of a device that performs an image classification method according to an exemplary embodiment of the present disclosure.

Referring to FIG. 9, a device 900 according to an embodiment of the present disclosure may correspond to any one of the at least one computing device constituting the image classification system 100 described above with reference to FIG. 1. In this case, the device 900 may correspond to a device that performs the operation of estimating three-dimensional characteristics of an image, the operation of classifying an image based on the estimated three-dimensional characteristics, and/or the training operation described above in this specification.

The device 900 may include a processor 910 and a memory 920. However, the components of the device 900 are not limited to this example. For example, the device 900 may include more or fewer components than those described above. In addition, there may be one or more processors 910 and one or more memories 920. Two or more of the processor 910 and the memory 920 may also be combined into a single chip.

According to one embodiment, the processor 910 may correspond to at least one of the image characteristic estimation module 110, the image classification module 120, and the learning module 130 described above, or may execute or control at least one of these modules.

The processor 910 may include hardware such as a CPU, an application processor (AP), an integrated circuit, a microcomputer, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), and/or a neural processing unit (NPU).

According to an embodiment of the present disclosure, the memory 920 may store programs and data necessary for the operation of the device 900.

In addition, the memory 920 may store at least one item of the data generated or obtained through the processor 910. Depending on the embodiment, the memory 920 may store data, instructions, algorithms, and the like related to the image characteristic estimation module 110, the image classification module 120, and/or the learning module 130.

The memory 920 may consist of a storage medium such as ROM, RAM, flash memory, an SSD, or an HDD, or a combination of such storage media.

The above description of the embodiments merely gives examples with reference to the drawings for a more thorough understanding of the present disclosure, and should not be construed as limiting the technical idea of the present disclosure.

In addition, it will be apparent to those of ordinary skill in the art to which the present disclosure pertains that various changes and modifications can be made without departing from the basic principles of the present disclosure.

Claims

An image classification system comprising:
an image characteristic estimation module that estimates three-dimensional characteristics of an input image; and
an image classification module that classifies the image based on the three-dimensional characteristics estimated by the image characteristic estimation module,
wherein the image characteristic estimation module comprises
a deep learning-based first network trained to estimate the three-dimensional characteristics by reflecting the outline and depth of the input image and color changes between pixels in the input image,
wherein the image classification system further comprises a learning module for training the first network,
wherein the learning module
updates the first network based on color differences between corresponding pixels of the input image and an image reconstructed based on the estimated three-dimensional characteristics, and
wherein the image reconstructed based on the three-dimensional characteristics is an image reconstructed such that pixels corresponding to the same face of a deformed mesh have the same pixel value.

The image classification system of claim 1,
wherein the image characteristic estimation module
estimates positions of a plurality of three-dimensional points corresponding to the three-dimensional characteristics of the image, and outputs values corresponding to the estimated positions as an estimation result of the three-dimensional characteristics.

The image classification system of claim 2,
wherein the image characteristic estimation module estimates a mesh representing the three-dimensional characteristics of the image, and
the plurality of three-dimensional points correspond to a plurality of vertices included in the mesh.

The image classification system of claim 1,
wherein the learning module
updates the first network based on a difference between a rendering result using the estimated three-dimensional characteristics and ground-truth data corresponding to the three-dimensional characteristics of the input image.

(Deleted)

The image classification system of claim 3,
wherein the image classification module
further comprises a deep learning-based second network trained to classify the image based on values corresponding to the positions of the plurality of three-dimensional points.

The image classification system of claim 3,
wherein the image classification module
further comprises a deep learning-based second network trained to classify the input image based on the input image and a depth image rendered based on the plurality of three-dimensional points.

The image classification system of claim 1,
wherein the image classification system is implemented with at least one computing device, and
each of the at least one computing device includes a processor and a memory.

An image classification method comprising:
estimating three-dimensional characteristics of an input image; and
classifying the input image based on the estimated three-dimensional characteristics,
wherein estimating the three-dimensional characteristics comprises
estimating the three-dimensional characteristics of the input image using a deep learning-based first network trained to estimate the three-dimensional characteristics by reflecting the outline and depth of the input image and color changes between pixels in the input image,
wherein updating the first network comprises
updating the first network based on color differences between corresponding pixels of the input image and an image reconstructed based on the estimated three-dimensional characteristics, and
wherein the image reconstructed based on the three-dimensional characteristics is an image reconstructed such that pixels corresponding to the same face of a deformed mesh have the same pixel value.

The image classification method of claim 9,
wherein estimating the three-dimensional characteristics comprises:
estimating positions of a plurality of three-dimensional points corresponding to the three-dimensional characteristics of the input image; and
outputting values corresponding to the estimated positions as an estimation result of the three-dimensional characteristics.

The image classification method of claim 10,
wherein the plurality of three-dimensional points correspond to a plurality of vertices included in a mesh estimated to represent the three-dimensional characteristics of the input image.

The image classification method of claim 9,
further comprising updating the first network based on a difference between a rendering result using the estimated three-dimensional characteristics and ground-truth data corresponding to the three-dimensional characteristics of the input image.

(Deleted)

The image classification method of claim 11,
wherein classifying the input image comprises
classifying the input image using a deep learning-based second network trained to classify the image based on values corresponding to the positions of the plurality of three-dimensional points.

The image classification method of claim 11,
wherein classifying the input image comprises:
generating a depth image based on the plurality of three-dimensional points; and
classifying the input image using a deep learning-based second network trained to classify the input image based on the generated depth image and the input image.