KR20240106312A

KR20240106312A - Apparatus and method for haptic texture prediction

Info

Publication number: KR20240106312A
Application number: KR1020220189057A
Authority: KR
Inventors: 전석희; 비비 줄리 줄레카; 와심 하산
Original assignee: 경희대학교 산학협력단
Filing date: 2022-12-29
Publication date: 2024-07-08

Abstract

A haptic texture prediction device and method for predicting haptic texture features from image features are disclosed. A haptic texture prediction device according to an embodiment may include an image feature extraction unit that generates an image feature vector from an input image; and a tactile feature prediction unit that generates, based on the image feature vector, a tactile feature vector for the surface of an object included in the input image.

Description

Haptic texture prediction device and method {Apparatus and method for haptic texture prediction}

The present disclosure relates to a haptic texture prediction device and method for predicting haptic texture features from image features.

Vision is the first medium humans use to obtain information about texture. In most cases, the appearance of a texture provides enough information to identify its physical properties. For more in-depth information about a texture, humans rely on the sense of touch. In everyday interactions, humans use these two senses together to identify the tactile properties of all tactile stimuli around them.

Visual features can be readily described with the RGB model, a low-level descriptor for identifying the visual characteristics of an object. The tactile characteristics of an object, by contrast, have no system comparable to the RGB model.

Korean Registered Patent Publication No. 10-2398389 (2022.05.16)

An object is to provide a haptic texture prediction device and method for predicting haptic texture features from image features.

According to one aspect, a haptic texture prediction device may include an image feature extraction unit that generates an image feature vector from an input image; and a tactile feature prediction unit that generates, based on the image feature vector, a tactile feature vector for the surface of an object included in the input image.

The image feature extraction unit may include two or more different image feature extraction modules and may generate the image feature vector by concatenating the outputs of the two or more different image feature extraction modules.

The image feature extraction modules may be at least two of ResNet50, Local Binary Pattern (LBP), and Gray-Level Co-occurrence Matrix (GLCM).

The tactile feature prediction unit may include an artificial neural network trained to predict tactile features from the image feature vector and generate the tactile feature vector.

The tactile feature vector may be composed of four dimensions: rough-smooth, flat-bumpy, sticky-slippery, and hard-soft.

The artificial neural network may be trained on training data consisting of one or more training images and, for each training image, a tactile feature vector serving as the ground-truth value.

The artificial neural network may be a one-dimensional convolutional neural network (CNN).

The artificial neural network may be composed of two or more sub-convolutional neural networks (sub-CNNs) to which kernels of different sizes are applied.

The two or more sub-CNNs may be arranged in parallel, each receiving the image feature vector as input, and their outputs may be concatenated in a fully connected layer.

According to one aspect, a haptic texture prediction method performed on a computing device having one or more processors and a memory storing one or more programs executed by the one or more processors may include an image feature extraction step of generating an image feature vector from an input image; and a tactile feature prediction step of generating, based on the image feature vector, a tactile feature vector for the surface of an object included in the input image.

The image feature extraction step may use two or more different image feature extraction modules and may generate the image feature vector by concatenating the outputs of the two or more different image feature extraction modules.

The tactile feature prediction step may use an artificial neural network trained to predict tactile features from the image feature vector and generate the tactile feature vector.

According to one aspect, a computer program is stored on a non-transitory computer readable storage medium and includes one or more instructions that, when executed by a computing device having one or more processors, cause the computing device to perform an image feature extraction step of generating an image feature vector from an input image; and a tactile feature prediction step of generating, based on the image feature vector, a tactile feature vector for the surface of an object included in the input image.

When the haptic texture prediction device according to an embodiment is used, a textured surface can be defined by tactile attributes rated by human subjects. In addition, the texture of a specific object can be perceptually identified based on the four-dimensional haptic attribute space.

FIG. 1 is a configuration diagram of a haptic texture prediction device according to an embodiment.
FIG. 2 is an exemplary diagram for explaining the configuration of a haptic texture prediction device according to an embodiment.
FIG. 3 is an exemplary diagram for explaining a tactile feature vector according to an example.
FIG. 4 is an exemplary diagram for explaining training data according to an example.
FIG. 5 is an exemplary diagram for explaining the structure of an artificial neural network according to an embodiment.
FIG. 6 is a flowchart illustrating a haptic texture prediction method according to an embodiment.
FIG. 7 is a block diagram illustrating a computing environment including a computing device suitable for use in example embodiments.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The detailed description below is provided to facilitate a comprehensive understanding of the methods, devices, and/or systems described herein. However, this is only an example, and the present invention is not limited thereto.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention is omitted where it is judged that it would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary depending on the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is only for describing embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, singular forms include plural meanings. In this description, expressions such as "comprising" or "including" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the existence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

In addition, terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component, without departing from the scope of the present invention.

FIG. 1 is a configuration diagram of a haptic texture prediction device according to an embodiment.

According to one embodiment, the haptic texture prediction device 100 may include an image feature extraction unit 110 that generates an image feature vector from an input image, and a tactile feature prediction unit 120 that generates, based on the image feature vector, a tactile feature vector for the surface of an object included in the input image.

As an example, the haptic texture prediction device 100 may provide a standardized space in which textured surfaces are located based on perceptually meaningful haptic attributes. For example, the haptic texture prediction device 100 may accurately predict the haptic attributes of a textured surface from an image and then position the haptic texture in the haptic attribute space in terms of quantifiable haptic features.

Here, the haptic attribute space may be a four-dimensional space consisting of rough-smooth, flat-bumpy, sticky-slippery, and hard-soft dimensions. With this four-dimensional haptic attribute space defined, the surface texture of a specific object can be perceptually identified from its values in the haptic attribute space alone.

For example, just as a visual texture can be perceptually identified through its RGB values, a surface texture can be perceptually identified through its values in the four-dimensional haptic attribute space defined in the disclosed embodiments. The haptic attribute space can be expressed as a tactile feature vector, which is described in detail later.

According to one embodiment, the image feature extraction unit 110 may include two or more different image feature extraction modules. For example, the image feature extraction modules may be at least two of ResNet50, Local Binary Pattern (LBP), and Gray-Level Co-occurrence Matrix (GLCM).

Referring to FIG. 2, the image feature extraction unit 110 may be configured to include ResNet50, a local binary pattern (LBP) module, and a gray-level co-occurrence matrix (GLCM) module. Here, ResNet50, LBP, and GLCM may each receive the image as input.

As an example, the ResNet50 model may capture higher-level deep features from the surface image included in the input image. For example, the input image may be resized to 224×224 for the ResNet50 model, which may output an image feature vector of size 1×1000.

As an example, the image feature extraction unit 110 may compute local pixel information of the image using a local binary pattern (LBP). LBP compares pixel values of the image by thresholding a circular neighborhood, and thereby computes local spatial patterns. For LBP, the input image may be divided into several cells of size 224×224, and the LBP operation may be performed on each cell to generate a feature vector of size 1×59. The feature vectors obtained from the cells may then be combined to output an image feature vector of size 1×2891.
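The 1×59 size per cell is consistent with the commonly used "uniform" variant of the 8-neighbor LBP, which has 58 uniform patterns plus one shared bin for all remaining patterns. The source does not say which LBP variant is used, so this is an assumption. The following minimal sketch, in plain Python with hypothetical helper names, computes an LBP code for one pixel and counts the uniform patterns; it uses the square 3×3 neighborhood as a simplification of the circular neighborhood described above.

```python
# Illustrative sketch only: helper names are hypothetical, and the square 3x3
# neighborhood stands in for the circular neighborhood described in the text.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise order


def lbp_code(image, row, col):
    """Threshold the 8 neighbors against the center pixel and pack the bits."""
    center = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOR_OFFSETS):
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def is_uniform(code, bits=8):
    """A pattern is uniform if its circular bit string has at most 2 transitions."""
    transitions = 0
    for i in range(bits):
        current = (code >> i) & 1
        following = (code >> ((i + 1) % bits)) & 1
        transitions += current != following
    return transitions <= 2


# Count uniform patterns among all 256 possible 8-bit codes.
uniform_patterns = [code for code in range(256) if is_uniform(code)]
histogram_bins = len(uniform_patterns) + 1  # one extra bin for non-uniform codes

patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
center_code = lbp_code(patch, 1, 1)
```

Under this assumption, a per-cell histogram over `histogram_bins` bins would have the 1×59 shape given in the text.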

As an example, for the gray-level co-occurrence matrix (GLCM), the input image may be resized to 1568×1568, and the GLCM method may be applied to the resized image to generate an 8×8 matrix. This matrix may then be flattened to generate an image feature vector of size 1×64.
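An 8×8 co-occurrence matrix corresponds to 8 gray levels. The sketch below shows one way such a matrix could be computed and flattened, assuming 8-bit pixel values quantized uniformly to 8 levels and a single horizontal offset of one pixel. The quantization scheme, the offset, and the function names are assumptions for illustration; the source gives only the matrix size.

```python
def glcm(image, levels=8, max_value=255, offset=(0, 1)):
    """Count how often gray level i occurs next to gray level j at the given offset."""
    # Quantize each pixel from [0, max_value] down to [0, levels - 1].
    quantized = [[min(p * levels // (max_value + 1), levels - 1) for p in row]
                 for row in image]
    matrix = [[0] * levels for _ in range(levels)]
    d_row, d_col = offset
    rows, cols = len(quantized), len(quantized[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[quantized[r][c]][quantized[r2][c2]] += 1
    return matrix


def flatten(matrix):
    """Row-major flattening of a 2D matrix into a 1D feature vector."""
    return [value for row in matrix for value in row]


# Tiny stand-in image; the text resizes the real input to 1568x1568 first.
image = [[0, 0, 255],
         [128, 128, 255]]
features = flatten(glcm(image))
```

The flattened `features` list has 64 entries, matching the 1×64 vector size in the text.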

According to one embodiment, the image feature extraction unit 110 may generate the image feature vector by concatenating the outputs of the two or more different image feature extraction modules.

According to one example, as shown in FIG. 2, the image feature extraction unit 110 may receive the outputs of ResNet50, LBP, and GLCM and concatenate their feature vectors. For example, after surface features are captured using ResNet50, LBP, and GLCM, the feature vectors may be concatenated into an image feature vector of size 1×3955, which is then passed as input to the tactile feature prediction unit 120.
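The sizes quoted in the text can be checked with simple arithmetic: 1000 + 2891 + 64 = 3955, and 2891 = 49 × 59, which implies 49 LBP cells. The sketch below uses zero-filled placeholder vectors in place of real features; the constant and function names are illustrative only.

```python
# Dimensions as stated in the text.
RESNET50_DIM = 1000
LBP_BINS_PER_CELL = 59
LBP_TOTAL_DIM = 2891
GLCM_DIM = 8 * 8

# Number of LBP cells implied by the stated sizes.
lbp_cells = LBP_TOTAL_DIM // LBP_BINS_PER_CELL


def concatenate(*vectors):
    """Join several 1D feature vectors end to end."""
    joined = []
    for vector in vectors:
        joined.extend(vector)
    return joined


# Placeholder vectors standing in for the three module outputs.
resnet_features = [0.0] * RESNET50_DIM
lbp_features = [0.0] * LBP_TOTAL_DIM
glcm_features = [0.0] * GLCM_DIM

image_feature_vector = concatenate(resnet_features, lbp_features, glcm_features)
```

The 49 cells would match a 7×7 grid of 224×224 cells over a 1568×1568 image, the size the text gives for the GLCM input. The source does not state that size for LBP, so that reading is an inference.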

According to one embodiment, the tactile feature prediction unit 120 may include an artificial neural network trained to predict tactile features from the image feature vector and generate the tactile feature vector.

According to one example, the tactile feature vector may be composed of four dimensions: rough-smooth, flat-bumpy, sticky-slippery, and hard-soft. Referring to FIG. 3, the tactile feature vector for a specific input image may be displayed in four dimensions using four axes: rough-smooth, flat-bumpy, sticky-slippery, and hard-soft.

According to one embodiment, the artificial neural network may be trained on training data consisting of one or more training images and, for each training image, a tactile feature vector serving as the ground-truth value.

As an example, the training images constituting the training data may be as shown in FIG. 4. For example, the training images may consist of images of 100 textured surfaces. The training data may also include a tactile feature vector for each training image.

According to one example, the tactile feature vectors of the training data may be generated by applying multidimensional scaling (MDS) to human subjects' ratings for the training images of 100 different textured surfaces. For example, the tactile feature vector for a training image may be displayed in a four-dimensional space as shown in FIG. 3. Accordingly, every tactile sensation can be located in the four-dimensional space.

According to one embodiment, the artificial neural network may be a one-dimensional convolutional neural network (CNN). For example, the relationship between the image feature vector and the tactile feature vector may be established using a multi-scale one-dimensional CNN as shown in FIG. 5. The one-dimensional CNN takes the image feature vector extracted by the image feature extraction unit 110 as input and may predict the tactile features for the image.

According to one embodiment, the artificial neural network may be composed of two or more sub-convolutional neural networks (sub-CNNs) to which kernels of different sizes are applied. Referring to FIG. 5, the artificial neural network 121 may be composed of two sub-CNNs 122 and 123.

As an example, the two sub-CNNs capture detail at different scales, so macro-level and micro-level information can be captured separately. For example, each sub-CNN may consist of five one-dimensional convolutional layers, two one-dimensional max-pooling layers, and two fully connected layers. The convolutional layers are responsible for feature extraction, and the max-pooling layers reduce the dimensionality of each feature map. Here, different numbers of kernels are applied at different scales in the convolutional layers, so that local spatial information at different scales can be captured.

According to one example, the two sub-CNNs may use kernels of size 1×3 and 1×5, respectively, in their convolution operations. Max pooling may be performed over 1×2 blocks. For example, in each of the two sub-CNNs, the first convolutional layer may use 32 kernels, the second 64 kernels, the third and fourth 128 kernels each, and the fifth 256 kernels. For example, a one-dimensional convolutional layer may perform a calculation such as the equation below.

[Equation 1]

Here, g_i is the output of the i-th filter, a_n is the input data of size 1×N, w_i is the i-th convolution kernel vector of size 1×N, b_i is the bias of the i-th filter, and f denotes the ReLU nonlinear activation function.
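The body of Equation 1 is not present in this text. One form consistent with the symbol definitions above is g_i = f(w_i · a_n + b_i), evaluated at each position of the sliding kernel. This is a reconstruction under that assumption, not the source's equation. The sketch below implements that form for a single filter in plain Python, together with the 1×2 max pooling mentioned in the text. Function names, the stride of 1, and the absence of padding are illustrative choices.

```python
def relu(x):
    """ReLU activation f: passes positive values, zeroes out the rest."""
    return x if x > 0.0 else 0.0


def conv1d_relu(inputs, kernel, bias):
    """One filter of a 1D convolutional layer: stride 1, no padding, then ReLU.

    As in most deep learning frameworks, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(inputs) - width + 1):
        window = inputs[start:start + width]
        pre_activation = sum(w * a for w, a in zip(kernel, window)) + bias
        outputs.append(relu(pre_activation))
    return outputs


def max_pool1d(values, size=2):
    """Non-overlapping max pooling over blocks of the given size."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


signal = [1.0, 2.0, -1.0, 0.0, 3.0, 1.0]
feature_map = conv1d_relu(signal, kernel=[1.0, 0.0, -1.0], bias=0.0)
pooled = max_pool1d(feature_map)
```

A 1×3 kernel and a 1×5 kernel applied to the same input respond to patterns of different widths, which is the sense in which the two branches capture information at different scales.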

According to one embodiment, the two or more sub-CNNs included in the artificial neural network may be arranged in parallel, each receiving the image feature vector as input, and their outputs may be concatenated in a fully connected layer.

For example, each sub-CNN may end with two fully connected (FC) layers with 100 and 50 nodes, respectively. Another fully connected layer with 100 nodes may then be used to concatenate the features obtained from the two or more sub-CNNs.
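The merging step can be sketched as follows: the two 50-node branch outputs are concatenated into a 100-element vector and passed through the 100-node fully connected layer. The final 4-node linear output layer, the ReLU activation in the merging layer, and the random weights are assumptions added so the sketch yields a four-dimensional tactile feature vector; the source does not specify them. The branch outputs here are random stand-ins, not real sub-CNN activations.

```python
import random


def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weight row and one bias per output node."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    if activation is not None:
        outputs = [activation(v) for v in outputs]
    return outputs


def random_layer(n_in, n_out, rng):
    """Untrained placeholder parameters for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)

# Stand-ins for the 50-node outputs of the 1x3-kernel and 1x5-kernel branches.
branch_small_kernel = [rng.random() for _ in range(50)]
branch_large_kernel = [rng.random() for _ in range(50)]

# Concatenate the branch outputs, then apply the 100-node merging layer.
merged = branch_small_kernel + branch_large_kernel
merge_weights, merge_biases = random_layer(100, 100, rng)
hidden = dense(merged, merge_weights, merge_biases, activation=relu)

# Assumed output layer: one node per tactile axis
# (rough-smooth, flat-bumpy, sticky-slippery, hard-soft).
out_weights, out_biases = random_layer(100, 4, rng)
tactile_feature_vector = dense(hidden, out_weights, out_biases)
```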

FIG. 6 is a flowchart illustrating a haptic texture prediction method according to an embodiment.

According to one embodiment, the haptic texture prediction device may generate an image feature vector from an input image (610). As an example, the haptic texture prediction device may use two or more different image feature extraction modules for image feature extraction. The haptic texture prediction device may generate the image feature vector by concatenating the outputs of the two or more different image feature extraction modules. Here, the image feature extraction modules may be at least two of ResNet50, Local Binary Pattern (LBP), and Gray-Level Co-occurrence Matrix (GLCM).

According to one embodiment, the haptic texture prediction device may generate, based on the image feature vector, a tactile feature vector for the surface of an object included in the input image. As an example, for tactile feature prediction, the haptic texture prediction device may use an artificial neural network trained to predict tactile features from the image feature vector and generate the tactile feature vector. Here, the tactile feature vector may be composed of four dimensions: rough-smooth, flat-bumpy, sticky-slippery, and hard-soft.

According to one example, the artificial neural network included in the haptic texture prediction device may be trained on training data consisting of one or more training images and, for each training image, a tactile feature vector serving as the ground-truth value. The artificial neural network may be a one-dimensional convolutional neural network (CNN). Specifically, the artificial neural network may be composed of two or more sub-convolutional neural networks (sub-CNNs) to which kernels of different sizes are applied. Here, the two or more sub-CNNs may be arranged in parallel, each receiving the image feature vector as input, and their outputs may be concatenated in a fully connected layer.

Descriptions of the embodiment of FIG. 6 that overlap with those given with reference to FIGS. 1 to 5 are omitted.

FIG. 7 is a block diagram illustrating a computing environment 10 including a computing device suitable for use in example embodiments. In the illustrated embodiment, each component may have functions and capabilities other than those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be the haptic texture prediction device 100.

컴퓨팅 장치(12)는 적어도 하나의 프로세서(14), 컴퓨터 판독 가능 저장 매체(16) 및 통신 버스(18)를 포함한다. 프로세서(14)는 컴퓨팅 장치(12)로 하여금 앞서 언급된 예시적인 실시예에 따라 동작하도록 할 수 있다. 예컨대, 프로세서(14)는 컴퓨터 판독 가능 저장 매체(16)에 저장된 하나 이상의 프로그램들을 실행할 수 있다. 상기 하나 이상의 프로그램들은 하나 이상의 컴퓨터 실행 가능 명령어를 포함할 수 있으며, 상기 컴퓨터 실행 가능 명령어는 프로세서(14)에 의해 실행되는 경우 컴퓨팅 장치(12)로 하여금 예시적인 실시예에 따른 동작들을 수행하도록 구성될 수 있다.Computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. Processor 14 may cause computing device 12 to operate in accordance with the example embodiments noted above. For example, processor 14 may execute one or more programs stored on computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor 14, cause computing device 12 to perform operations according to example embodiments. It can be.

컴퓨터 판독 가능 저장 매체(16)는 컴퓨터 실행 가능 명령어 내지 프로그램 코드, 프로그램 데이터 및/또는 다른 적합한 형태의 정보를 저장하도록 구성된다. 컴퓨터 판독 가능 저장 매체(16)에 저장된 프로그램(20)은 프로세서(14)에 의해 실행 가능한 명령어의 집합을 포함한다. 일 실시예에서, 컴퓨터 판독 가능 저장 매체(16)는 메모리(랜덤 액세스 메모리와 같은 휘발성 메모리, 비휘발성 메모리, 또는 이들의 적절한 조합), 하나 이상의 자기 디스크 저장 디바이스들, 광학 디스크 저장 디바이스들, 플래시 메모리 디바이스들, 그 밖에 컴퓨팅 장치(12)에 의해 액세스되고 원하는 정보를 저장할 수 있는 다른 형태의 저장 매체, 또는 이들의 적합한 조합일 수 있다.Computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable form of information. The program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, computer-readable storage medium 16 includes memory (volatile memory, such as random access memory, non-volatile memory, or an appropriate combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash It may be memory devices, another form of storage medium that can be accessed by computing device 12 and store desired information, or a suitable combination thereof.

통신 버스(18)는 프로세서(14), 컴퓨터 판독 가능 저장 매체(16)를 포함하여 컴퓨팅 장치(12)의 다른 다양한 컴포넌트들을 상호 연결한다.Communication bus 18 interconnects various other components of computing device 12, including processor 14 and computer-readable storage medium 16.

컴퓨팅 장치(12)는 또한 하나 이상의 입출력 장치(24)를 위한 인터페이스를 제공하는 하나 이상의 입출력 인터페이스(22) 및 하나 이상의 네트워크 통신 인터페이스(26)를 포함할 수 있다. 입출력 인터페이스(22) 및 네트워크 통신 인터페이스(26)는 통신 버스(18)에 연결된다. 입출력 장치(24)는 입출력 인터페이스(22)를 통해 컴퓨팅 장치(12)의 다른 컴포넌트들에 연결될 수 있다. 예시적인 입출력 장치(24)는 포인팅 장치(마우스 또는 트랙패드 등), 키보드, 터치 입력 장치(터치패드 또는 터치스크린 등), 음성 또는 소리 입력 장치, 다양한 종류의 센서 장치 및/또는 촬영 장치와 같은 입력 장치, 및/또는 디스플레이 장치, 프린터, 스피커 및/또는 네트워크 카드와 같은 출력 장치를 포함할 수 있다. 예시적인 입출력 장치(24)는 컴퓨팅 장치(12)를 구성하는 일 컴포넌트로서 컴퓨팅 장치(12)의 내부에 포함될 수도 있고, 컴퓨팅 장치(12)와는 구별되는 별개의 장치로 컴퓨팅 장치(12)와 연결될 수도 있다.Computing device 12 may also include one or more input/output interfaces 22 and one or more network communication interfaces 26 that provide an interface for one or more input/output devices 24. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. Input/output device 24 may be coupled to other components of computing device 12 through input/output interface 22. Exemplary input/output devices 24 include, but are not limited to, a pointing device (such as a mouse or trackpad), a keyboard, a touch input device (such as a touchpad or touch screen), a voice or sound input device, various types of sensor devices, and/or imaging devices. It may include input devices and/or output devices such as display devices, printers, speakers, and/or network cards. The exemplary input/output device 24 may be included within the computing device 12 as a component constituting the computing device 12, or may be connected to the computing device 12 as a separate device distinct from the computing device 12. It may be possible.

Although representative embodiments of the present invention have been described in detail above, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of rights of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and by equivalents of those claims.

100: Haptic texture prediction device
110: Image feature extraction unit
120: Tactile feature prediction unit
121: Artificial neural network
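
As an illustration of how the image feature extraction unit 110 might concatenate the outputs of two different feature extraction modules, the following is a minimal sketch and not the implementation of the embodiments. It assumes an 8-bit grayscale input, a basic 8-neighbor LBP histogram, and a horizontal-offset GLCM quantized to 8 gray levels; ResNet50 is omitted to keep the example self-contained. The function names `lbp_histogram`, `glcm_features`, and `image_feature_vector`, and all parameter values, are hypothetical.

```python
import numpy as np

def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 8-neighbor LBP codes (interior pixels)."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbor offsets, clockwise from top-left; each neighbor sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def glcm_features(gray, levels=8):
    """Contrast, angular second moment, and homogeneity of a symmetric GLCM.

    Assumption: a single horizontal offset (0, 1) and 8 quantized gray levels.
    """
    g = np.asarray(gray, dtype=np.int64)
    q = g * levels // 256  # map 0..255 onto 0..levels-1
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)  # count co-occurring level pairs
    glcm = glcm + glcm.T                 # make the matrix symmetric
    glcm /= glcm.sum()                   # normalize to joint probabilities
    i, j = np.indices((levels, levels))
    contrast = np.sum(glcm * (i - j) ** 2)
    asm = np.sum(glcm ** 2)
    homogeneity = np.sum(glcm / (1.0 + (i - j) ** 2))
    return np.array([contrast, asm, homogeneity])

def image_feature_vector(gray):
    """Concatenate the outputs of the two feature extraction modules."""
    return np.concatenate([lbp_histogram(gray), glcm_features(gray)])

# Usage with a synthetic image (stand-in for a real texture photograph).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
vec = image_feature_vector(image)
print(vec.shape)  # (259,): 256 LBP bins + 3 GLCM statistics
```

Concatenation keeps each module's output intact, so a downstream predictor can weigh local micro-pattern statistics (LBP) and gray-level co-occurrence statistics (GLCM) independently.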

Claims

A haptic texture prediction device comprising:
an image feature extraction unit that generates an image feature vector from an input image; and
a tactile feature prediction unit that generates a tactile feature vector for a surface of an object included in the input image based on the image feature vector.

The haptic texture prediction device of claim 1,
wherein the image feature extraction unit includes two or more different image feature extraction modules, and
generates the image feature vector by concatenating outputs of the two or more different image feature extraction modules.

The haptic texture prediction device of claim 2,
wherein the image feature extraction modules are at least two of ResNet50, Local Binary Pattern (LBP), and Gray-Level Co-occurrence Matrix (GLCM).

The haptic texture prediction device of claim 1,
wherein the tactile feature prediction unit includes an artificial neural network trained to predict tactile features from the image feature vector and generate the tactile feature vector.

The haptic texture prediction device of claim 4,
wherein the tactile feature vector is composed of a four-dimensional space of rough-smooth, flat-bumpy, sticky-slippery, and hard-soft.

The haptic texture prediction device of claim 4,
wherein the artificial neural network is trained on training data consisting of one or more training images and, for each of the one or more training images, a tactile feature vector serving as its ground-truth value.

The haptic texture prediction device of claim 4,
wherein the artificial neural network is a one-dimensional convolutional neural network (CNN).

The haptic texture prediction device of claim 4,
wherein the artificial neural network consists of two or more sub-convolutional neural networks (sub-CNNs) to which kernels of different sizes are applied.

The haptic texture prediction device of claim 8,
wherein the two or more sub-convolutional neural networks are arranged in parallel and each receives the image feature vector as input, and
the outputs of the two or more sub-convolutional neural networks are concatenated in a fully connected layer.

A haptic texture prediction method performed on a computing device having one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising:
an image feature extraction step of generating an image feature vector from an input image; and
a tactile feature prediction step of generating a tactile feature vector for a surface of an object included in the input image based on the image feature vector.

The haptic texture prediction method of claim 10,
wherein the image feature extraction step uses two or more different image feature extraction modules, and
generates the image feature vector by concatenating outputs of the two or more different image feature extraction modules.

The haptic texture prediction method of claim 11,
wherein the image feature extraction modules are at least two of ResNet50, Local Binary Pattern (LBP), and Gray-Level Co-occurrence Matrix (GLCM).

The haptic texture prediction method of claim 10,
wherein the tactile feature prediction step uses an artificial neural network trained to predict tactile features from the image feature vector and generate the tactile feature vector.

The haptic texture prediction method of claim 13,
wherein the tactile feature vector is composed of a four-dimensional space of rough-smooth, flat-bumpy, sticky-slippery, and hard-soft.

The haptic texture prediction method of claim 13,
wherein the artificial neural network is trained on training data consisting of one or more training images and, for each of the one or more training images, a tactile feature vector serving as its ground-truth value.

The haptic texture prediction method of claim 13,
wherein the artificial neural network is a one-dimensional convolutional neural network (CNN).

The haptic texture prediction method of claim 13,
wherein the artificial neural network consists of two or more sub-convolutional neural networks (sub-CNNs) to which kernels of different sizes are applied.

The haptic texture prediction method of claim 17,
wherein the two or more sub-convolutional neural networks are arranged in parallel and each receives the image feature vector as input, and
the outputs of the two or more sub-convolutional neural networks are concatenated in a fully connected layer.

A computer program stored on a non-transitory computer-readable storage medium,
the computer program including one or more instructions that, when executed by a computing device having one or more processors, cause the computing device to perform:
an image feature extraction step of generating an image feature vector from an input image; and
a tactile feature prediction step of generating a tactile feature vector for a surface of an object included in the input image based on the image feature vector.
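
The parallel sub-CNN arrangement recited in the claims above (sub-networks with kernels of different sizes, each receiving the same image feature vector, with outputs concatenated and passed to a fully connected layer) can be sketched as a forward pass in NumPy. This is an illustrative sketch only: the kernel sizes (3, 5, 7), the channel count, the ReLU activation, the global max pooling step, and the random untrained weights are all assumptions not stated in the claims, and the names `conv1d_relu` and `predict_tactile` are hypothetical.

```python
import numpy as np

def conv1d_relu(x, kernels):
    """Valid 1-D cross-correlation of x with each kernel row, followed by ReLU.

    x: shape (n,); kernels: shape (channels, k). Returns (n - k + 1, channels).
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])
    return np.maximum(windows @ kernels.T, 0.0)

def predict_tactile(x, branch_kernels, fc_weight, fc_bias):
    """Run parallel sub-CNN branches on x, concatenate them, apply one dense layer."""
    # Each branch sees the same input; global max pooling (an assumption) turns
    # every branch output into a fixed-length vector regardless of kernel size.
    pooled = [conv1d_relu(x, k).max(axis=0) for k in branch_kernels]
    merged = np.concatenate(pooled)
    return fc_weight @ merged + fc_bias

# Hypothetical configuration with random, untrained weights.
rng = np.random.default_rng(0)
feature_vector = rng.normal(size=259)  # stand-in for an image feature vector
kernel_sizes = (3, 5, 7)
channels = 4
branch_kernels = [rng.normal(size=(channels, k)) for k in kernel_sizes]
fc_weight = rng.normal(size=(4, channels * len(kernel_sizes)))
fc_bias = np.zeros(4)

tactile = predict_tactile(feature_vector, branch_kernels, fc_weight, fc_bias)
AXES = ("rough-smooth", "flat-bumpy", "sticky-slippery", "hard-soft")
print(dict(zip(AXES, tactile.round(3))))
```

Using several kernel sizes in parallel lets each branch respond to patterns spanning a different number of adjacent feature-vector entries; a trained network would learn the weights from pairs of training images and ground-truth tactile feature vectors.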