KR102291111B1

KR102291111B1 - Zero Shot Recognition Apparatus Based on Self-Supervision and Method Thereof

Info

Publication number: KR102291111B1
Application number: KR1020200011824A
Authority: KR
Inventors: 변혜란; 김호성; 이제욱; 홍기범
Original assignee: Industry-Academic Cooperation Foundation, Yonsei University (연세대학교 산학협력단)
Priority date: 2020-01-31
Filing date: 2020-01-31
Publication date: 2021-08-17
Also published as: KR20210098082A

Abstract

Provided are a zero-shot recognition method and apparatus that transform attribute information for a single category and, based on the transformed attribute information, generate general-purpose features through a generative adversarial network, thereby improving zero-shot recognition performance on multi-category data sets.

Description

Zero Shot Recognition Apparatus Based on Self-Supervision and Method Thereof

The technical field of this embodiment relates to a zero-shot recognition apparatus and method.

The content described in this section merely provides background information for the present embodiment and does not constitute prior art.

Zero-shot learning is a technique for recognizing unseen classes that are not included in the training data. Its principle is as follows: if a model is trained to correctly infer the attribute information associated with each class from data of the classes used for training (seen classes), it can also infer the attribute information of a class from data of a class it has never seen, and thereby recognize that unseen class.

The attributes defined in a data set (given attributes) are the key characteristics that represent a class, such as a bird's beak length, wing color, and body size. Because existing zero-shot learning techniques use attribute information as their key clue, accuracy suffers when the attributes are not predefined in the data set for training.

User-defined attribute information is one of the essential elements for distinguishing between categories in zero-shot learning. Defining new attribute information for many categories requires human labor and expert knowledge and takes considerable time and cost, so it is practically impossible for users to specify all of the attribute information.

Existing zero-shot learning models cannot understand concepts that the user has not defined. For example, if the attribute "stripes", which is needed to distinguish horses from zebras, has not been defined by the user, an existing zero-shot learning model cannot tell horses and zebras apart.

A single-category data set consists only of classes belonging to one category (a set of classes), for example a data set containing only flower species or a data set containing only bird species.

A multi-category data set consists of two or more categories, which makes it difficult to define a unified set of attributes. For example, it may consist of different categories such as dogs, flowers, cars, and furniture.

A model trained on one category cannot be used for another category. Because the attribute schemes differ, a zero-shot model trained on bird species cannot be used on a data set of flower species, and no existing zero-shot learning model can recognize classes across different categories.

US 2018/0197050 (2018.07.12)

The main object of the embodiments of the present invention is to improve zero-shot recognition performance on multi-category data sets by transforming attribute information for a single category and generating general-purpose features through a generative adversarial network based on the transformed attribute information.

Other objects of the present invention that are not explicitly stated may additionally be considered within the scope that can easily be inferred from the following detailed description and its effects.

According to one aspect of the present embodiment, there is provided a zero-shot recognition method performed by a computing device, the method comprising: converting original attribute information for a single category into new attribute information through a self-supervised attribute transformation model; generating, using the new attribute information, general-purpose features applicable to multiple categories based on a loss function defined in a feature generation model; and transmitting the general-purpose features to a multi-category zero-shot learning model.

According to another aspect of the present embodiment, there is provided a zero-shot recognition apparatus including one or more processors and a memory storing one or more programs executed by the one or more processors, wherein the processor converts original attribute information for a single category into new attribute information through a self-supervised attribute transformation model, generates, using the new attribute information, general-purpose features applicable to multiple categories based on a loss function defined in a feature generation model, and transmits the general-purpose features to a multi-category zero-shot learning model.

As described above, the embodiments of the present invention transform attribute information for a single category and generate general-purpose features through a generative adversarial network based on the transformed attribute information, thereby improving zero-shot recognition performance on multi-category data sets.

Even where an effect is not explicitly mentioned here, the effects described in the following specification that are expected from the technical features of the present invention, and their potential effects, are treated as if they were described in the specification of the present invention.

FIG. 1 is a diagram illustrating an existing zero-shot learning model.
FIG. 2 is a diagram illustrating a single category and multiple categories.
FIG. 3 is a block diagram illustrating a zero-shot recognition apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an attribute transformation model of the zero-shot recognition apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a feature generation model of the zero-shot recognition apparatus according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a zero-shot recognition method according to another embodiment of the present invention.
FIGS. 7 and 8 are diagrams illustrating simulation results according to embodiments of the present invention.

In the following description of the present invention, detailed descriptions of related known functions are omitted where they are obvious to those skilled in the art and might unnecessarily obscure the subject matter of the present invention. Some embodiments of the present invention are described in detail with reference to the exemplary drawings.

FIG. 1 is a diagram illustrating an existing zero-shot learning model.

Zero-shot learning was developed to address the inability of conventional deep learning to recognize new labels that were not part of its training data, and it can recognize new data by using data of another type. However, existing zero-shot learning still cannot recognize a new class if that class's attribute information is not defined.

Existing zero-shot learning can only be trained on data sets in which attribute information is defined, and a model trained on one category cannot be used for another. For example, a zero-shot model trained on bird species cannot be used on a data set of flower species, because the defined attribute schemes differ and no zero-shot learning technique exists that can recognize different categories.

Referring to FIG. 1, which illustrates an existing zero-shot learning model, existing zero-shot learning trains a model to correctly infer the attribute information associated with each class from data of the classes used for training (seen classes). For data of an unseen class, the model then infers that class's attribute information and thereby recognizes the unseen class.

A seen class is a training class used to learn attribute information, and an unseen class is a zero-shot test class that is never used in training. A single-category data set consists only of classes belonging to one category (a set of classes), for example a data set containing only flower species or a data set containing only bird species.

A given attribute defined in a data set is a key characteristic that represents a class, for example a bird's beak length, wing color, and body size, or a horse's leg length, head shape, and tail shape. The zero-shot learning model uses attribute information as its key clue, so if the attributes are not predefined in the data set for training, zero-shot accuracy drops.

A not-given attribute is attribute information that is not defined in the existing data set.

FIG. 2 is a diagram illustrating a single category and multiple categories.

Existing zero-shot learning techniques have been studied on single-category data sets, and their classification accuracy on multi-category data sets is low.

This embodiment overcomes the limitations of existing zero-shot learning by transforming attribute information in a self-supervised manner and generating general-purpose features from the transformed attribute information. Through these general-purpose features, the embodiment can recognize objects contained in multi-category data sets as well as objects not used in training.

FIG. 3 is a block diagram illustrating a zero-shot recognition apparatus according to an embodiment of the present invention.

The zero-shot recognition apparatus 110 includes at least one processor 120, a computer-readable storage medium 130, and a communication bus 170.

The processor 120 may control the apparatus to operate as the zero-shot recognition apparatus 110. For example, the processor 120 may execute one or more programs stored in the computer-readable storage medium 130. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 120, cause the zero-shot recognition apparatus 110 to perform operations according to the exemplary embodiment.

The computer-readable storage medium 130 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. The program 140 stored in the computer-readable storage medium 130 includes a set of instructions executable by the processor 120. In one embodiment, the computer-readable storage medium 130 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, another form of storage medium that can be accessed by the zero-shot recognition apparatus 110 and can store the desired information, or a suitable combination thereof.

The communication bus 170 interconnects the various components of the zero-shot recognition apparatus 110, including the processor 120 and the computer-readable storage medium 130.

The zero-shot recognition apparatus 110 may also include one or more input/output interfaces 150, which provide interfaces for one or more input/output devices, and one or more communication interfaces 160. The input/output interface 150 and the communication interface 160 are connected to the communication bus 170. An input/output device (not shown) may be connected to the other components of the zero-shot recognition apparatus 110 through the input/output interface 150.

The zero-shot recognition apparatus 110 converts original attribute information for a single category into new attribute information through a self-supervised attribute transformation model; generates, using the new attribute information, general-purpose features applicable to multiple categories based on the loss function defined in the feature generation model; transmits the general-purpose features to the multi-category zero-shot learning model; and, through the zero-shot learning model, recognizes data of unseen classes based on the general-purpose features.

FIG. 4 is a diagram illustrating an attribute transformation model of the zero-shot recognition apparatus according to an embodiment of the present invention.

The attribute transformation model transforms the input attribute information at the vector level. A plurality of samples are input as vectors and together form the embedding space.

Referring to FIG. 4(a), the original attribute information is input as a plurality of vectors, and the attribute transformation model changes some of the values in these vectors to other values.

Referring to FIG. 4(b), the attribute transformation model may change some values in the vectors to zero (0).

Referring to FIG. 4(c), the attribute transformation model may change some values in the vectors to noise values; Gaussian noise may be used as the noise.

Referring to FIG. 4(d), the attribute transformation model exchanges two corresponding values between vectors: values of different attribute vectors that lie in the same column (the same attribute dimension) are swapped column by column. Unlike zeroing or noise, this uses real attribute values, so the manifold of the original attribute information is preserved. In machine learning, a manifold is the space in which the data lie, that is, a connected set or group of points in a high-dimensional space that can be represented in a lower dimension.
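The three transformations of FIG. 4(b) to 4(d) can be sketched on a small attribute matrix as follows. This is a minimal illustration, not the patented implementation: the function names, the per-entry probability `p`, and the choice of swapping exactly two randomly chosen rows per column are assumptions made for the example.

```python
import random
from typing import List

# Rows are per-class attribute vectors; columns are attribute dimensions.
Matrix = List[List[float]]


def mask_to_zero(attrs: Matrix, p: float, rng: random.Random) -> Matrix:
    """FIG. 4(b): set each entry to 0 with probability p (returns a new matrix)."""
    return [[0.0 if rng.random() < p else v for v in row] for row in attrs]


def replace_with_noise(attrs: Matrix, p: float, sigma: float, rng: random.Random) -> Matrix:
    """FIG. 4(c): replace each entry with a Gaussian sample N(0, sigma^2) with probability p."""
    return [[rng.gauss(0.0, sigma) if rng.random() < p else v for v in row] for row in attrs]


def swap_within_columns(attrs: Matrix, p: float, rng: random.Random) -> Matrix:
    """FIG. 4(d): for each column, with probability p, exchange the values of two
    randomly chosen rows. Only real attribute values are moved, so every column
    keeps exactly the same multiset of values."""
    out = [row[:] for row in attrs]
    n_rows = len(out)
    for j in range(len(out[0])):
        if n_rows >= 2 and rng.random() < p:
            i, k = rng.sample(range(n_rows), 2)
            out[i][j], out[k][j] = out[k][j], out[i][j]
    return out


rng = random.Random(0)
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
print(swap_within_columns(A, p=1.0, rng=rng))
```

The swap variant never introduces values that did not already occur in a column, which is the property the text relies on when it says the manifold of the original attributes is kept.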

FIG. 5 is a diagram illustrating a feature generation model of the zero-shot recognition apparatus according to an embodiment of the present invention.

For general-purpose zero-shot learning that is not limited to data sets built on human-defined attributes, the feature generation model is trained using both the original attribute information and the transformed attribute information.

The feature generation model takes a random noise distribution and the new attribute information as input and outputs general-purpose features. Within a generative adversarial network, in which a generator and a discriminator interact, it refines the general-purpose features by optimizing (i) a conditional generation loss and (ii) a self-supervised loss.
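The generator's interface (noise plus attributes in, a feature vector out) can be illustrated with a toy one-layer network. Everything concrete here is an assumption for illustration: the class name `ToyGenerator`, the conditioning by concatenating z and a, the single linear layer, and the ReLU output (chosen because CNN features after a ReLU are non-negative). The source does not specify the generator architecture, and a real model would learn the weights adversarially.

```python
import random
from typing import List, Sequence


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


class ToyGenerator:
    """Hypothetical one-layer conditional generator: x_tilde = ReLU(W [z; a] + b)."""

    def __init__(self, noise_dim: int, attr_dim: int, feat_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.noise_dim, self.attr_dim = noise_dim, attr_dim
        in_dim = noise_dim + attr_dim
        # Small random weights; a trained model would learn these adversarially.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(feat_dim)]
        self.b = [0.0] * feat_dim

    def __call__(self, z: Sequence[float], a: Sequence[float]) -> List[float]:
        if len(z) != self.noise_dim or len(a) != self.attr_dim:
            raise ValueError("unexpected noise or attribute dimension")
        h = list(z) + list(a)  # condition on the attributes by concatenation
        return [relu(sum(w * v for w, v in zip(row, h)) + bj)
                for row, bj in zip(self.W, self.b)]


def sample_noise(dim: int, rng: random.Random) -> List[float]:
    """Draw z ~ N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(42)
G = ToyGenerator(noise_dim=4, attr_dim=3, feat_dim=8)
x_tilde = G(sample_noise(4, rng), [0.2, 0.0, 0.9])  # one synthetic feature for one class
```

Sampling several z for the same attribute vector yields several synthetic features for that class, which is how a generator of this kind supplies training data for classes it has no images of.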

The conditional generation loss L_gan is expressed as Equation 1, and the self-supervised loss L_ss is expressed as Equation 2.

The conditional generation loss applies the Wasserstein distance to the generated general-purpose features, conditioned on the attributes and with a weighted gradient penalty.

[Equation 1]
$$L_{gan} = \mathbb{E}\big[D(x, a)\big] - \mathbb{E}\big[D(\tilde{x}, a)\big] - \lambda\, \mathbb{E}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, a) \rVert_2 - 1\big)^2\Big]$$

Here x is the CNN feature of class y and a is the attribute vector of class y. $\tilde{x} = G(z, a)$ is a generated feature of class y drawn from the generator's distribution $P_g$, where z is a noise vector that increases the degrees of freedom of the feature generation model. $\hat{x} = \alpha x + (1 - \alpha)\tilde{x}$, with $\alpha \sim U(0, 1)$, is the feature at which the gradient penalty is evaluated, and λ is the penalty weight parameter. Instead of the log-likelihood, the loss uses the Wasserstein distance and additionally includes the gradient penalty term. The Wasserstein distance between two probability distributions is the smallest expected distance between paired samples, taken over all joint probability distributions whose marginals are the two distributions; a joint probability distribution describes the probability of two events occurring together.
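The critic side of Equation 1 can be checked numerically with a short sketch. It assumes the critic is any Python callable `D(x, a)` and replaces automatic differentiation with a central finite-difference gradient so that the example needs only the standard library; the function names and the default `lam=10.0` are illustrative assumptions, not values from the source. In a deep-learning framework the gradient would come from autograd instead.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Critic = Callable[[Sequence[float], Sequence[float]], float]  # D(x, a) -> score


def numeric_grad_x(D: Critic, x: Sequence[float], a: Sequence[float],
                   eps: float = 1e-5) -> List[float]:
    """Central finite-difference gradient of D with respect to the feature x."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grad.append((D(xp, a) - D(xm, a)) / (2.0 * eps))
    return grad


def critic_loss_gp(D: Critic, real, fake, attrs, lam: float = 10.0,
                   rng: Optional[random.Random] = None) -> float:
    """Batch estimate of -L_gan from Equation 1, which the critic minimises:
    -(mean D(x, a) - mean D(x_tilde, a) - lam * mean (||grad D(x_hat, a)|| - 1)^2)."""
    if rng is None:
        rng = random.Random(0)
    n = len(real)
    wasserstein = (sum(D(x, a) for x, a in zip(real, attrs)) / n
                   - sum(D(xt, a) for xt, a in zip(fake, attrs)) / n)
    penalty = 0.0
    for x, xt, a in zip(real, fake, attrs):
        alpha = rng.random()  # alpha ~ U(0, 1)
        x_hat = [alpha * u + (1.0 - alpha) * v for u, v in zip(x, xt)]
        norm = math.sqrt(sum(g * g for g in numeric_grad_x(D, x_hat, a)))
        penalty += (norm - 1.0) ** 2
    return -(wasserstein - lam * penalty / n)


# A linear critic has gradient (3, 4) everywhere, so ||grad|| = 5 and the penalty is 16.
linear_critic = lambda x, a: 3.0 * x[0] + 4.0 * x[1]
print(critic_loss_gp(linear_critic, real=[[1.0, 1.0]], fake=[[0.0, 0.0]], attrs=[[0.0]]))
```

With the linear critic the Wasserstein term is 7 and the penalty term is 10 × 16, so the printed loss is approximately 153; the penalty pushes the critic's gradient norm toward 1, which is the 1-Lipschitz constraint the Wasserstein formulation needs.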

The self-supervised loss applies the discriminator's distribution $P_D(s \mid \tilde{x})$ over whether a general-purpose feature was generated from the original attribute information (s = 0) or from the new attribute information (s = 1). It is an objective function that determines whether the attribute information has been permuted:

[Equation 2]
$$L_{ss} = -\,\mathbb{E}_{\tilde{x},\, s}\big[\log P_D(s \mid \tilde{x})\big]$$

The attribute transformation model is a function that shuffles attribute values column by column with a given probability constant p; it keeps the manifold intact and is robust to outliers and noise.
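The self-supervision signal can be sketched in two parts: building labeled attribute matrices by column-wise shuffling with probability p, and scoring the discriminator's predictions with binary cross-entropy, which is one common way to write Equation 2 for a binary label. The names `shuffle_columns` and `self_supervised_loss` are hypothetical, and labeling a sample s = 1 only when the shuffle actually changed the matrix is an assumption that avoids mislabeling an identity permutation.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def shuffle_columns(attrs: Matrix, p: float, rng: random.Random) -> Tuple[Matrix, int]:
    """With probability p per column, permute that attribute column across rows.
    Returns the new matrix and the self-supervision label s
    (s = 1 if the attributes actually changed, s = 0 if they equal the originals)."""
    out = [row[:] for row in attrs]
    for j in range(len(out[0])):
        if rng.random() < p:
            col = [row[j] for row in out]
            rng.shuffle(col)
            for row, v in zip(out, col):
                row[j] = v
    s = int(out != attrs)
    return out, s


def self_supervised_loss(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Binary cross-entropy -mean[s*log q + (1 - s)*log(1 - q)], where
    q = P_D(s = 1 | x_tilde) is the discriminator's predicted probability."""
    total = 0.0
    for q, s in zip(probs, labels):
        q = min(max(q, eps), 1.0 - eps)  # avoid log(0)
        total -= s * math.log(q) + (1 - s) * math.log(1.0 - q)
    return total / len(probs)


rng = random.Random(3)
A = [[0.1, 0.9, 0.4], [0.7, 0.2, 0.8], [0.5, 0.6, 0.3], [0.9, 0.0, 0.2]]
shuffled, s = shuffle_columns(A, p=0.5, rng=rng)
```

A discriminator that always answers 0.5 scores log 2 ≈ 0.693 on this loss; driving it lower requires the generated features to carry information about whether their attributes were permuted.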

Together, these losses lead the generator to produce features that are robust to outliers and noise, which makes the approach applicable to multi-category data sets.

When the processor transmits the pseudo-attribute information to the zero-shot learning model, the zero-shot learning model recognizes data of unseen classes based on the general-purpose features.

A user-defined attribute space is used as the embedding space, together with a compatibility score s(y) for class y:

$$s(y) = a_y^{\top} W x$$

where W is the weight matrix of a fully connected layer and $a_y$ is the attribute vector of class y. To predict the class label of a given feature x, the zero-shot learning model projects the feature onto the attribute representation, and the compatibility score s(y) is used to select the best-matching class $y^*$:

$$y^* = \arg\max_{y \in \mathcal{Y}} s(y)$$

That is, the class $y^*$ with the highest compatibility score is the predicted class. ZSL (Zero-Shot Learning) is set up with $\mathcal{Y} = \mathcal{Y}^u$, and GZSL (Generalized Zero-Shot Learning) with $\mathcal{Y} = \mathcal{Y}^s \cup \mathcal{Y}^u$. In the test phase, GZSL classifies unseen classes ($\mathcal{Y}^u$) and seen classes ($\mathcal{Y}^s$) simultaneously.
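Prediction with a compatibility score reduces to an argmax over the candidate label set, and the only difference between ZSL and GZSL is which set is searched. The sketch below uses the bilinear form s(y) = a_yᵀ W x; the identity matrix W, the three toy classes, and their attribute vectors are made-up values for illustration only.

```python
from typing import Dict, Iterable, List, Sequence


def compatibility(x: Sequence[float], W: List[List[float]], a_y: Sequence[float]) -> float:
    """s(y) = a_y^T W x: project the feature x into attribute space with W, then
    take the inner product with the class attribute vector a_y."""
    projected = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return sum(p * a for p, a in zip(projected, a_y))


def predict(x: Sequence[float], W: List[List[float]],
            class_attrs: Dict[str, Sequence[float]], candidates: Iterable[str]) -> str:
    """y* = argmax over the candidate label set Y of s(y)."""
    return max(sorted(candidates), key=lambda y: compatibility(x, W, class_attrs[y]))


W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
class_attrs = {"horse": [1.0, 0.0], "zebra": [0.0, 1.0], "cat": [0.5, 0.5]}
seen, unseen = {"horse"}, {"zebra", "cat"}
x = [1.0, 0.0]
print(predict(x, W, class_attrs, unseen))         # ZSL:  Y = Y^u        -> cat
print(predict(x, W, class_attrs, seen | unseen))  # GZSL: Y = Y^s ∪ Y^u  -> horse
```

The same feature is assigned to an unseen class under ZSL but to a seen class under GZSL, which is why GZSL is the harder and more realistic setting.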

The zero-shot learning model may include a feature extraction model, which may be implemented as a convolutional neural network (CNN). In the feature extraction model, multiple layers, including hidden layers, are connected as a network. A layer may include parameters; the parameters of a layer include a set of learnable filters, which may be convolution filters. The parameters include weights and/or biases between nodes.

FIG. 6 is a flowchart illustrating a zero-shot recognition method according to another embodiment of the present invention. The method may be performed by a computing device and operates in the same manner as the zero-shot recognition apparatus.

In step S210, the processor converts original attribute information for a single category into new attribute information through a self-supervised attribute transformation model. The original attribute information is input as a plurality of vectors, and the attribute transformation model changes some values in these vectors to other values: it may change some values to zero or to noise values, or it may exchange two corresponding values between vectors so that the manifold of the original attribute information is preserved.

In step S220, the processor uses the new attribute information to generate general-purpose features applicable to multiple categories, based on the loss function defined in the feature generation model. The feature generation model takes a random noise distribution and the new attribute information as input, outputs the general-purpose features, and refines them by optimizing (i) a conditional generation loss and (ii) a self-supervised loss within a generative adversarial network in which a generator and a discriminator interact. The conditional generation loss may apply the Wasserstein distance, with a weighted gradient penalty, to the general-purpose features. The self-supervised loss may apply the discriminator's distribution over the general-purpose features generated from the original or the new attribute information.

In step S230, the processor transmits the general-purpose features to the multi-category zero-shot learning model.

In step S240, when the processor transmits the pseudo-attribute information to the zero-shot learning model, the zero-shot learning model may recognize data of unseen classes based on the general-purpose features.

FIGS. 7 and 8 are diagrams illustrating simulation results according to embodiments of the present invention.

To quantitatively evaluate the automatically generated unseen-class data and attribute information, the Average Per-Class Top-1 Accuracy (%) metric, widely used in zero-shot learning, was used. It computes the recognition accuracy (%) separately for each class and averages it over classes to give the final zero-shot performance measure.
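The metric can be written in a few lines. The function name `per_class_top1` and the toy labels are illustrative; the point of the example is that averaging per class, rather than per sample, keeps large classes from dominating the score.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def per_class_top1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average per-class top-1 accuracy in percent: compute the accuracy of each
    true class separately, then take the unweighted mean over classes."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return 100.0 * sum(correct[c] / total[c] for c in total) / len(total)


# Four samples of class "a" (all correct) and one of class "b" (wrong):
# per-sample accuracy would be 80 %, but per-class accuracy is (100 + 0) / 2.
print(per_class_top1(["a", "a", "a", "a", "b"], ["a", "a", "a", "a", "a"]))  # 50.0
```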

ImageNet 21K, a multi-category zero-shot data set, was used, and evaluation was performed on nine sets of test classes in total. 2H is the set of all classes, among the 21K classes, that lie within two hops of the training classes in the WordNet hierarchy, and 3H is the set of all classes within three hops. M500, M1K, and M5K are the sets of 500, 1,000, and 5,000 classes with the most data among the 21K classes. L500, L1K, and L5K are the sets of 500, 1,000, and 5,000 classes with the least data among the 21K classes. All20K uses all of the 21K classes.
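The hop-based splits (2H, 3H) amount to a breadth-first search from the training classes over the class hierarchy. This sketch assumes the hierarchy is given as an undirected adjacency dictionary and that hop distance is the shortest path to any training class; the graph and class names are invented for illustration and are not WordNet data.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def classes_within_hops(graph: Dict[str, List[str]], seeds: Iterable[str], k: int) -> Set[str]:
    """Multi-source BFS over an undirected hierarchy: return every node whose
    shortest distance to some seed (training) class is between 1 and k."""
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d > 0}


# Toy chain: a - b - c - d, with "a" as the only training class.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(sorted(classes_within_hops(toy, ["a"], 2)))  # ['b', 'c']
```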

On ImageNet, a multi-category data set, the zero-shot recognition apparatus according to this embodiment achieved the best Top-1 classification accuracy compared with recent zero-shot techniques. The apparatus does not restrict the kind of category or the range of attribute information, and can therefore overcome the limitations of existing techniques.

The zero-shot recognition apparatus may be implemented in a logic circuit by hardware, firmware, software, or a combination thereof, or by using a general-purpose or special-purpose computer. The apparatus may be implemented using a hardwired device, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or the like. The apparatus may also be implemented as a system on chip (SoC) including one or more processors and controllers.

The zero-shot recognition apparatus may be mounted, in the form of software, hardware, or a combination thereof, on a computing device or server equipped with hardware elements. Such a computing device or server may be any of various devices that include all or some of the following: a communication device, such as a communication modem, for communicating with other devices or with wired/wireless communication networks; a memory for storing data used to execute a program; and a microprocessor for executing the program to perform operations and issue commands.

Although FIGS. 5 and 6 describe each process as being executed sequentially, this is merely illustrative. Those skilled in the art may apply various modifications and variations, such as changing the order described in FIGS. 5 and 6, executing one or more processes in parallel, or adding other processes, without departing from the essential characteristics of the embodiments of the present invention.

The operations according to the present embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. A computer-readable medium is any medium that participates in providing instructions to a processor for execution. It may contain program instructions, data files, data structures, or a combination thereof; examples include magnetic media, optical recording media, and memory. A computer program may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can be readily inferred by programmers in the technical field to which the present embodiments pertain.

The present embodiments are intended to explain the technical idea of the present embodiments, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiments should be interpreted according to the following claims, and all technical ideas within their range of equivalents should be interpreted as falling within the scope of rights of the present embodiments.

Claims

A zero-shot recognition method by a computing device, comprising:
converting original attribute information for a single category into new attribute information through a self-supervision-based attribute transformation model;
generating general-purpose feature information applicable to complex categories based on a loss function defined in a feature generation model that uses the new attribute information; and
transmitting the general-purpose feature information to a complex-category-based zero-shot learning model,
wherein the feature generation model receives a random noise distribution and the new attribute information, outputs the general-purpose feature information, and corrects the general-purpose feature information using the result of learning (i) a conditional generation loss function, defined in a generative adversarial network in which a generative model and a discriminative model interact, that applies a penalty weight and the Wasserstein distance conditionally on the attribute information, and (ii) a self-supervised loss function that, while performing self-supervision, learns the distribution of the discriminative model for the general-purpose feature information according to the original attribute information or the new attribute information.
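The claim does not fix the form of the two losses. The PyTorch sketch below shows one plausible reading, under stated assumptions: small MLPs over precomputed CNN features, a conditional WGAN-GP critic where the penalty weight is `lam` (10 is the value commonly used for WGAN-GP), and a self-supervised term implemented as an auxiliary critic head that predicts which attribute transformation (label `t`, 0 = original) produced the conditioning vector `a`. All class and function names, the auxiliary head, and `beta` are hypothetical and not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Hypothetical G(z, a): noise + attribute vector -> visual feature."""

    def __init__(self, noise_dim, attr_dim, feat_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + attr_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, feat_dim), nn.ReLU())  # CNN features are non-negative

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=1))


class Critic(nn.Module):
    """Hypothetical D(x, a): Wasserstein score plus an auxiliary head that
    predicts which attribute transformation produced a."""

    def __init__(self, feat_dim, attr_dim, num_transforms, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(feat_dim + attr_dim, hidden), nn.LeakyReLU(0.2))
        self.score = nn.Linear(hidden, 1)
        self.ss_head = nn.Linear(hidden, num_transforms)

    def forward(self, x, a):
        h = self.body(torch.cat([x, a], dim=1))
        return self.score(h).squeeze(1), self.ss_head(h)


def gradient_penalty(critic, real, fake, a):
    """WGAN-GP term: (||grad_x D(x_hat, a)||_2 - 1)^2 on real/fake interpolates."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    score, _ = critic(x_hat, a)
    (grad,) = torch.autograd.grad(score.sum(), x_hat, create_graph=True)
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()


def critic_loss(critic, real, fake, a, t, lam=10.0, beta=1.0):
    """Conditional WGAN-GP critic loss plus self-supervised transform prediction."""
    fake = fake.detach()  # critic step: do not update G
    real_score, real_logits = critic(real, a)
    fake_score, _ = critic(fake, a)
    wgan_gp = fake_score.mean() - real_score.mean() + lam * gradient_penalty(critic, real, fake, a)
    return wgan_gp + beta * F.cross_entropy(real_logits, t)


def generator_loss(critic, fake, a, t, beta=1.0):
    """G raises the critic score and keeps the transform recoverable from its output."""
    fake_score, fake_logits = critic(fake, a)
    return -fake_score.mean() + beta * F.cross_entropy(fake_logits, t)
```

Detaching `fake` in the critic step keeps the two updates separate, as in standard alternating GAN training; the auxiliary cross-entropy is only one way to make the critic's behaviour depend on whether `a` is original or transformed.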

The method of claim 1, wherein
the original attribute information is input as a plurality of vectors, and
the attribute transformation model converts some values in the plurality of vectors into other values.

The method of claim 2, wherein
the attribute transformation model converts some values of the plurality of vectors into zero or noise values.
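Converting some attribute values to zero or to noise can be sketched with NumPy as below. This is an illustrative sketch only: the fraction of entries corrupted (`ratio`), the Gaussian noise scale (`noise_std`), and the function name `mask_attributes` are assumptions, not values given in the patent.

```python
import numpy as np


def mask_attributes(attrs, ratio=0.2, mode="zero", noise_std=0.1, rng=None):
    """Return a copy of attrs (n x d) in which a random subset of entries per row
    is replaced by 0 (mode="zero") or Gaussian noise (mode="noise")."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(attrs, dtype=float, copy=True)
    n, d = out.shape
    k = max(1, int(round(ratio * d)))  # number of entries to corrupt per vector
    for i in range(n):
        idx = rng.choice(d, size=k, replace=False)  # distinct positions
        if mode == "zero":
            out[i, idx] = 0.0
        elif mode == "noise":
            out[i, idx] = rng.normal(0.0, noise_std, size=k)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out
```

The input array is copied, so the original attribute information stays available alongside the transformed version.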

The method of claim 2, wherein
the attribute transformation model preserves the manifold of the original attribute information by swapping two values within the plurality of vectors with each other.
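Swapping two entries of an attribute vector leaves the set of values, and therefore its value range and norm, unchanged, which is one reading of how the transformation stays close to the original attribute manifold. A minimal NumPy sketch under that assumption; the function name `swap_attributes` and the uniform choice of positions are illustrative.

```python
import numpy as np


def swap_attributes(attrs, rng=None):
    """Return a copy of attrs (n x d) with two randomly chosen entries swapped in
    each row. The multiset of values, and hence the vector norm, is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(attrs, dtype=float, copy=True)
    for i in range(out.shape[0]):
        j, k = rng.choice(out.shape[1], size=2, replace=False)  # two distinct positions
        out[i, [j, k]] = out[i, [k, j]]  # right side is a copy, so the swap is safe
    return out
```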

delete

The method of claim 1, wherein
when pseudo-attribute information is transmitted to the zero-shot learning model,
the zero-shot learning model recognizes data of a class not used in training (an unseen class) based on the general-purpose feature information.
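Recognizing unseen-class data from generated features typically means synthesizing features for each unseen class from its attribute vector, fitting a classifier on them, and applying it to real test features. The sketch below assumes a trained generator is available as a plain callable `generator(z, a)` and uses a nearest-centroid classifier as a stand-in for the zero-shot learning model; all names are hypothetical.

```python
import numpy as np


def synthesize_features(generator, class_attrs, n_per_class, noise_dim, rng):
    """Draw n_per_class generated features for each unseen-class attribute vector."""
    feats, labels = [], []
    for c, attr in enumerate(class_attrs):
        z = rng.standard_normal((n_per_class, noise_dim))
        feats.append(generator(z, np.tile(attr, (n_per_class, 1))))
        labels.append(np.full(n_per_class, c))
    return np.vstack(feats), np.concatenate(labels)


def fit_centroids(feats, labels, num_classes):
    """Mean generated feature per class (a simple stand-in classifier)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])


def predict(centroids, x):
    """Assign each real test feature to the nearest class centroid (squared L2)."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Any classifier trained on the synthesized features could replace the centroids; nearest-centroid is used here only because it needs no extra dependencies.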

A zero-shot recognition apparatus comprising one or more processors and a memory storing one or more programs executed by the one or more processors, wherein
the processor converts original attribute information for a single category into new attribute information through a self-supervision-based attribute transformation model,
the processor generates general-purpose feature information applicable to complex categories based on a loss function defined in a feature generation model that uses the new attribute information,
the processor transmits the general-purpose feature information to a complex-category-based zero-shot learning model, and
the feature generation model receives a random noise distribution and the new attribute information, outputs the general-purpose feature information, and corrects the general-purpose feature information using the result of learning (i) a conditional generation loss function, defined in a generative adversarial network in which a generative model and a discriminative model interact, that applies a penalty weight and the Wasserstein distance conditionally on the attribute information, and (ii) a self-supervised loss function that, while performing self-supervision, learns the distribution of the discriminative model for the general-purpose feature information according to the original attribute information or the new attribute information.

The apparatus of claim 9, wherein
the original attribute information is input as a plurality of vectors, and
the attribute transformation model preserves the manifold of the original attribute information by swapping two values within the plurality of vectors with each other.

delete

The apparatus of claim 9, wherein
when the processor transmits pseudo-attribute information to the zero-shot learning model,
the zero-shot learning model recognizes data of a class not used in training (an unseen class) based on the general-purpose feature information.