KR102662997B1

KR102662997B1 - Hyperspectral image classification method and apparatus using neural network

Info

Publication number: KR102662997B1
Application number: KR1020210136867A
Authority: KR
Inventors: 이준기; 유광선
Original assignee: 주식회사 엘로이랩
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2024-05-03
Also published as: KR20240025578A; KR20230053404A

Abstract

An image classification device using a neural network is disclosed. An image classification device according to an embodiment may include a receiver that receives an image, and a processor that extracts spectral features and spatial features by inputting the image into a plurality of sequence blocks, generates an enhanced spectral feature and an enhanced spatial feature by performing attention on the spectral features and the spatial features, and classifies the image based on the enhanced spectral feature and the enhanced spatial feature.

Description

Hyperspectral image classification method and apparatus using neural network {HYPERSPECTRAL IMAGE CLASSIFICATION METHOD AND APPARATUS USING NEURAL NETWORK}

The embodiments below relate to a hyperspectral image classification method and apparatus using a neural network.

Whereas an RGB image has three channels, hyperspectral images (HSIs) measure a wide range of wavelengths across more than 200 bands. An HSI can therefore capture complex features that cannot be found in an RGB image. An RGB image has only spatial features, whereas an HSI has both spectral features and spatial features, which gives it excellent classification performance. Because an HSI can distinguish objects of similar color, it can be applied in a wide range of fields, including remote sensing.
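To make the difference in data layout concrete, the following minimal sketch builds an RGB patch and an HSI patch as NumPy arrays. The patch size (5x5), the band count (200), and all variable names are illustrative assumptions, and the values are random placeholders rather than real measurements.

```python
import numpy as np

# Hypothetical sizes: a 5x5 spatial patch with 3 (RGB) or 200 (HSI) channels.
rng = np.random.default_rng(0)
rgb_patch = rng.random((5, 5, 3))     # height, width, color channels
hsi_patch = rng.random((5, 5, 200))   # height, width, spectral bands

# Spectral view: the spectrum of one pixel is a vector along the band axis.
center_spectrum = hsi_patch[2, 2, :]

# Spatial view: a single band is an ordinary one-channel image.
band_image = hsi_patch[:, :, 50]

print(rgb_patch.shape, hsi_patch.shape)          # (5, 5, 3) (5, 5, 200)
print(center_spectrum.shape, band_image.shape)   # (200,) (5, 5)
```

Slicing along the band axis yields spectral information for a pixel, while slicing a single band yields spatial information, which is why an HSI is described as carrying both kinds of features.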

Recently, HSI combined with machine learning and deep learning has been adopted in fields such as meat quality, land cover mapping, object detection, change detection, and satellite image mapping.

Machine learning includes mathematical models that learn patterns for data classification or linear regression. Deep learning is widely used in various fields such as computer vision and translation. A deep neural network used in deep learning includes multiple layers.

Embodiments may provide an image classification technique using a neural network.

However, the technical problems are not limited to those described above, and other technical problems may exist.

An apparatus for classifying an image using a neural network according to an embodiment may include: a receiver that receives an image; and a processor that extracts spectral features and spatial features by inputting the image into a plurality of sequence blocks, generates an enhanced spectral feature and an enhanced spatial feature by performing attention on the spectral features and the spatial features, and classifies the image based on the enhanced spectral feature and the enhanced spatial feature.

The processor may extract the spectral features by inputting the image into a spectral sequence block among the plurality of sequence blocks, and extract the spatial features by inputting the image into a spatial sequence block among the plurality of sequence blocks.

Each of the spectral sequence block and the spatial sequence block may include: a memory block that includes a convolution operation on a hidden state, a convolution operation on an input, and an element-wise sum operation; and a generation block that includes concatenation of the hidden state and the input and a batch normalization operation.

In the image classification device according to an embodiment, the convolution operation on the hidden state and the convolution operation on the input may include a three-dimensional convolution operation.

The processor may generate the enhanced spectral feature by performing a channel global context attention operation on the spectral features, and generate the enhanced spatial feature by performing a position global context attention operation on the spatial features.

The channel global context attention operation may include a convolution operation, a reshape operation, and a layer normalization operation, and the position global context attention operation may include a convolution operation, a pooling operation, a reshape operation, and a layer normalization operation.

The processor may generate a one-dimensional vector by concatenating the enhanced spectral feature and the enhanced spatial feature, and classify the image by inputting the one-dimensional vector into a fully connected layer.

An apparatus for classifying an image using a neural network according to an embodiment includes: a receiver that receives an image; and a processor that generates feature information about the image by inputting the image into a plurality of sequence blocks and classifies the image based on the feature information, wherein the sequence block includes a generation block that generates an output based on Equation 1 and Equation 2. [Equations 1 and 2 are not reproduced in this text.] In these equations, x is the input, h is the hidden state, l is the layer identification element, and ReLU is the activation function; the remaining symbols, which are also missing here, denote batch normalization and a weight.

An apparatus for classifying an image using a neural network according to an embodiment includes: a receiver that receives an image; and a processor that generates feature information about the image by inputting the image into a plurality of sequence blocks and classifies the image based on the feature information, wherein the sequence block includes a memory block that generates an output based on Equation 1. [Equation 1 is not reproduced in this text.] In this equation, x is the input, h is the hidden state, and l is the layer identification element; the remaining symbols, which are also missing here, denote the weights of the memory block.

A method of classifying an image using a neural network, performed by an image classification device according to an embodiment, may include: receiving an image; extracting spectral features and spatial features by inputting the image into a plurality of sequence blocks; generating an enhanced spectral feature and an enhanced spatial feature by performing attention on the spectral features and the spatial features; and classifying the image based on the enhanced spectral feature and the enhanced spatial feature.

A method of classifying an image using a neural network, performed by an image classification device according to an embodiment, includes: receiving an image; generating feature information about the image by inputting the image into a plurality of sequence blocks; and classifying the image based on the feature information, wherein the sequence block includes a memory block or a generation block, and the memory block includes at least one of a generation block that generates an output based on Equation 1 and Equation 2, or a memory block that generates an output based on Equation 3. [Equations 1 to 3 are not reproduced in this text.] In these equations, x is the input, h is the hidden state, l is the layer identification element, and ReLU is the activation function; the remaining symbols, which are also missing here, denote batch normalization, the weight of the generation block, and the weight of the memory block.

Embodiments may improve classification performance by performing image classification with a neural network that has a plurality of branches for extracting spectral features and spatial features.

Embodiments may improve image classification performance by using sequence blocks that include a memory block and a generation block.

FIG. 1 shows a schematic block diagram of an image classification device according to an embodiment.
FIG. 2 shows an example of sequential connectivity.
FIG. 3 shows an example of a neural network used by the image classification device shown in FIG. 1.
FIG. 4 shows an example of a spectral sequence block of the neural network shown in FIG. 3.
FIG. 5 shows an example of a spatial sequence block of the neural network shown in FIG. 3.
FIG. 6 shows an example of channel global context attention of the neural network shown in FIG. 2.
FIG. 7 shows an example of position global context attention of the neural network shown in FIG. 2.
FIG. 8 shows a flowchart of the operation of the image classification device shown in FIG. 1.
FIG. 9 is a diagram comparing image classification performed by an image classification device according to an embodiment with that of other neural networks.
FIGS. 10A to 10C may be diagrams showing the mean and variance of classification results for the IP, UP, and SV data sets.
FIG. 11 is a diagram illustrating the performance of an image classification device according to an embodiment as its detailed parameters change.
FIG. 12 is a diagram for explaining the performance of the image classification device under changing conditions.
FIG. 13 is a diagram showing how model accuracy changes with the amount of training data.
FIGS. 14A and 14B are diagrams showing the number of learnable parameters, training time, and test time for each type of training data.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of describing those embodiments. Embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

Because embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. However, this is not intended to limit the embodiments according to the concept of the present invention to the specific forms disclosed; they include all changes, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present. Expressions that describe the relationship between components, such as "between" and "immediately between" or "directly adjacent to", should be interpreted in the same way.

The terminology used in this specification is used only to describe specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

A module in this specification may refer to hardware capable of performing the functions and operations associated with each name described in this specification, to computer program code capable of performing specific functions and operations, or to an electronic recording medium, for example a processor or microprocessor, loaded with computer program code capable of performing specific functions and operations.

In other words, a module may refer to a functional and/or structural combination of hardware for carrying out the technical idea of the present invention and/or software for driving that hardware.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like members.

FIG. 1 shows a schematic block diagram of an image classification device according to an embodiment.

Referring to FIG. 1, the image classification device 10 may perform image classification. The image classification device 10 may receive an image and classify it by processing it with a neural network. An image includes the likeness of an object formed by refraction or reflection of light, and may refer to a representation of the shape of an object using lines or colors. An image may consist of information in a form that a computer can process. For example, the image may include a hyperspectral image (HSI).

A neural network (or artificial neural network) may include a statistical learning algorithm, used in machine learning and cognitive science, that mimics biological neurons. A neural network may refer to any model in which artificial neurons (nodes), forming a network through synaptic connections, acquire problem-solving ability by changing the strength of those connections through learning.

A neuron of a neural network may include a combination of weights or biases. A neural network may include one or more layers, each consisting of one or more neurons or nodes. By changing the weights of its neurons through learning, a neural network can infer a desired result from an arbitrary input.

The neural network may include a deep neural network. A person skilled in the art will understand that the neural network may include any neural network, including but not limited to a CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), perceptron, multilayer perceptron, FF (Feed Forward), RBF (Radial Basis Function network), DFF (Deep Feed Forward), LSTM (Long Short Term Memory), GRU (Gated Recurrent Unit), AE (Auto Encoder), VAE (Variational Auto Encoder), DAE (Denoising Auto Encoder), SAE (Sparse Auto Encoder), MC (Markov Chain), HN (Hopfield Network), BM (Boltzmann Machine), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), DCN (Deep Convolutional Network), DN (Deconvolutional Network), DCIGN (Deep Convolutional Inverse Graphics Network), GAN (Generative Adversarial Network), LSM (Liquid State Machine), ELM (Extreme Learning Machine), ESN (Echo State Network), DRN (Deep Residual Network), DNC (Differentiable Neural Computer), NTM (Neural Turing Machine), CN (Capsule Network), KN (Kohonen Network), and AN (Attention Network).

The image classification device 10 may be implemented as a printed circuit board (PCB) such as a motherboard, an integrated circuit (IC), or a system on chip (SoC). For example, the image classification device 10 may be implemented as an application processor.

In addition, the image classification device 10 may be implemented in a personal computer (PC), a data server, or a portable device.

The portable device may be implemented as a laptop computer, a mobile phone, a smart phone, a tablet PC, a mobile internet device (MID), a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal navigation device or portable navigation device (PND), a handheld game console, an e-book, or a smart device. The smart device may be implemented as a smart watch, a smart band, or a smart ring.

The image classification device 10 includes a receiver 100 and a processor 200. The image classification device 10 may further include a memory 300.

The receiver 100 may receive an image. The receiver 100 may include a receiving interface. The receiver 100 may output the received image to the processor 200.

The processor 200 may process data stored in the memory 300. The processor 200 may execute computer-readable code (e.g., software) stored in the memory 300 and instructions triggered by the processor 200.

The processor 200 may be a data processing device implemented in hardware, having a circuit with a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.

For example, the data processing device implemented in hardware may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array).

The processor 200 may extract spectral features and spatial features by inputting the image into a plurality of sequence blocks. The processor 200 may extract the spectral features by inputting the image into a spectral sequence block among the plurality of sequence blocks. The processor 200 may extract the spatial features by inputting the image into a spatial sequence block among the plurality of sequence blocks.

The spectral sequence block and the spatial sequence block may each include a memory block and a generation block. The memory block may be composed of any time-series (sequential) network, such as an RNN, LSTM, or GRU.

The memory block may include a convolution operation on the hidden state, a convolution operation on the input, and an element-wise sum operation. The generation block may include concatenation of the hidden state and the input and a batch normalization operation.
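The memory block operations listed above (one convolution on the input, one on the hidden state, then an element-wise sum) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the patented implementation: the single-channel naive convolution, the 3x3x3 kernel size, the zero padding, the tensor shapes, and the names conv3d_same, memory_block, w_x, and w_h are all hypothetical.

```python
import numpy as np

def conv3d_same(volume, kernel):
    """Naive single-channel 3D convolution (cross-correlation) with zero padding.

    Assumes odd kernel sizes so the output keeps the input shape.
    """
    kd, kh, kw = kernel.shape
    padded = np.pad(volume, [(k // 2, k // 2) for k in kernel.shape])
    out = np.zeros(volume.shape, dtype=float)
    depth, height, width = volume.shape
    for d in range(depth):
        for i in range(height):
            for j in range(width):
                window = padded[d:d + kd, i:i + kh, j:j + kw]
                out[d, i, j] = np.sum(window * kernel)
    return out

def memory_block(x, h, w_x, w_h):
    """Convolve the input and the hidden state separately, then sum element-wise."""
    return conv3d_same(x, w_x) + conv3d_same(h, w_h)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 5, 5))    # hypothetical (bands, height, width) input
h = rng.standard_normal((6, 5, 5))    # hidden state with the same shape
w_x = rng.standard_normal((3, 3, 3))  # hypothetical 3x3x3 kernels
w_h = rng.standard_normal((3, 3, 3))

h_next = memory_block(x, h, w_x, w_h)
print(h_next.shape)  # (6, 5, 5)
```

Because both convolutions preserve the input shape, the two results can be added element by element, and the sum can serve as the hidden state passed to the next layer.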

The convolution operation on the hidden state and the convolution operation on the input may include a three-dimensional convolution operation.

The processor 200 may generate an enhanced spectral feature and an enhanced spatial feature by performing attention on the spectral features and the spatial features.

The processor 200 may generate the enhanced spectral feature by performing a channel global context attention operation on the spectral features, and generate the enhanced spatial feature by performing a position global context attention operation on the spatial features.

The channel global context attention operation may include a convolution operation, a reshape operation, and a layer normalization operation. The position global context attention operation may include a convolution operation, a pooling operation, a reshape operation, and a layer normalization operation.
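The text above names only the types of operation involved, not how they are wired together. As a rough illustration, the sketch below combines a pointwise convolution, a reshape, and layer normalization in the style of a generic global context block (attention pooling over positions followed by a bottleneck transform that is added back to the input). The wiring, the softmax, the ReLU, the residual addition, and all names and shapes are assumptions for illustration, so this should not be read as the channel or position variant of the patent.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def layer_norm(v, eps=1e-5):
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def global_context_attention(feat, w_key, w_down, w_up):
    """Generic global context block on a (channels, ...) feature map."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)                # reshape to (C, N) positions
    scores = w_key @ flat                     # pointwise conv to one channel: (N,)
    weights = softmax(scores)                 # attention weights over positions
    context = flat @ weights                  # global context vector: (C,)
    hidden = np.maximum(layer_norm(w_down @ context), 0.0)  # LayerNorm + ReLU
    delta = w_up @ hidden                     # back to C channels
    # Broadcast the per-channel correction over all positions.
    return feat + delta.reshape(c, *([1] * (feat.ndim - 1)))

rng = np.random.default_rng(0)
channels, reduced = 8, 2                      # hypothetical sizes
feat = rng.standard_normal((channels, 5, 5))
w_key = rng.standard_normal(channels)
w_down = rng.standard_normal((reduced, channels))
w_up = rng.standard_normal((channels, reduced))

enhanced = global_context_attention(feat, w_key, w_down, w_up)
print(enhanced.shape)  # (8, 5, 5)
```

The position variant described above additionally includes a pooling operation, which this sketch does not attempt to model.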

The processor 200 may classify the image based on the enhanced spectral feature and the enhanced spatial feature. The processor 200 may generate a one-dimensional vector by concatenating the enhanced spectral feature and the enhanced spatial feature. The processor 200 may classify the image by inputting the one-dimensional vector into a fully connected layer.
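The classification step described above can be sketched in a few lines. The feature shapes, the number of classes (16), the random weights, and the names classify, weight, and bias are hypothetical; a trained model would use learned parameters.

```python
import numpy as np

def classify(enh_spectral, enh_spatial, weight, bias):
    """Flatten and concatenate both features, then apply one fully connected layer."""
    vec = np.concatenate([enh_spectral.ravel(), enh_spatial.ravel()])  # 1D vector
    logits = weight @ vec + bias
    return int(np.argmax(logits)), vec

rng = np.random.default_rng(0)
enh_spectral = rng.standard_normal((6, 2))   # hypothetical feature shapes
enh_spatial = rng.standard_normal((6, 2))
n_classes = 16                               # hypothetical number of classes
weight = rng.standard_normal((n_classes, 24))
bias = np.zeros(n_classes)

label, vec = classify(enh_spectral, enh_spatial, weight, bias)
print(vec.shape)  # (24,)
```

The predicted class is the index of the largest output of the fully connected layer.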

The memory 300 may store instructions (or programs) executable by the processor 200. For example, the instructions may include instructions for executing the operation of the processor and/or the operation of each component of the processor.

The memory 300 may be implemented as a volatile memory device or a non-volatile memory device.

The volatile memory device may be implemented as dynamic random access memory (DRAM), static random access memory (SRAM), thyristor RAM (T-RAM), zero capacitor RAM (Z-RAM), or twin transistor RAM (TTRAM).

The non-volatile memory device may be implemented as EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, MRAM (Magnetic RAM), spin-transfer torque MRAM (STT-MRAM), conductive bridging RAM (CBRAM), FeRAM (Ferroelectric RAM), PRAM (Phase change RAM), resistive RAM (RRAM), nanotube RRAM, polymer RAM (PoRAM), nano floating gate memory (NFGM), holographic memory, a molecular electronic memory device, or insulator resistance change memory.

FIG. 2 shows an example of sequential connectivity.

Referring to FIG. 2, sequential connectivity may include a memory block 210 (e.g., a recurrent neural network (RNN) block or an LSTM block) and a generation block 230. The memory block 210 may include a series of convRNN models with a hidden state h and an input x. A convRNN model may include a sequence-based model (e.g., an RNN) that uses a convolution operation instead of a fully connected operation.

The generation block 230 may include batch normalization, a rectified linear unit (ReLU) activation function, and a three-dimensional (3D) convolution layer. The 3D convolution layer can be readily derived by a person skilled in the art from Non-Patent Document 1 (Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift).

The sequence connection may refer to a neural network that improves on dense connectivity by using layer-level information (information about the level, i.e., the position, of the layer at which feature information is generated). The memory block 210 may consist of a series of convRNN models with hidden state h and input x.

Dense connectivity is a collective connection that concatenates the previous layers and is the connection structure used in Densenet. The sequence connection, in contrast, can generate new features that incorporate the features of previous layers according to the order of the layers.

In the example of FIG. 2, x denotes the input and h denotes the hidden state. The output x_l of the generation block 230 can be expressed as Equation 1.

Here, the function applied in Equation 1 can be expressed as Equation 2. It denotes the composition of the generation block 230, which includes batch normalization, ReLU activation, and a convolution layer.

The output h_l of the memory block 210 can be expressed as Equation 3.

The two weight terms in Equation 3 denote the weights of convolution layer l of the memory block. h_l can be generated by a convolutional RNN (convRNN)-based model such as an RNN, GRU, or LSTM. For example, the processor 200 can generate h_l through a convRNN using Equation 3.

In general, an RNN uses weights that are shared across blocks, because the data length varies and the model must capture the relationships between data items. According to one embodiment, the memory block 210 may instead have weights that are not shared, that is, weights that differ from block to block. This is possible because the length of the entire network is fixed to a specific value L. With non-shared weights, the memory block 210 can focus more on the order of the feature levels.

A network built from sequence connections (sequence blocks) according to one embodiment (Seqnet) not only reflects feature-level information in its skip connections but can also reduce the number of operations compared with a Densenet with the same number of layers, as disclosed in Non-Patent Document 2 (Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition).

The number of multiply-accumulate (MAC) operations performed for a 3D convolution can be calculated based on Equation 4.

The terms of Equation 4 denote, in order: the number of MAC operations; the channel, depth, height, and width of the network input feature map; the channel, depth, height, and width of the network output feature map; the depth, height, and width of the kernel; and the depth, height, and width of the stride.
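As a rough illustration of such a count, the sketch below uses the commonly used formula (number of output elements × kernel volume × input channels) and derives the output extent from the input extent, kernel, and stride under valid padding. The function name `conv3d_macs`, its parameters, and the formula are assumptions for illustration and may differ in detail from Equation 4.

```python
def conv3d_macs(c_in, d_in, h_in, w_in, c_out, kernel, stride=(1, 1, 1)):
    """Count multiply-accumulate operations of a 3D convolution (valid padding).

    Assumed formula: output elements x kernel volume x input channels.
    """
    k_d, k_h, k_w = kernel
    s_d, s_h, s_w = stride
    # Output extent along each axis when no padding is applied.
    d_out = (d_in - k_d) // s_d + 1
    h_out = (h_in - k_h) // s_h + 1
    w_out = (w_in - k_w) // s_w + 1
    return c_out * d_out * h_out * w_out * c_in * k_d * k_h * k_w


# Example: a 1x1x1 convolution mapping 24 channels to 16 channels
# on a feature map of depth 97 and spatial size 9x9.
macs = conv3d_macs(c_in=24, d_in=97, h_in=9, w_in=9, c_out=16, kernel=(1, 1, 1))
print(macs)
```

Because a 1×1×1 kernel has unit volume, the count here reduces to the product of the channel counts and the number of output positions.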

The number of MAC operations performed in Densenet and the number performed in a network using sequence connections (sequence blocks) according to one embodiment (Seqnet) can be expressed as Equation 5.

The terms of Equation 5 denote, in order: the number of MAC operations of a dense block containing L layers in Densenet; the number of MAC operations of L sequence blocks in the network using sequence connections (Seqnet); the number of layers; the initial input channel count of Densenet; the repeated input channel count of Densenet; and the weights of the memory block 210.

Considering only the 3D convolution operations, without regard to batch normalization and activation functions, and with identical kernel sizes, the L sequence blocks in Seqnet are measured to perform fewer MAC operations, under the first stated condition, than a Dense block in Densenet containing L layers (when the relevant parameter is 7 or greater). Under the second stated condition, the L sequence blocks (the individual blocks of Seqnet) are 10% more efficient than a dense block containing L layers (the individual block of Densenet) when the relevant parameter is 4, or 25% more efficient when it is 5. The efficiency can be calculated from the numbers of MAC operations performed in Seqnet and Densenet. More specifically, the value calculated with Equation 6 indicates a 10% efficiency improvement for Seqnet.

FIG. 3 shows an example of a neural network used by the image classification device shown in FIG. 1.

FIG. 4 shows an example of the spectral sequence block of the neural network shown in FIG. 3, and FIG. 5 shows an example of the spatial sequence block shown in FIG. 3. Each sequence block shown in FIGS. 3 to 6 may be designed by adopting the sequence connection structure described above.

FIG. 6 shows an example of the channel global context attention of the neural network shown in FIG. 3, and FIG. 7 shows an example of the position global context attention of the same network.

Referring to FIGS. 3 to 7, a processor (e.g., the processor 200 of FIG. 1) may perform image classification using a neural network with sequence connections (the connection structure included in the sequence blocks). The neural network may include two branches, one for extracting spectral features and one for extracting spatial features. At the end of the branches, the spectral and spatial features may be merged to generate a vector for classifying the image (e.g., a hyperspectral image (HSI)).

The first branch may include a spectral sequence block 310, and the second branch may include a spatial sequence block 330. The spectral sequence block 310 and the spatial sequence block 330 may have the sequence block structure described with reference to FIG. 2.

Using the sequence connection, the processor 200 may take into account the sequential relationship between the features of the layers (layer-level information) when classifying the image. The output features of a convolution can be regarded as the data at time t. Through the neural network, the processor 200 may use the hidden state and the time-t data to generate gated features that contain the order of the layer information.

The processor 200 may enhance the extracted spectral and spatial features using attention blocks. The attention blocks may include a channel global context attention block 350 and a position global context attention block 370.

The processor 200 may improve the performance of the neural network using a global context attention mechanism. The global context attention mechanism may combine two attention mechanisms (e.g., a non-local block and a squeeze-excitation block). The simplified non-local block captures long-range dependencies, and the squeeze-excitation block has a lightweight computation cost.

In the examples of FIGS. 3 to 7, the stride and padding of a two-dimensional or three-dimensional convolution layer may be omitted when the stride is (1, 1) or (1, 1, 1) and the padding is "same".

The neural network may consist of three stages: a feature extraction stage, an enhancement stage, and a classification stage. The image input to the neural network may include a patch image. For example, the patch image may be a part of an HSI with a size of 9×9×1,200. Here, 9 denotes the patch size and 200 denotes the band size of the input data.

The patch image may be input to two branches, which correspond to the spectral features and the spatial features, respectively. The processor 200 may extract spectral features using the spectral sequence block 310 and spatial features using the spatial sequence block 330.

The processor 200 may compress the patch image. For example, through a 1×1×7, 8 convolution operation with a stride of (1, 1, 2) and valid padding, the processor 200 may compress the patch image into an input feature with a size of 9×9×97, 8. At the same time, the processor 200 may generate a zero-filled hidden state.
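The band dimension of 97 follows from the usual output-size rule for a convolution with valid padding. A minimal sketch, in which the helper name `valid_conv_output_length` is an assumption for illustration:

```python
def valid_conv_output_length(input_length, kernel_size, stride):
    """Output length along one axis for a convolution with valid (no) padding."""
    return (input_length - kernel_size) // stride + 1


# Spectral branch: 200 bands, kernel depth 7, stride 2 along the band axis.
spectral_bands = valid_conv_output_length(200, 7, 2)
print(spectral_bands)  # 97

# A kernel spanning all 200 bands with stride 1 collapses the band axis to 1.
collapsed_bands = valid_conv_output_length(200, 200, 1)
print(collapsed_bands)  # 1
```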

In the spectral sequence block 310, these two features may be used as the inputs of the memory block and the generation block. For each of its inputs, the memory block applies a convolution layer with a 1×1×1, 8 kernel, so that both inputs are transformed to a size of 9×9×97, 8.

The transformed h and x may be passed through an element-wise sum and a tanh activation function. The output of the memory block may have a size of 9×9×97, 8. In the generation block, the inputs may be concatenated to a size of 9×9×97, 24. The concatenated input may then pass sequentially through batch normalization, a ReLU activation function, and a 1×1×1, 16 convolution.

The final output of the generation block may have the same size as the block input x.
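The data flow through one sequence block can be sketched in NumPy as follows. This is a simplified illustration under several assumptions, not the patented implementation: tensors are stored channels-last as (height, width, depth, channels), a 1×1×1 convolution is written as a matrix product over the channel axis, batch normalization has no learned scale or shift, the generation block concatenates x with the newly computed hidden state, and the weights are random placeholders. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1x1(x, weight):
    """A 1x1x1 convolution is a per-position linear map over the channel axis."""
    return x @ weight


def batch_norm(x, eps=1e-5):
    """Normalize each channel over all other axes (no learned parameters)."""
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def sequence_block(x, h, w_x, w_h, w_gen):
    # Memory block: convolve input and hidden state, sum element-wise, apply tanh.
    h_new = np.tanh(conv1x1x1(x, w_x) + conv1x1x1(h, w_h))
    # Generation block: concatenate (16 + 8 = 24 channels), then BN -> ReLU -> conv.
    concat = np.concatenate([x, h_new], axis=-1)
    x_new = conv1x1x1(np.maximum(batch_norm(concat), 0.0), w_gen)
    return x_new, h_new


x = rng.standard_normal((9, 9, 97, 16))  # compressed patch feature (placeholder values)
h = np.zeros((9, 9, 97, 8))              # zero-filled hidden state
num_blocks = 4
for _ in range(num_blocks):
    # Non-shared weights: every block gets its own set.
    w_x = rng.standard_normal((16, 8)) * 0.1
    w_h = rng.standard_normal((8, 8)) * 0.1
    w_gen = rng.standard_normal((24, 16)) * 0.1
    x, h = sequence_block(x, h, w_x, w_h, w_gen)

print(x.shape, h.shape)
```

Each block leaves the sizes of x and h unchanged, which is what allows the block to be repeated.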

The spectral sequence block 310 may be repeated L times, yielding two outputs after the L-th repetition. At the final stage of the spectral sequence block, these outputs may be concatenated to a size of 9×9×97, 24. The result may again pass sequentially through batch normalization, a ReLU activation function, and a 1×1×97, 24 convolution layer. The spectral feature extraction process and its parameters may be as shown in Table 1.

| Layer name | Input size | Output size | Kernel size | Layer operation |
|---|---|---|---|---|
| Conv1 | (9×9×1,200) | (9×9×97,16) | (1×1×1, 16) | Conv3d |
| - Memory block | (9×9×97,16) / (9×9×97,8) | (9×9×97,8) | (1×1×1, 8) | ConvRNN |
| - Generation block | (9×9×97,16) / (9×9×97,8) | (9×9×97,24) | - | Concat |
| | (9×9×97,24) | (9×9×97,16) | (1×1×1,16) | BN/ReLU/Conv3d |
| Conv2 | (9×9×97, 16) / (9×9×97, 8) | (9×9×97, 24) | - | Concat |
| | (9×9×97, 24) | (9×9×1, 24) | (1×1×97,24) | BN/ReLU/Conv3d |

The processor 200 may enhance the spectral features by inputting the 9×9×1, 24 output of the spectral sequence block to the channel global context attention block 350. The spectral features can be made more distinct through aggregation, transformation, and enhancement. The channel global context attention block 350 may be implemented as shown in Table 2.

| Layer name | Input size | Output size | Kernel size | Layer operation |
|---|---|---|---|---|
| Feature aggregation | (9×9×24) | (9×9×1) | (1×1, 1) | Conv2d |
| | (9×9×1) | (1×1×81) | - | Reshape/Softmax |
| | (9×9×24) | (81×24) | - | Reshape |
| | (1×1×81) / (81×24) | (1×1×24) | - | Matrix multiplication |
| Feature transformation | (1×1×24) | (1×1×24/r) | (1×1, 24/r) | Conv2d/Dropout |
| | (1×1×24/r) | (1×1×24/r) | - | LayerNorm/ReLU |
| | (1×1×24/r) | (1×1×24) | (1×1, 24) | Conv2d |
| Feature enhancement | (9×9×24) / (1×1×24) | (9×9×24) | - | Element-wise sum |
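The three steps of the channel global context attention block (aggregation, transformation, enhancement) can be traced with the following NumPy sketch. It follows the sizes in Table 2 under simplifying assumptions: 1×1 convolutions are written as matrix products over the channel axis, dropout is omitted (as at inference time), layer normalization has no learned parameters, and the weights are random placeholders. The function names and the choice r = 6 are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(z, eps=1e-5):
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)


def channel_gca(feat, w_attn, w_down, w_up):
    h, w, c = feat.shape
    # Feature aggregation: 1x1 conv to one channel, softmax over the 81 positions,
    # then a weighted sum of the position features.
    attn = softmax((feat @ w_attn).reshape(1, h * w), axis=-1)  # (1, 81)
    context = attn @ feat.reshape(h * w, c)                      # (1, 24)
    # Feature transformation: bottleneck 24 -> 24/r -> 24 with LayerNorm and ReLU.
    hidden = np.maximum(layer_norm(context @ w_down), 0.0)       # (1, 24/r)
    transformed = hidden @ w_up                                  # (1, 24)
    # Feature enhancement: broadcast element-wise sum over all positions.
    return feat + transformed.reshape(1, 1, c)


r = 6
feat = rng.standard_normal((9, 9, 24))
out = channel_gca(
    feat,
    w_attn=rng.standard_normal((24, 1)),
    w_down=rng.standard_normal((24, 24 // r)),
    w_up=rng.standard_normal((24 // r, 24)),
)
print(out.shape)
```

The same per-channel offset is added at every spatial position, so the block reweights channels without changing the size of the feature map.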

The processor 200 may generate spatial features in a manner similar to the spectral features. The patch image may be input to a convolution operation with a kernel size of 1×1×200, 16, a stride of (1, 1, 1), and valid padding. The processor 200 may simultaneously generate an output with a size of 9×9×1, 16 and a hidden state with a size of 9×9×1, 8. The two features may be input to the spatial sequence block 330 in the same way as for the spectral sequence block. All convolution kernels of the spatial sequence block 330 may have the same size as those of the spectral sequence block. The two features passed between blocks may have sizes of 9×9×1, 16 and 9×9×1, 8, respectively.

The spatial sequence block 330 may be repeated L times and may be implemented as shown in Table 3.

| Layer name | Input size | Output size | Kernel size | Layer operation |
|---|---|---|---|---|
| Conv1 | (9×9×1,200) | (9×9×1,16) | (1×1×1, 16) | Conv3d |
| - Memory block | (9×9×1,16) / (9×9×1,8) | (9×9×1,8) | (1×1×1, 8) | ConvRNN |
| - Generation block | (9×9×1,16) / (9×9×1,8) | (9×9×1,24) | - | Concat |
| | (9×9×1,24) | (9×9×1,16) | (1×1×1,16) | BN/ReLU/Conv3d |
| Concat | (9×9×1, 16) / (9×9×1, 8) | (9×9×1, 24) | - | Concat |

At the end of the spatial sequence block 330, the outputs may be concatenated to a size of 9×9×1, 24. This 9×9×1, 24 output may be input to the position global context attention block 370, which may enhance the spatial features and produce an output of the same size as its input. The position global context attention block 370 may be implemented as shown in Table 4.

| Layer name | Input size | Output size | Kernel size | Layer operation |
|---|---|---|---|---|
| Feature aggregation | (9×9×24) | (1×1×24) | - | GlobalAvgPooling |
| | (1×1×24) | (1×1×24) | - | Softmax |
| | (9×9×24) | (24×81) | - | Reshape/Transpose |
| | (1×1×24) / (24×81) | (1×1×81) | - | Matrix multiplication |
| Feature transformation | (1×1×81) | (1×1×81/r) | (1×1, 81/r) | Conv2d/Dropout |
| | (1×1×81/r) | (1×1×81/r) | - | LayerNorm/ReLU |
| | (1×1×81/r) | (1×1×81) | (1×1, 81) | Conv2d |
| Feature enhancement | (9×9×24) / (9×9×1) | (9×9×24) | - | Element-wise sum |
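The position global context attention block mirrors the channel variant but attends over channels and produces one value per spatial position. The NumPy sketch below follows the sizes in Table 4 under the same simplifying assumptions as before: 1×1 convolutions are matrix products, dropout is omitted, layer normalization has no learned parameters, and the weights are random placeholders. The function names and the choice r = 3 (so that 81/r is an integer) are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(z, eps=1e-5):
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)


def position_gca(feat, w_down, w_up):
    h, w, c = feat.shape
    # Feature aggregation: global average pooling gives one value per channel,
    # softmax turns these into channel weights, and the weighted sum over
    # channels yields one context value per position.
    attn = softmax(feat.mean(axis=(0, 1)).reshape(1, c), axis=-1)  # (1, 24)
    context = attn @ feat.reshape(h * w, c).T                       # (1, 81)
    # Feature transformation: bottleneck 81 -> 81/r -> 81 with LayerNorm and ReLU.
    hidden = np.maximum(layer_norm(context @ w_down), 0.0)          # (1, 81/r)
    transformed = hidden @ w_up                                     # (1, 81)
    # Feature enhancement: broadcast element-wise sum over all channels.
    return feat + transformed.reshape(h, w, 1)


r = 3
feat = rng.standard_normal((9, 9, 24))
out = position_gca(
    feat,
    w_down=rng.standard_normal((81, 81 // r)),
    w_up=rng.standard_normal((81 // r, 81)),
)
print(out.shape)
```

Here the offset added to the feature map varies by position and is identical across the 24 channels, which is the counterpart of the per-channel offset in the channel block.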

In the feature classification part, the output of each branch may be turned into a vector of size 1×24 by batch normalization, a ReLU activation function, and a global average pooling operation. The processor 200 may merge the two outputs into a one-dimensional vector of size 1×48 and pass it through a fully connected layer and a softmax function to predict the class of the center pixel of the patch image. The classification operation may be implemented as shown in Table 5.

| Layer name | Input size | Output size | Layer operation |
|---|---|---|---|
| Spectral feature transformation | (9×9×24) | (1×24) | BN/ReLU/GlobalAvgPooling |
| Spatial feature transformation | (9×9×24) | (1×24) | BN/ReLU/GlobalAvgPooling |
| Concatenation | (1×24) / (1×24) | (1×48) | Concat |
| Classification | (1×48) | (1×N_class) | FC/Softmax |

Here, N_class denotes the number of classes to be predicted.
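The classification stage of Table 5 can be sketched in NumPy as follows. This is an illustration under stated assumptions: batch normalization is approximated by normalizing each channel over the spatial positions of a single sample (a real implementation would use batch statistics and learned parameters), the fully connected weights are random placeholders, and `n_classes = 16` is a hypothetical value. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def branch_to_vector(feat, eps=1e-5):
    """BN -> ReLU -> global average pooling: (9, 9, 24) -> (24,)."""
    mean = feat.mean(axis=(0, 1), keepdims=True)
    var = feat.var(axis=(0, 1), keepdims=True)
    normed = (feat - mean) / np.sqrt(var + eps)
    return np.maximum(normed, 0.0).mean(axis=(0, 1))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


n_classes = 16  # hypothetical number of classes
spectral_feat = rng.standard_normal((9, 9, 24))  # enhanced spectral feature (placeholder)
spatial_feat = rng.standard_normal((9, 9, 24))   # enhanced spatial feature (placeholder)

# Concatenate the two 24-element vectors into one 48-element vector.
merged = np.concatenate([branch_to_vector(spectral_feat), branch_to_vector(spatial_feat)])

# Fully connected layer followed by softmax gives the class probabilities.
w_fc = rng.standard_normal((48, n_classes)) * 0.1
b_fc = np.zeros(n_classes)
probs = softmax(merged @ w_fc + b_fc)
predicted_class = int(np.argmax(probs))
print(merged.shape, probs.shape, predicted_class)
```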

FIG. 8 shows a flowchart of the operation of the image classification device shown in FIG. 1.

Referring to FIG. 8, a receiver (e.g., the receiver 100 of FIG. 1) may receive an image (810).

The processor 200 may extract spectral features and spatial features by inputting the image to a plurality of sequence blocks (830). The processor 200 may extract the spectral features by inputting the image to the spectral sequence block among the plurality of sequence blocks, and may extract the spatial features by inputting the image to the spatial sequence block.

The spectral sequence block and the spatial sequence block may each include a memory block and a generation block. The memory block may include an artificial neural network that processes time-series data, such as an RNN or an LSTM.

The memory block may include a convolution operation on the hidden state, a convolution operation on the input, and an element-wise sum operation. The generation block of the spectral sequence block and of the spatial sequence block may include a concatenation of the hidden state and the input and a batch normalization operation.

The processor 200 may generate an enhanced spectral feature and an enhanced spatial feature by performing attention on the spectral features and the spatial features (850).

The processor 200 may classify the image based on the enhanced spectral feature and the enhanced spatial feature (870). The processor 200 may generate a one-dimensional vector by concatenating the enhanced spectral feature and the enhanced spatial feature, and may classify the image by inputting the one-dimensional vector to a fully connected layer.

FIG. 9 is a diagram comparing image classification performed by an image classification device according to an embodiment with that of other neural networks.

The test images used in the experiments may be i) Indian Pines (IP), ii) University of Pavia (UP), and iii) Salinas Valley (SV), which are widely used in experiments in the related field. The experiments may use 2% of the IP data set and 0.5% of the UP and SV data sets; the details of the individual samples used for training, validation, and testing are shown in FIG. 9. More specifically, table 911 shows the details of the IP data samples, table 912 those of the UP data samples, and table 913 those of the SV data samples.

The methods against which the operation and performance of the image classification device according to one embodiment are compared may include i) a support vector machine (SVM) with a radial basis function (RBF) kernel, ii) a deep and dense convolutional neural network (DDCNN), iii) a spectral-spatial residual network (SSRN), and iv) spectral and spatial global context attention (SSGCA). In the experiments, the patch size, batch size, and number of epochs may be set to 9, 64, and 300, respectively. The cross-entropy function used for training in the performance evaluation is given in Equation 7.

CE is the cross-entropy function and m is the number of ground truth categories; the remaining two terms of Equation 7 denote the class number in the ground truth and the softmax probability output by the network. Training proceeds in the direction that maximizes the probability assigned to the correct answer. To avoid local minima and overfitting, dropout, cosine annealing, and early stopping may be applied. During training, randomly selected elements of the network may be set to 0 with probability p (set to 0.5). The cosine annealing method adjusts the learning rate based on Equation 8.

The terms of Equation 8 denote the maximum number of epochs, the minimum learning rate, and the number of epochs performed so far. Dynamically adjusting the learning rate alleviates the local minimum problem and can improve the accuracy of the network. In addition, the validation loss is measured at every epoch, and training may be stopped early if the validation loss does not decrease for 40 consecutive epochs.
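A minimal sketch of these two training controls is given below. The schedule uses the widely used cosine annealing form, which is assumed here and may differ in detail from Equation 8; the early-stopping rule compares each validation loss with the best loss seen so far, which is one possible reading of the 40-epoch criterion. The names `cosine_annealing_lr` and `EarlyStopping` and the initial learning rate of 0.001 are hypothetical.

```python
import math


def cosine_annealing_lr(epoch, max_epochs, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along a half cosine."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * epoch / max_epochs))


class EarlyStopping:
    """Signal a stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=40):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Learning rate at the start, middle, and end of a 300-epoch run.
schedule = [cosine_annealing_lr(e, 300, lr_max=0.001) for e in (0, 150, 300)]
print(schedule)
```

In a training loop, `should_stop` would be called once per epoch with the measured validation loss, and the loop would exit as soon as it returns `True`.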

The per-class accuracy, overall accuracy (OA), average accuracy (AA), and kappa coefficient may be used as metrics. OA is the proportion of true positives (TP) among the test samples, AA is the mean of the per-class accuracies, and the kappa coefficient indicates the degree of agreement between the ground truth and the predicted classes.
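These three metrics can be computed from a confusion matrix, as in the following sketch. It assumes rows are ground truth classes and columns are predicted classes, and uses the standard Cohen's kappa definition; the function name and the small example matrix are illustrative only.

```python
import numpy as np


def classification_metrics(confusion):
    """Return (OA, AA, kappa) for a confusion matrix with ground truth in rows."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    # Overall accuracy: correctly classified samples over all samples.
    oa = np.trace(confusion) / total
    # Average accuracy: mean of the per-class accuracies.
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    aa = per_class.mean()
    # Kappa: agreement corrected for the agreement expected by chance.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa


# Two classes with two test samples each; one sample of class 1 is misclassified.
oa, aa, kappa = classification_metrics([[2, 0], [1, 1]])
print(oa, aa, kappa)  # 0.75 0.75 0.5
```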

FIGS. 10A to 10C show the mean and variance of the classification results for the IP, UP, and SV data sets.

An image classification device according to one embodiment may use an RNN or a GRU in the memory block. As the experimental results show, RNN-Seqnet, whose memory block consists of an RNN, performed best. More specifically, with 2% of the IP data set used as training and validation samples, RNN-Seqnet achieved an OA of 95.03%, an improvement of 2.28% over SSGCA, 8.14% over SSRN, 29.98% over SVM, and 43.35% over DDCNN.

RNN-Seqnet (memory block consisting of an RNN) and GRU-Seqnet (memory block consisting of a GRU), as used in the image classification device, can reflect feature-level information in the classification process, and their accuracy is improved compared with SSGCA. Seqnet shows the highest mean and the lowest variance in OA, AA, and the kappa coefficient. Moreover, RNN-Seqnet and GRU-Seqnet show the smallest gap between the lowest value (78.43%) and the highest value (99.96%) among the networks. The experiments also confirm that Seqnet is the most general and robust network across all of the data sets.

As can be seen in the lower parts of FIGS. 10A to 10C, GRU-Seqnet and RNN-Seqnet, as used in the image classification device, produce the classification results that are closest to the ground truth (i.e., the most accurate).

FIG. 11 illustrates the performance of an image classification device according to an embodiment as its detailed parameters are varied.

Part (a) of FIG. 11 shows the OA value as the parameter r is varied. The parameter r is the compression ratio in the global context attention mechanism and may be referred to as the bottleneck ratio. It appears in the attention blocks of FIGS. 6 and 7 and determines cost efficiency and accuracy. Part (a) shows the OA value for varying r with the number of sequence blocks L in the network fixed at 4. The image classification device according to one embodiment may select as the optimal r the value that maximizes the mean OA over IP, UP, and SV. Based on the presented results, r may be set to 12 when the image classification device uses RNN-Seqnet and to 6 when it uses GRU-Seqnet.

The image classification device may then determine the optimal L value based on the optimal r value determined in (a). Part (b) shows the OA value as L is varied, using the r value determined in (a). The image classification device may select as the optimal L the value with the largest mean OA. L may be set to 4 both when RNN-Seqnet is used and when GRU-Seqnet is used.
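The selection rule described above (pick the setting with the largest mean OA over the data sets, first for r and then for L) can be sketched as follows. The helper name `best_setting` is hypothetical, and the OA values are placeholders chosen only to make the example runnable; they are not results from the experiments.

```python
def best_setting(oa_by_setting):
    """Return the setting whose mean OA over the data sets is largest."""
    def mean_oa(setting):
        values = oa_by_setting[setting].values()
        return sum(values) / len(values)

    return max(oa_by_setting, key=mean_oa)


# Placeholder OA values per candidate r (illustrative only, not measured results).
oa_by_r = {
    3: {"IP": 0.90, "UP": 0.91, "SV": 0.92},
    6: {"IP": 0.93, "UP": 0.94, "SV": 0.95},
    12: {"IP": 0.92, "UP": 0.93, "SV": 0.94},
}
best_r = best_setting(oa_by_r)
print(best_r)
```

The same helper could then be applied to a mapping keyed by candidate L values, measured with the selected r held fixed.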

Because of the vanishing gradient problem, GRU-Seqnet may outperform RNN-Seqnet as the network becomes deeper. No large difference appears in FIG. 11 because L is set to between 3 and 6, so the network is not yet deep enough.

Because RNN-Seqnet extracts more informative features than GRU-Seqnet, RNN-Seqnet may outperform GRU-Seqnet in a simple structure.

FIG. 12 is a diagram illustrating the performance of the image classification device with and without individual components.

FIG. 12(a) shows the performance (OA value) of the image classification device with and without the attention blocks (the channel global context attention block and the position global context attention block). Referring to (a), for the SV data set the presence or absence of the attention blocks makes a relatively large difference in performance, whereas for the IP and UP data sets the difference is relatively small.

Panel (b) shows the OA values obtained when the memory block uses shared weights and when it uses non-shared weights. In NLP, RNN-based models generally use shared weights because the length of the input data varies widely. An RNN with shared weights learns the relationships between data elements, so no additional training is needed for different positions: the model learns the front-to-back relationship regardless of where in the sequence the data occurs. Conversely, non-shared weights can focus more strongly on data position than shared weights can.

Referring to (b), using non-shared weights in the memory block yields higher performance.
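
The difference between shared and non-shared weights can be sketched with a simplified memory block of the form tanh(W_x x + W_h h). This is a minimal illustration, not the patented implementation: plain matrix products stand in for the two convolutions, and the channel count of 24, the four steps, and all function names are assumptions.

```python
import numpy as np


def memory_block(x, h, w_x, w_h):
    """Element-wise sum of the transformed input and hidden state, followed by tanh."""
    return np.tanh(w_x @ x + w_h @ h)


def run_blocks(xs, weights):
    """Apply one memory block per step; weights[i] is the (w_x, w_h) pair for step i."""
    h = np.zeros(weights[0][1].shape[0])
    for x, (w_x, w_h) in zip(xs, weights):
        h = memory_block(x, h, w_x, w_h)
    return h


def count_params(weights):
    """Count weights, counting each distinct array once."""
    unique = {id(w): w.size for p in weights for w in p}
    return sum(unique.values())


rng = np.random.default_rng(0)
channels, steps = 24, 4  # assumed sizes
xs = [rng.standard_normal(channels) for _ in range(steps)]


def new_pair():
    return (0.1 * rng.standard_normal((channels, channels)),
            0.1 * rng.standard_normal((channels, channels)))


one_pair = new_pair()
shared = [one_pair] * steps                      # the same weights at every step
non_shared = [new_pair() for _ in range(steps)]  # separate weights for each step

print(count_params(shared), count_params(non_shared))
print(run_blocks(xs, shared).shape, run_blocks(xs, non_shared).shape)
```

With non-shared weights the parameter count grows with the number of steps, but each step can specialize to its own position, which is the property the passage above attributes to non-shared weights.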

FIG. 13 is a diagram showing how model accuracy changes with the amount of training data.

When the amount of training data decreases, classification accuracy may decrease, but the degree of the decrease may differ from model to model. FIG. 13(a) shows the change in accuracy with the training data ratio (the proportion of training data in the total data) for the IP data set, (b) shows the same for the UP data set, and (c) shows the same for the SV data set. Overall, accuracy tends to improve as the training data ratio increases, and GRU-Seqnet and RNN-Seqnet, which are used in the image classification device, show the smallest loss of accuracy as the ratio decreases. These results indicate that GRU-Seqnet and RNN-Seqnet outperform the other models in terms of robustness and generalization.

FIGS. 14A and 14B are diagrams showing the number of trainable parameters, the training time, and the test time for each type of training data.

Referring to FIG. 14, RNN-Seqnet is the lightest model. Compared with SSGCA, which shows a similar OA, RNN-Seqnet has 18% fewer trainable parameters and achieves 14 to 33% faster inference on the IP and SV data sets.

The SSGCA-based DenseAT model, designed for comparing inference time against Densenet (with two parameters set to 16 and a kernel size of 1), has 635,539 weights for the IP data set, which is more than RNN-Seqnet. The biggest difference between DenseAT and RNN-Seqnet lies in the 3D convolution that follows the block. Because of the stacked layers, the input feature map of DenseAT has 80 channels, which requires 620,880 parameters. In contrast, RNN-Seqnet can keep 24 channels. In theory RNN-Seqnet should be 10% faster at inference, but because the element-wise sum and the tanh activation function in the memory block take more time than the 3D convolution, the actual experiment showed a somewhat longer computation time, as shown in the figure.
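
The effect of the channel count on the 3D convolution can be checked with the standard parameter-count formula, out_channels × (in_channels × kernel volume + 1 bias). In the sketch below, the kernel extent of 97 × 1 × 1 and the choice of equal input and output channels are assumptions: they are not stated in the description and were chosen only because they reproduce the 620,880 figure for 80 channels. The 24-channel count is therefore a hypothetical comparison under the same assumptions, not a reported result.

```python
def conv3d_param_count(c_in, c_out, kernel, bias=True):
    """Trainable parameters of a 3D convolution layer."""
    kd, kh, kw = kernel
    return c_out * (c_in * kd * kh * kw + (1 if bias else 0))


# Assumed kernel extent (97, 1, 1) and c_out == c_in; see the hedges above.
dense_at = conv3d_param_count(80, 80, (97, 1, 1))
seqnet_like = conv3d_param_count(24, 24, (97, 1, 1))

print(dense_at)               # matches the 620,880 stated for DenseAT
print(seqnet_like)            # hypothetical 24-channel counterpart
print(dense_at / seqnet_like)
```

Because the count scales roughly with the square of the channel count, keeping 24 channels instead of 80 reduces this layer by about an order of magnitude under the stated assumptions.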

FIG. 14B shows the inference time of DenseAT and RNN-Seqnet as a function of the number of layers L. As the network becomes deeper, RNN-Seqnet takes less time than DenseAT. When L is 5, RNN-Seqnet requires less computation time than DenseAT on the IP and SV data sets. The UP data set contains fewer bands, so the computation time of the element-wise sum and the tanh activation function dominates that of the 3D convolution; for this data set, L must be greater than 7 for the inference-time difference relative to DenseAT to be overcome. The experimental examples confirm that, compared with SSGCA, which uses the dense block of Densenet, RNN-Seqnet is a relatively lightweight and cost-efficient model with higher accuracy and speed.

The device described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will recognize that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

An image classification device for classifying an image using a neural network, comprising:
a receiver that receives an image; and
a processor that extracts spectral features and spatial features by inputting the image into a plurality of sequence blocks,
generates enhanced spectral features and enhanced spatial features by performing attention on the spectral features and the spatial features, and
classifies the image based on the enhanced spectral features and the enhanced spatial features,
wherein the processor
extracts the spectral features by inputting the image into a spectral sequence block among the plurality of sequence blocks, and
extracts the spatial features by inputting the image into a spatial sequence block among the plurality of sequence blocks,
wherein the spectral sequence block and the spatial sequence block each include:
a memory block including a convolution operation on a hidden state, a convolution operation on an input, and an element-wise sum operation; and
a generation block including a concatenation operation and a batch normalization operation on the hidden state and the input,
wherein the memory block
generates a size-transformed input using two convolutional layers, and
generates an output by applying the element-wise sum and a tanh activation function to the size-transformed input, and
wherein the generation block
generates a concatenated input by applying the concatenation operation to the input, and
generates an output by sequentially applying batch normalization, a ReLU activation function, and a convolution operation to the concatenated input.

(Deleted)

The image classification device of claim 1,
wherein the processor
generates the enhanced spectral features by performing a channel global context attention operation on the spectral features, and
generates the enhanced spatial features by performing a position global context attention operation on the spatial features.

The image classification device of claim 4,
wherein the channel global context attention operation
includes a convolution operation, a reshape operation, and a layer normalization operation, and
wherein the position global context attention operation
includes a convolution operation, a pooling operation, a reshape operation, and a layer normalization operation.

The image classification device of claim 1,
wherein the processor
generates a one-dimensional vector by concatenating the enhanced spectral features and the enhanced spatial features, and
classifies the image by inputting the one-dimensional vector into a fully connected layer.

An image classification device for classifying an image using a neural network, comprising:
a receiver that receives an image; and
a processor that generates feature information about the image by inputting the image into a plurality of sequence blocks, and
classifies the image based on the feature information,
wherein the sequence block includes
a generation block that generates a concatenated input by applying a concatenation operation to an input, and
generates an output by sequentially applying batch normalization, a ReLU activation function, and a convolution operation to the concatenated input,
wherein the generation block
generates the output based on Equation 1 and Equation 2,
wherein Equation 1 is

[Equation 1]

and Equation 2 is

[Equation 2]

where x is the input, h is the hidden state, l is the layer index, ReLU is the activation function, and the remaining symbols denote batch normalization and the weight.

An image classification device for classifying an image using a neural network, comprising:
a receiver that receives an image; and
a processor that generates feature information about the image by inputting the image into a plurality of sequence blocks, and
classifies the image based on the feature information,
wherein the sequence block includes
a memory block that generates a size-transformed input using two convolutional layers, and
generates an output by applying an element-wise sum and a tanh activation function to the size-transformed input,
wherein the memory block
generates the output based on Equation 1,
wherein Equation 1 is

[Equation 1]

where x is the input, h is the hidden state, l is the layer index, and the remaining symbol denotes the weight of the memory block.

An image classification method for classifying an image using a neural network, performed by an image classification device, the method comprising:
receiving an image;
extracting spectral features and spatial features by inputting the image into a plurality of sequence blocks;
generating enhanced spectral features and enhanced spatial features by performing attention on the spectral features and the spatial features; and
classifying the image based on the enhanced spectral features and the enhanced spatial features,
wherein
the extracting of the features includes:
extracting the spectral features by inputting the image into a spectral sequence block among the plurality of sequence blocks; and
extracting the spatial features by inputting the image into a spatial sequence block among the plurality of sequence blocks,
wherein the spectral sequence block and the spatial sequence block each include:
a memory block including a convolution operation on a hidden state, a convolution operation on an input, and an element-wise sum operation; and
a generation block including a concatenation operation and a batch normalization operation on the hidden state and the input,
wherein the memory block
generates a size-transformed input using two convolutional layers, and
generates an output by applying the element-wise sum and a tanh activation function to the size-transformed input, and
wherein the generation block
generates a concatenated input by applying the concatenation operation to the input, and
generates an output by sequentially applying batch normalization, a ReLU activation function, and a convolution operation to the concatenated input.

An image classification method for classifying an image using a neural network, performed by an image classification device, the method comprising:
receiving an image;
generating feature information about the image by inputting the image into a plurality of sequence blocks; and
classifying the image based on the feature information,
wherein
the sequence block
includes a memory block or a generation block,
wherein the generating of the feature information includes:
extracting spectral features by inputting the image into a spectral sequence block among the plurality of sequence blocks; and
extracting spatial features by inputting the image into a spatial sequence block among the plurality of sequence blocks,
wherein the spectral sequence block and the spatial sequence block each include:
a memory block including a convolution operation on a hidden state, a convolution operation on an input, and an element-wise sum operation; and
a generation block including a concatenation operation and a batch normalization operation on the hidden state and the input,
wherein the memory block
generates a size-transformed input using two convolutional layers, and
generates an output by applying the element-wise sum and a tanh activation function to the size-transformed input,
wherein the generation block
generates a concatenated input by applying the concatenation operation to the input, and
generates an output by sequentially applying batch normalization, a ReLU activation function, and a convolution operation to the concatenated input,
wherein the sequence block
includes at least one of a generation block that generates an output based on Equation 1 and Equation 2, or a memory block that generates an output based on Equation 3,
wherein Equation 1 is

[Equation 1]

Equation 2 is

[Equation 2]

and Equation 3 is

[Equation 3]

where x is the input, h is the hidden state, l is the layer index, ReLU is the activation function, and the remaining symbols denote batch normalization, the weight of the generation block, and the weight of the memory block.