KR20230103163A

KR20230103163A - Modular Neural Network NASNet-based feature extraction method and system for image classification

Info

Publication number: KR20230103163A
Application number: KR1020210193821A
Authority: KR
Inventors: 조인휘; 무왐바카송고다후다
Original assignee: 한양대학교 산학협력단
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2023-07-07

Abstract

A Modular Neural Network NASNet-based feature extraction method and system for image classification are disclosed. A feature extraction method according to an embodiment includes: receiving an input image at a modular neural network that includes a plurality of modules configured according to the number of inputs and the number of outputs of each module; and extracting feature information for image classification of the input image through the modular neural network. The modular neural network may be one trained so that a plurality of neural network models, each composed of a plurality of modules, are connected in series to extract the feature information for image classification.

Description

Modular Neural Network NASNet-based Feature Extraction Method and System for Image Classification

The description below relates to a feature extraction technique for image classification.

A modular neural network consists of several neural network models connected together through a plurality of modules. Using such a modular neural network makes a more basic neural network system simpler to manage and process. In this case, several neural networks operate as modules, each solving part of the problem.

A modular neural network uses various neural networks to solve a problem, with each network acting as a module that solves part of that problem. In the network topology, the different nodes and the modules within the network are connected to one another. The sequential topology of the overall model architecture consists of several hybrids connected in series.

A Modular Neural Network NASNet-based feature extraction method and system for image classification can be provided.

A feature extraction method for image classification includes: receiving an input image at a modular neural network that includes a plurality of modules configured according to the number of inputs and the number of outputs of each module; and extracting feature information for image classification of the input image through the modular neural network. The modular neural network may be one trained so that a plurality of neural network models, each composed of a plurality of modules, are connected in series to extract the feature information for image classification.

The modular neural network may be built to use a pre-trained neural network as a feature extractor: the input image is forward propagated, propagation is stopped at a pre-specified layer, and the output of that layer is used as a feature.

The modular neural network may include a first module, composed of an encoder and a decoder, that outputs a denoised image by performing image preprocessing on an image containing noise; a second module that extracts feature information through transfer learning from a pre-trained neural network; and a third module that classifies the image using the extracted feature information as input data.

The second module may include a neural network architecture including a convolution block that includes a plurality of convolution layers, a max pooling layer, and a dropout layer.

The second module may learn feature information as the denoised image passes through a dense layer and a neural network including at least one convolution block.

The second module may learn new training images through transfer learning while feature information is extracted through a neural network including a plurality of convolution blocks from which the dense layer has been removed.

The second module may remove the pooling layer from each of the plurality of convolution blocks from which the dense layer has been removed, replace it with a max pooling layer, and extract feature information through the replacement max pooling layer.

The third module may train a machine learning model, using the feature information output by the second module as input data, to obtain a classifier capable of recognizing new images.

The third module may train the machine learning model with any one classification algorithm from among Stochastic Gradient Descent (SGD), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Trees (DT).

A computer program stored in a non-transitory computer-readable recording medium may be provided to cause the feature extraction system to execute the feature extraction method for image classification.

A feature extraction system includes: an image input unit that receives an input image at a modular neural network including a plurality of modules configured according to the number of inputs and the number of outputs of each module; and a feature information extraction unit that extracts feature information for image classification of the input image through the modular neural network. The modular neural network may be one trained so that a plurality of neural network models, each composed of a plurality of modules, are connected in series to extract the feature information for image classification.

A modular neural network can solve complex artificial intelligence problems, improve learning speed, and reduce the amount of feature extraction required.

Using a modular neural network can increase the accuracy of image classification and reduce the consumption of computing power.

FIG. 1 is a diagram for explaining the structure of a modular neural network in a feature extraction system according to an embodiment.
FIG. 2 is a diagram for explaining a first module configured in the modular neural network according to an embodiment.
FIGS. 3 to 8 are diagrams for explaining a second module configured in the modular neural network according to an embodiment.
FIG. 9 is a diagram for explaining a third module configured in the modular neural network according to an embodiment.
FIG. 10 is a block diagram for explaining the configuration of a feature extraction system according to an embodiment.
FIG. 11 is a flowchart for explaining a feature extraction method for image classification in the feature extraction system according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the structure of a modular neural network in a feature extraction system according to an embodiment.

The feature extraction system may receive an input image at a modular neural network including a plurality of modules configured according to the number of inputs and the number of outputs of each module, and may extract feature information for image classification of the input image through the modular neural network.

Such a modular neural network may be one trained so that a plurality of neural network models, each composed of a plurality of modules, are connected in series to extract feature information for image classification.

A module is a multilayer feedforward neural network defined by a 3-tuple.

M = (a, b, H)

Here, a is the number of inputs of the module, b is the number of output nodes, and H contains the number of neurons in each hidden layer.

For example, a multilayer perceptron module M with 8 inputs, 12 neurons in the first hidden layer, 10 neurons in the second hidden layer, and 4 outputs can be described as M(8, 4, [12, 10]). The internal structure of each neural network, such as the number of hidden layers and the number of neurons in each hidden layer, can be chosen independently of the overall architecture.
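As an illustration of the 3-tuple notation, the following minimal sketch represents a module as a small Python data class. The class name `Module` and the helper methods `layer_sizes` and `weight_shapes` are hypothetical names chosen for this example and do not appear in the source; the sketch only shows how (a, b, H) determines the layer widths of a fully connected network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    """A module M = (a, b, H); the class itself is an illustrative assumption."""
    a: int          # number of inputs of the module
    b: int          # number of output nodes
    H: List[int]    # number of neurons in each hidden layer

    def layer_sizes(self) -> List[int]:
        # Full sequence of layer widths: input, hidden layers, output.
        return [self.a, *self.H, self.b]

    def weight_shapes(self) -> List[Tuple[int, int]]:
        # One (fan_in, fan_out) weight matrix between consecutive layers.
        sizes = self.layer_sizes()
        return list(zip(sizes[:-1], sizes[1:]))


m = Module(8, 4, [12, 10])
print(m.layer_sizes())    # [8, 12, 10, 4]
print(m.weight_shapes())  # [(8, 12), (12, 10), (10, 4)]
```

Because H is just a list, each module can pick its own depth and width without affecting the modules around it, which matches the statement that the internal structure is chosen independently of the overall architecture.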

The embodiment is described using, as an example, a modular neural network including three modules. In a modular neural network, a plurality of independent neural networks may be trained simultaneously, each on a specific sub-task, and the results obtained through training may be combined at the end to perform a single task.

Modular Neural Network (MNN): Module 1 + Module 2 + Module 3

Module 1 (101) is a denoising autoencoder (DAE) that takes an image containing noise as input data and generates, as output data, an image from which the noise has been removed.

Module 2 (102) performs transfer learning, using a pre-trained neural network for feature extraction. A pre-trained neural network (NASNet, NASNet-Large) may be used as the feature extractor when extracting the feature information of each image.

Module 3 (103) is a classifier (a new model) that uses the extracted feature information as input data for image classification.

A modular neural network can yield the following technical effects: simple combination of techniques (two or more networks can be used as building blocks), efficiency, scalability, reduced model complexity, robustness, and incrementality (the combined network grows gradually).

Feature extraction is a technique that reduces the number of features in a dataset by generating new feature information from existing feature information. The embodiment describes a modular neural network-based feature extraction approach that uses a pre-trained neural network (for example, NASNet) as a feature extractor: the input image is forward propagated, propagation stops at a pre-specified layer, and the output of that layer is used as feature information. A classifier can then be trained using the feature information extracted by this approach.
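The idea of stopping forward propagation at a pre-specified layer can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the "network" is a list of hypothetical layer functions (`scale`, `relu`, `head`) with fixed behavior standing in for a pre-trained model, and `extract_features` is a name invented for this example, not an API of any library.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def extract_features(layers: Sequence[Layer], x: List[float], stop_at: int) -> List[float]:
    """Forward propagate x and return the output of the layer at index stop_at."""
    if not 0 <= stop_at < len(layers):
        raise IndexError("stop_at must name an existing layer")
    for layer in layers[: stop_at + 1]:
        x = layer(x)
    return x


# Toy stand-ins for the layers of a pre-trained network (hypothetical).
def scale(x: List[float]) -> List[float]:
    return [2.0 * v for v in x]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def head(x: List[float]) -> List[float]:
    # Plays the role of the final classification layer that is skipped.
    return [sum(x)]


network = [scale, relu, head]
features = extract_features(network, [1.0, -3.0, 0.5], stop_at=1)
print(features)  # [2.0, 0.0, 1.0]
```

The layers after `stop_at` are never evaluated, so the returned values are the intermediate activations that a downstream classifier would consume.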

For accurate image classification, the proposed deep learning-based feature extraction approach may be performed in three steps: (1) image preprocessing, (2) extraction of feature information from the denoised image, and (3) image classification.
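The three steps can be read as a serial pipeline in which the output of each stage is the input of the next. The sketch below is only a structural illustration: the stage functions `preprocess`, `extract`, and `classify` are hypothetical placeholders (clipping, simple statistics, and a threshold) and are not the denoising autoencoder, NASNet extractor, or classifier of the embodiment.

```python
from typing import Callable, List

Stage = Callable[[object], object]


def run_pipeline(stages: List[Stage], image: object) -> object:
    """Feed the output of each stage into the next, in order."""
    data = image
    for stage in stages:
        data = stage(data)
    return data


def preprocess(image: List[float]) -> List[float]:
    # Step 1 placeholder: clip pixel values to [0, 1] in place of real denoising.
    return [min(1.0, max(0.0, p)) for p in image]


def extract(image: List[float]) -> List[float]:
    # Step 2 placeholder: two hand-made features (mean and maximum intensity).
    return [sum(image) / len(image), max(image)]


def classify(features: List[float]) -> str:
    # Step 3 placeholder: threshold on the mean intensity.
    return "bright" if features[0] > 0.5 else "dark"


label = run_pipeline([preprocess, extract, classify], [0.9, 1.3, 0.8, -0.2])
print(label)  # bright
```

Keeping each step behind the same call signature is what lets one stage be swapped (for example, a different classifier in step 3) without touching the others.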

FIG. 2 is a diagram for explaining Module 1. Module 1 analyzes an image containing noise and generates a clean image. Adding noise (for example, Gaussian noise) to the input makes the model learn the important features of the data. In a denoising autoencoder, noise is stochastically added to the input vector so that the data is partially corrupted. The model may be trained to predict the uncorrupted original, that is, the uncorrupted data point, and to output it as output data. For example, an input may be sampled from the data set, a corrupted version of that input may be sampled from a stochastic corruption mapping, and the resulting sample may be used as a training example. A denoising autoencoder is a feedforward network that can be trained by gradient-based approximate minimization of the negative log-likelihood.
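The corruption step of a denoising autoencoder can be sketched as follows. The sketch assumes additive Gaussian noise as the stochastic corruption and uses a squared reconstruction error, which corresponds to a negative log-likelihood under a Gaussian output model; both choices, and the function names `corrupt`, `make_training_pair`, and `squared_error`, are assumptions made for illustration. No encoder or decoder is trained here.

```python
import random
from typing import List, Tuple


def corrupt(x: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Return a noisy copy of x with additive Gaussian noise of std sigma."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def make_training_pair(
    x: List[float], sigma: float, rng: random.Random
) -> Tuple[List[float], List[float]]:
    # The network sees the corrupted input and is asked to reproduce the clean one.
    return corrupt(x, sigma, rng), list(x)


def squared_error(prediction: List[float], target: List[float]) -> float:
    # Reconstruction loss between the model output and the clean target.
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


rng = random.Random(0)          # fixed seed so the example is repeatable
clean = [0.2, 0.5, 0.9]
noisy, target = make_training_pair(clean, 0.1, rng)

# A model that simply copied its noisy input would incur this loss.
print(squared_error(noisy, target) > 0.0)  # True
```

The training target is always the clean sample, so the only way for the model to reduce the loss is to undo the corruption rather than copy its input.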

FIGS. 3 to 8 are diagrams for explaining Module 2. Feature extraction helps extract the optimal features. By selecting variables and combining them into features, the amount of data can be reduced.

FIG. 3 relates to the design of a neural network architecture for image classification. In Module 2, a convolution block (201) including a plurality of convolution layers, a max pooling layer, and a dropout layer may be configured. A denoised image may be fed to the input layer as input data. Feature information may be learned as the denoised image passes through the convolution block and the dense layer. The output layer may then output, as output data, a class (for example, bicycle) for the denoised image.

FIG. 4 relates to a general design of a neural network for image classification, in which the three kinds of layers described with reference to FIG. 3 may be grouped into one block (a convolution block). The convolution block (201) may be formed of a plurality of convolution layers, a max pooling layer, and a dropout layer.

FIG. 5 relates to a neural network architecture composed of a plurality of convolution blocks. An input image may be received; here, the input image may be a denoised image. A neural network composed of a plurality of convolution blocks is a convolutional network that assigns importance (learnable weights and biases) to various aspects or objects of the input image so that one can be distinguished from another.

FIG. 6 relates to a neural network architecture in which the dense layer has been removed from the neural network composed of a plurality of convolution blocks. Transfer learning may be performed while feature information is extracted with NASNet. New training images may be learned through transfer learning while feature information is extracted through the neural network including the plurality of convolution blocks from which the dense layer has been removed. The extracted feature information may be stored in a file.

NASNet stands for Neural Architecture Search (NAS) Network and is a machine learning model. NASNet was introduced by framing the problem of finding the best CNN architecture as a reinforcement learning problem. The evolution of CNNs has gone through three stages: Deeper is Better, Architecture Engineering, and AutoML. Each stage has a representative network architecture.

Transfer learning is the process of reusing a model built for one problem on another problem, depending on several factors. Using transfer learning saves training time and resources and also addresses the problem of limited data availability. When NASNet is used as a feature extractor, a pre-trained NASNet trained on ImageNet data, with its fully connected layers removed, is used.

FIG. 7 is a diagram illustrating the modular neural network. It is a modular neural network with a sequential topology composed of a plurality of modules connected in series. A plurality of layers may be connected in series in the modular neural network. The pre-trained feature extractor (Module 2) (102) and the new model (Module 3) (103) may be connected in series.

FIG. 8 illustrates the use of the NASNet-Large model as a feature extractor (801). NASNet-based features can be extracted and the dimensionality reduced.

NASNet-Large is a convolutional neural network trained on more than a million images from the ImageNet database. The network can classify images into 1000 object categories, such as keyboard, mouse, and pencil. As a result, the network has learned rich feature representations for a wide range of images. The image input size of the network is 331 x 331.

Using NASNet-Large has the advantage of improving the accuracy of the model and reducing computing power usage. In addition, dimensionality reduction can be performed by the pre-trained model.

In NASNet, the pooling layer is removed and replaced with a max pooling layer, and the output returned by the max pooling layer is used as the extracted features. A pooling layer serves to reduce the spatial size of the convolved features. This reduces, through dimensionality reduction, the computing power required to process the data. It is also useful for extracting dominant features that are rotation-invariant and position-invariant, which helps keep the training of the model effective. There are two types of pooling: max pooling and average pooling. Max pooling returns the maximum value of the portion of the image covered by the kernel, whereas average pooling returns the average of all values in the portion of the image covered by the kernel. When propagation is stopped at a pre-specified layer (for example, a dense layer or a pooling layer), the values at that layer are extracted and treated as a feature vector.
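The difference between max pooling and average pooling can be shown on a small grid. The sketch below is a plain-Python illustration that assumes a square, non-overlapping window (stride equal to the kernel size); the helper names `pool2d` and `mean` are chosen for this example.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def pool2d(image: Grid, k: int, reduce: Callable[[Sequence[float]], float]) -> Grid:
    """Apply a k x k pooling window with stride k (non-overlapping windows)."""
    rows, cols = len(image), len(image[0])
    out: Grid = []
    for r in range(0, rows - k + 1, k):
        out_row = []
        for c in range(0, cols - k + 1, k):
            # Collect the values the kernel covers at this position.
            window = [image[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append(reduce(window))
        out.append(out_row)
    return out


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


image = [
    [1.0, 3.0, 2.0, 0.0],
    [5.0, 7.0, 4.0, 6.0],
    [0.0, 2.0, 9.0, 1.0],
    [4.0, 2.0, 3.0, 3.0],
]
max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, mean)
print(max_pooled)  # [[7.0, 6.0], [4.0, 9.0]]
print(avg_pooled)  # [[4.0, 3.0], [2.0, 4.0]]
```

Both variants shrink the 4 x 4 grid to 2 x 2; max pooling keeps the strongest response in each window, while average pooling smooths the window into a single mean value.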

In NASNet, if propagation is stopped before the pooling layer, the last layer of the network becomes the max pooling layer, and the output shape is 7 x 7 x 512. Flattening this volume into a feature vector yields a list of 7 x 7 x 512 = 25,088 values. This list of numbers serves as the feature vector used to quantify the input image.
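The flattening arithmetic can be checked directly. The sketch below builds a placeholder volume with the 7 x 7 x 512 shape quoted in the text (filled with zeros, since no real activations are available here) and flattens it; `flatten_volume` is a hypothetical helper name.

```python
from itertools import chain
from typing import List


def flatten_volume(volume: List[List[List[float]]]) -> List[float]:
    """Flatten a height x width x channels nested list into one feature vector."""
    return list(chain.from_iterable(chain.from_iterable(volume)))


height, width, channels = 7, 7, 512
# Placeholder activations; a real extractor would supply these values.
volume = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

feature_vector = flatten_volume(volume)
print(len(feature_vector))  # 25088
```

Each image therefore becomes one fixed-length row of numbers, which is the form that classical classifiers expect as input.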

FIG. 9 is a diagram for explaining Module 3 configured in the modular neural network according to an embodiment, and shows a new model that uses the extracted feature information as input data for image classification. Specifically, the feature information extracted in Module 2 may be input to the new model (Module 3). New feature information, learned from the extracted feature information, may be obtained through the new model. Output data (for example, bicycle) may be output through image classification of the obtained new feature information.

Module 3 may train a machine learning model to obtain a classifier capable of recognizing new images. Once the feature vectors are available, a machine learning algorithm such as Stochastic Gradient Descent (SGD), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), or Decision Trees (DT) can be trained on them.
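As one possible instance of training a classifier on extracted feature vectors, the sketch below fits a binary logistic regression with stochastic gradient descent in plain Python. It is a minimal illustration under stated assumptions: the two-dimensional feature vectors and labels are made-up toy data, the hyperparameters are arbitrary, and the function names `train_sgd` and `predict` are hypothetical. The embodiment does not specify an implementation, and a practical system would more likely use an established machine learning library.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500, seed: int = 0
) -> Tuple[List[float], float]:
    """Fit binary logistic regression with one weight update per sample."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0)


# Toy "feature vectors" for two classes (illustrative data only).
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
w, b = train_sgd(X, y)
print(predict(w, b, [0.05, 0.05]), predict(w, b, [0.95, 0.95]))
```

The same training data could be handed to any of the other listed algorithms, since they all consume fixed-length feature vectors paired with class labels.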

FIG. 10 is a block diagram for explaining the configuration of a feature extraction system according to an embodiment, and FIG. 11 is a flowchart for explaining a feature extraction method for image classification in the feature extraction system according to an embodiment.

The processor of the feature extraction system (100) may include an image input unit (1010) and a feature information extraction unit (1020). These components of the processor may be representations of different functions performed by the processor according to control instructions provided by program code stored in the feature extraction system. The processor and its components may control the feature extraction system to perform steps 1110 to 1120 included in the feature extraction method for image classification of FIG. 11. Here, the processor and its components may be implemented to execute instructions according to the code of an operating system and the code of at least one program included in a memory.

The processor may load into the memory the program code stored in a file of a program for the feature extraction method for image classification. For example, when the program is executed in the feature extraction system, the processor may control the feature extraction system to load the program code from the program file into the memory under the control of the operating system. Here, the image input unit (1010) and the feature information extraction unit (1020) may each be a different functional representation of the processor for executing the subsequent steps 1110 to 1120 by executing the instructions of the corresponding part of the program code loaded into the memory.

In step 1110, the image input unit (1010) may receive an input image at a modular neural network including a plurality of modules configured according to the number of inputs and the number of outputs of each module. A noise removal process may be performed on the input image fed to the modular neural network. Here, noise removal may be performed on the input image by whichever of the modules configured in the modular neural network performs the image preprocessing.

In step 1120, the feature information extraction unit (1020) may extract feature information for image classification of the input image through the modular neural network. Here, feature information may be extracted from the pre-trained neural network by the module, among those configured in the modular neural network, that extracts feature information, and the extracted feature information may in turn be used as input data by the module that classifies images, so that the image is classified.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but those of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A feature extraction method for image classification, comprising:
receiving an input image at a modular neural network including a plurality of modules configured according to the number of inputs and outputs of the modules; and
extracting feature information for image classification of the input image through the modular neural network,
wherein, in the modular neural network, a plurality of neural network models composed of a plurality of modules are connected in series and trained to extract the feature information for image classification.

The method of claim 1,
wherein the modular neural network is constructed to forward-propagate the input image, stop at a pre-specified layer, and use the output of that layer as a feature, so that a pre-trained neural network serves as a feature extractor.
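
The truncated forward pass described in this claim can be illustrated with a minimal sketch. It assumes a toy network of two dense layers with hand-picked weights standing in for a pre-trained model. The names `make_dense`, `extract_features`, and `stop_layer` are hypothetical and do not come from the specification.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def make_dense(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """Return a fully connected layer with fixed (already trained) parameters."""
    def layer(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return layer


def extract_features(layers: Sequence[Layer], x: Vector, stop_layer: int) -> Vector:
    """Forward-propagate x and return the output of layers[stop_layer]."""
    if not 0 <= stop_layer < len(layers):
        raise IndexError("stop_layer is outside the network")
    for layer in layers[: stop_layer + 1]:
        x = layer(x)
    return x


# Toy stand-in for a pre-trained network: dense -> ReLU -> dense (output head).
# The weights are illustrative assumptions, not values from the specification.
pretrained = [
    make_dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    relu,
    make_dense([[1.0, 1.0]], [0.0]),
]

# Stop after the ReLU (index 1) and use its activations as the feature vector.
features = extract_features(pretrained, [2.0, 1.0], stop_layer=1)
print(features)  # [1.0, 1.5]
```

Because the loop stops at `stop_layer`, the output head is never evaluated. The intermediate activations can then be passed to any downstream classifier.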

The method of claim 2,
wherein the modular neural network comprises: a first module, composed of an encoder and a decoder, that outputs a denoised image by performing image preprocessing on an image containing noise; a second module that extracts feature information from a pre-trained neural network through transfer learning; and a third module that classifies the image using the feature information as input data.
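
The serial arrangement of the three modules can be sketched as a simple function chain. Every module body below is a deliberately trivial placeholder: pixel clamping stands in for the encoder-decoder denoiser, two summary statistics stand in for the pre-trained feature extractor, and a threshold stands in for the trained classifier. The names `chain`, `denoise`, `extract`, and `classify` are assumptions made for illustration.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Module = Callable[[Any], Any]


def chain(modules: Sequence[Module]) -> Module:
    """Connect modules in series: each module's output feeds the next one."""
    def run(x: Any) -> Any:
        return reduce(lambda value, module: module(value), modules, x)
    return run


def denoise(image):
    """Placeholder for the first module: clamp pixel values to [0, 1]."""
    return [[min(1.0, max(0.0, p)) for p in row] for row in image]


def extract(image):
    """Placeholder for the second module: mean and maximum pixel value."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def classify(features):
    """Placeholder for the third module: threshold on the mean feature."""
    return "bright" if features[0] >= 0.5 else "dark"


pipeline = chain([denoise, extract, classify])
label = pipeline([[0.9, 1.2], [0.8, -0.1]])
print(label)  # bright
```

Keeping each stage behind the same one-argument interface means a module can be swapped without touching the others, which is the practical benefit of a modular arrangement.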

The method of claim 3,
wherein the second module comprises a neural network architecture including a convolution block that includes a plurality of convolution layers, a max pooling layer, and a dropout layer.

The method of claim 4,
wherein the second module learns the feature information as the denoised image passes through a neural network including at least one convolution block and a dense layer.

The method of claim 4,
wherein the second module is trained on new training images through transfer learning while the feature information is extracted through a neural network including a plurality of convolution blocks from which the dense layers have been removed.

The method of claim 6,
wherein, in the second module, the pooling layer is removed from each of the plurality of convolution blocks from which the dense layers have been removed and is replaced with a max pooling layer, and the feature information is extracted through the replacement max pooling layer.
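
To make the effect of swapping a pooling layer for max pooling concrete, the sketch below applies both average pooling and max pooling to the same feature map. It assumes non-overlapping 2 x 2 windows and that the layer being replaced is an average pooling layer. The claim does not state the window size or the type of the original pooling layer, so both are illustrative assumptions, as is the helper `pool2d`.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, size: int, reducer: Callable[[Sequence[float]], float]) -> Matrix:
    """Apply reducer over non-overlapping size x size windows of feature_map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reducer([feature_map[r + i][c + j] for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def average(window: Sequence[float]) -> float:
    """Mean of the values in one pooling window."""
    return sum(window) / len(window)


feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 1.0],
    [0.0, 0.0, 5.0, 6.0],
    [1.0, 2.0, 7.0, 8.0],
]

avg_pooled = pool2d(feature_map, 2, average)  # [[2.5, 1.0], [0.75, 6.5]]
max_pooled = pool2d(feature_map, 2, max)      # [[4.0, 2.0], [2.0, 8.0]]
```

Max pooling keeps the strongest activation in each window, whereas averaging dilutes it across the window. Only the reducer changes between the two calls, which mirrors replacing one layer while leaving the rest of the block untouched.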

The method of claim 2,
wherein the third module trains a machine learning model, using the feature information output from the second module as input data, to obtain a classifier capable of recognizing a new image.

The method of claim 8,
wherein the third module trains the machine learning model through any one of classification algorithms including Stochastic Gradient Descent (SGD), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Trees (DT).
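
As an illustration of training a classifier on extracted feature vectors, the sketch below fits a logistic regression model with per-sample stochastic gradient descent using only the Python standard library. It therefore combines two of the listed options (SGD as the optimizer, LR as the model) and is not the implementation described in the specification. The two-dimensional feature vectors, the labels, and the hyperparameters `lr`, `epochs`, and `seed` are made-up assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd_logistic(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit weights and bias with one gradient step on the log loss per sample."""
    rng = random.Random(seed)  # fixed seed keeps the shuffling reproducible
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(w * x for w, x in zip(weights, features[i])) + bias
            error = labels[i] - sigmoid(z)
            weights = [w + lr * error * x for w, x in zip(weights, features[i])]
            bias += lr * error
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return class 1 when the linear score is non-negative, else class 0."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if z >= 0.0 else 0


# Hypothetical feature vectors, as if produced by the second module.
train_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
train_y = [0, 0, 0, 1, 1, 1]

weights, bias = train_sgd_logistic(train_x, train_y)
new_predictions = [predict(weights, bias, x) for x in ([0.1, 0.1], [0.9, 0.9])]
```

In practice a library implementation would normally be used instead. For example, scikit-learn provides `SGDClassifier`, `RandomForestClassifier`, `SVC`, `LogisticRegression`, and `DecisionTreeClassifier`, which correspond to the algorithm families named in this claim.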

A computer program stored in a non-transitory computer-readable recording medium to cause a feature extraction system to execute the feature extraction method for image classification according to any one of claims 1 to 9.

A feature extraction system, comprising:
an image input unit that receives an input image at a modular neural network including a plurality of modules configured according to the number of inputs and outputs of the modules; and
a feature information extraction unit that extracts feature information for image classification of the input image through the modular neural network,
wherein, in the modular neural network, a plurality of neural network models composed of a plurality of modules are connected in series and trained to extract the feature information for image classification.