KR102333545B1

KR102333545B1 - Method for learning images using convolution neural network and apparatus for executing the method

Info

Publication number: KR102333545B1
Application number: KR1020200051389A
Authority: KR
Inventors: 최규상; 아지만락슨
Original assignee: 영남대학교 산학협력단
Priority date: 2020-04-28
Filing date: 2020-04-28
Publication date: 2021-12-01
Also published as: KR20210132908A

Abstract

Disclosed are an image learning method based on a convolutional neural network and an apparatus for performing the same. An image learning method according to an embodiment of the present invention is performed by a computing device having one or more processors and a memory that stores one or more programs executed by the one or more processors, and includes: generating image data by normalizing frames of an input video; extracting a feature map of the image data by passing the generated image data through a convolution layer; performing sub-sampling on the extracted feature map through a pooling layer; extracting one or more final feature maps of the image data based on the convolution layer and the pooling layer; and generating and outputting classification information for the image data based on the extracted final feature maps and a preset activation function.

Description

Image learning method based on convolutional neural network and apparatus for performing the same

Embodiments of the present invention relate to an image learning technique based on a convolutional neural network.

Recently, image data of many kinds is being generated in vast quantities, and various technologies are being developed to process such image data automatically. In particular, as artificial neural network technology has advanced, it is being applied in industry to learn and classify image data automatically.

Among these artificial neural network technologies, the relatively widely used convolutional neural network (CNN) has a structure made up of one or more convolutional layers, pooling layers, and fully connected layers, and is known to be used mainly for analyzing images and audio, which are two-dimensional input data.

However, image recognition techniques using artificial neural networks have been limited in that they cannot learn spatial information and temporal information at the same time. In addition, image recognition techniques have difficulty taking the passage of time into account when analyzing relationships between objects, including people and things.

Korean Registered Patent Publication No. 10-1735365 (2017.05.16.)

Embodiments of the present invention are intended to analyze the behavior of objects in a video using a convolutional neural network.

An image learning method according to a disclosed embodiment is performed by a computing device having one or more processors and a memory that stores one or more programs executed by the one or more processors, and includes: generating image data by normalizing frames of an input video; extracting a feature map of the image data by passing the generated image data through a convolution layer; performing sub-sampling on the extracted feature map through a pooling layer; extracting one or more final feature maps of the image data based on the convolution layer and the pooling layer; and generating and outputting classification information for the image data based on the extracted final feature maps and a preset activation function.

The generating of the image data may include: extracting the size of each image contained in the input video and resizing each image to the same number of pixels; and normalizing the image frames resized to the same size to generate the image data.

The extracting of the feature map may include sequentially applying a first filter to each pixel of the image data and resetting each pixel value of the image data according to the weights in that pixel.

The extracting of the feature map may include transforming the plurality of feature maps using a nonlinear activation function.

The performing of the sub-sampling may include concatenating, through a concatenation layer, the pooling results calculated for the extracted feature map by a pooling box that contains a plurality of pooling layers.

The performing of the sub-sampling may include sequentially performing first pooling, second pooling, and third pooling on the extracted feature map through a first pooling layer, a second pooling layer, and a third pooling layer, where the first, second, and third pooling layers use the same pooling method but have second filters of different sizes.

The performing of the sub-sampling may include: adjusting the padding width of the feature map input to each of the first, second, and third pooling layers, whose second filters have different sizes, so that the three layers produce pooling results of the same size; and performing a concatenation operation that stacks the equally sized pooling results along the channel direction.

The performing of the sub-sampling may include simultaneously performing max pooling and average pooling on the extracted feature map through a max pooling layer and an average pooling layer, where the max pooling layer and the average pooling layer have second filters of the same size.

The performing of the sub-sampling may include performing a concatenation operation that stacks, along the channel direction, the equally sized pooling results produced by the max pooling layer and the average pooling layer having second filters of the same size.

The generating and outputting of the classification information may include generating the classification information for the image data through a nonlinear transformation of a weighted sum of the feature maps in a fully connected layer.

According to embodiments of the present invention, the behavior of objects in a video can be analyzed using a convolutional neural network.

In addition, according to embodiments of the present invention, the model has fewer parameters than a typical three-dimensional convolutional neural network method, so videos can be learned more efficiently.

In addition, embodiments of the present invention can solve the problems of typical three-dimensional convolutional neural network models, which have so many parameters that they occupy a large amount of memory, take a long time to train, and require long computation times when the trained model is used.

FIG. 1 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.
FIG. 2 is a block diagram illustrating the operation of a convolutional neural network (CNN) according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an image learning method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating max pooling and average pooling performed in a pooling block according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating the operation of a pooling block according to an embodiment of the present invention.
FIG. 6 is a block diagram illustrating the operation of a pooling block according to another embodiment of the present invention.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatus, and/or systems described herein. However, it is merely an example, and the present invention is not limited to it.

In describing the embodiments of the present invention, detailed descriptions of known technology related to the present invention are omitted where they could unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is intended only to describe embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, expressions in the singular include the plural. In this description, expressions such as "comprising" or "having" are intended to indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

FIG. 1 is a block diagram illustrating a computing environment 10 including a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities other than those described below, and components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be a device for performing image learning according to an embodiment of the present invention.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments mentioned above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to the exemplary embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, trackpad, or the like), a keyboard, a touch input device (a touchpad, touchscreen, or the like), a voice or sound input device, various kinds of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or may be connected to the computing device 12 as a separate device distinct from it.

FIG. 2 is a block diagram illustrating the operation of a convolutional neural network (CNN) according to an embodiment of the present invention.

A convolutional neural network according to an embodiment of the present invention may include an input layer 100, a convolution layer 210, a pooling layer 220, a fully connected layer 300, and an output layer 400. The convolutional neural network extracts a feature map of the input image data through the convolution layer 210 and the pooling layer 220, and identifies or classifies the input image data using the feature map. The image learning method carried out by the convolutional neural network according to an embodiment of the present invention is described in detail below.

FIG. 3 is a flowchart illustrating an image learning method according to an embodiment of the present invention. As described above, the image learning method according to an embodiment of the present invention may be performed by a computing device 12 having one or more processors and a memory that stores one or more programs executed by the one or more processors. To this end, the image learning method may be implemented as a program or software containing one or more computer-executable instructions and stored in the memory.

Although the illustrated flowchart divides the method into a plurality of steps, at least some of the steps may be performed in a different order, combined with other steps, omitted, divided into sub-steps, or supplemented by one or more steps that are not shown.

The video input to the image learning method according to an embodiment of the present invention may be footage of a captured subject or of the subject's field of activity, but is not limited to this.

In step 302, the computing device 12 normalizes the frames of the input video to generate image data.

Specifically, the extracted image frames are each rescaled and resized to the same size (for example, 224 pixels × 224 pixels). The computing device 12 may use a resizing algorithm for this purpose, for example a down-sampling algorithm. Applying the down-sampling algorithm so that the aspect ratio of each image frame is maintained minimizes the loss of image information.

The computing device 12 then normalizes the resized image frames to generate image data. Normalization makes all input dimensions share a similar, uniform distribution, and a mean-standard-deviation normalization method may be used. To perform mean-standard-deviation normalization, the computing device 12 may use a mean and standard deviation estimation algorithm.
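The preprocessing in step 302 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the center crop used to keep the aspect ratio, the nearest-neighbor down-sampling, and the per-channel statistics are all assumptions, since the description names only "a down-sampling algorithm" and "mean-standard-deviation normalization".

```python
import numpy as np

def center_crop_square(frame):
    """Crop the largest centered square so later resizing keeps the aspect ratio (assumed strategy)."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return frame[top:top + side, left:left + side]

def downsample_nearest(frame, size=224):
    """Nearest-neighbor down-sampling of an H x W x C frame to size x size."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows][:, cols]

def normalize(frame, eps=1e-8):
    """Mean-standard-deviation normalization, computed per channel."""
    frame = frame.astype(np.float64)
    mean = frame.mean(axis=(0, 1), keepdims=True)
    std = frame.std(axis=(0, 1), keepdims=True)
    return (frame - mean) / (std + eps)

# Hypothetical 480 x 640 RGB frame standing in for one extracted video frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
image_data = normalize(downsample_nearest(center_crop_square(frame)))
```

After this step every frame has the same 224 × 224 shape, and each channel has approximately zero mean and unit standard deviation.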

In step 304, the computing device 12 extracts a feature map of the input image data by passing the image data through the convolution layer 210. The feature map contains feature information about the image data. To extract feature maps, the convolution layer 210 and the pooling layer 220 are repeated, and the number of repetitions may be chosen differently depending on the embodiment.

Specifically, once the size of the first filter used for convolution is determined, convolution is performed as a weighted sum of the weights assigned to each pixel of the first filter and the pixel values of the image data. The convolution produces a feature map of the image data. The computing device 12 may apply a plurality of different first filters to one piece of image data to generate a plurality of feature maps. The computing device 12 may generate a feature map by applying the first filter to each pixel of the image data in turn (that is, moving the first filter one pixel at a time) and resetting each pixel value of the image data according to the weights of the first filter.

To keep the feature map from becoming smaller than the image data, the computing device 12 may pad the border of the image data (for example, with zero padding) before applying the first filter to each pixel in turn. For example, the computing device 12 may add pixels with a value of zero (0) along the border of the image data and then apply the first filter to each pixel in turn.

The first filter may be 3x3 in size, but is not limited to this. A weight (for example, 0 or 1) may be assigned to each of the 3x3 pixels of the first filter. As one way of performing the convolution, for a given pixel of the image data the computing device 12 may multiply the pixel values of the surrounding pixels by the corresponding weights of the first filter and reset the pixel to the sum of those products. However, the method is not limited to this, and the computing device 12 may instead reset a given pixel of the image data to the maximum of the pixel values of the surrounding pixels covered by the first filter.
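As a rough sketch of the weighted-sum convolution with zero padding described above, the snippet below slides a 3x3 filter over a single-channel image one pixel at a time. The function name `conv2d_same`, the 4x4 input, and the two filters are hypothetical values chosen for illustration. As in most CNN implementations, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide `kernel` over a zero-padded `image` so the feature map keeps the input size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero padding along the border of the image data.
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            # Weighted sum of the neighborhood and the filter weights.
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=np.float64).reshape(4, 4)

# Two illustrative first filters with 0/1 weights.
identity_filter = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float64)
box_filter = np.ones((3, 3))

# Applying several different filters to one image yields several feature maps.
feature_maps = np.stack([conv2d_same(image, k) for k in (identity_filter, box_filter)])
```

Each filter yields one feature map of the same 4x4 size as the input, so `feature_maps` has shape `(2, 4, 4)`.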

The computing device 12 may generate a plurality of feature maps by applying to one piece of image data a plurality of first filters whose 3x3 pixel weights differ from one another. This process makes it possible to extract robust features from the image data.

The computing device 12 may also transform the feature map generated by the convolution layer 210 using a nonlinear activation function. An activation function is needed to rescale the signal strength of neurons. The computing device 12 may use the ReLU (Rectified Linear Unit) activation function, which outputs 0 for values less than 0 and passes values greater than 0 through unchanged, but the activation function is not limited to ReLU.
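The ReLU transformation described above is a one-line element-wise operation. A minimal sketch, with a hypothetical 2x2 feature map:

```python
import numpy as np

def relu(x):
    """Output 0 for negative inputs and pass non-negative inputs through unchanged."""
    return np.maximum(x, 0)

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)
```

The two negative entries become 0, while 0.5 and 3.0 are left as they are.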

In step 306, the computing device 12 may perform sub-sampling on the feature map through the pooling layer 220. The pooling layer 220 may be applied so as to reduce the size of the feature map.

Specifically, once the size of the second filter used for pooling is determined, a single pixel value can be extracted from the pixel values that fall within the second filter. That is, the computing device 12 may move the second filter across the feature map without overlap (in steps equal to the size of the second filter) and extract one pixel value from the feature map pixel values covered by the second filter at each position. For example, the pooling layer 220 may use max pooling, min pooling, or average pooling. Max pooling extracts the maximum of the pixel values of the feature map covered by the second filter. Min pooling extracts the minimum of those pixel values. Average pooling extracts their mean.

FIG. 4 is a diagram illustrating max pooling and average pooling performed in a pooling block according to an embodiment of the present invention.

Max pooling extracts the most prominent feature of each region, whereas average pooling takes all values in the region into account and outputs their mean, which yields very smooth features. Max pooling is therefore useful for capturing image structure such as edges and texture, and is widely used in image recognition. Max pooling, which is common in convolutional neural networks, applies a filter of a fixed size to the input layer and extracts the largest of the values inside the filter. Average pooling extracts the mean of the pixel values inside the filter. As illustrated, the output of the pooling layer depends on the pooling method. For example, when a 2x2 second filter, which compresses the input to 1/4 of its size, is applied to a 4x4 feature map, max pooling outputs 25, 9, 41, and 3 for the four differently marked 2x2 regions, whereas average pooling outputs 14, 5, 20, and 1 for the same regions. Because a pooling operation does not multiply by filter values, it has no parameters to learn, and because it produces one output per input channel, the number of channels does not change. Pooling thus reduces the number of variables that must be computed.
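The non-overlapping 2x2 pooling described above can be sketched as follows. The 4x4 feature map holds hypothetical values and does not reproduce the values shown in FIG. 4. The helper `pool2d` is an illustrative name.

```python
import numpy as np

def pool2d(fmap, k, reduce_fn):
    """Non-overlapping k x k pooling: the filter moves in steps of its own size."""
    h, w = fmap.shape
    # Split the map into (h//k) x (w//k) blocks of k x k pixels each.
    blocks = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return reduce_fn(blocks, axis=(1, 3))

# Hypothetical 4x4 feature map (not the values in FIG. 4).
fmap = np.array([[1, 3, 2, 0],
                 [5, 7, 4, 2],
                 [9, 1, 6, 8],
                 [3, 3, 2, 4]], dtype=np.float64)

max_pooled = pool2d(fmap, 2, np.max)   # largest value in each 2x2 region
avg_pooled = pool2d(fmap, 2, np.mean)  # mean of each 2x2 region
min_pooled = pool2d(fmap, 2, np.min)   # smallest value in each 2x2 region
```

For this input, max pooling gives `[[7, 4], [9, 8]]`, average pooling gives `[[4, 2], [4, 5]]`, and min pooling gives `[[1, 0], [1, 2]]`. Each output is 1/4 the size of the input, and no learnable parameters are involved.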

In a typical convolutional neural network, the pooling layer 220 uses only one pooling method and one filter size: one of max pooling, min pooling, or average pooling is selected and a filter size is specified. In that case, if average pooling is used in the convolutional model, it produces a smoothly down-sampled output, so performance may suffer when sharp features need to be detected.

The present invention therefore uses a pooling block containing a plurality of pooling layers, so that the pooling layers best suited to the image data can be applied. The pooling block is described in detail below.

FIG. 5 is a block diagram illustrating the operation of a pooling block 500 according to an embodiment of the present invention.

Referring to FIG. 5, the computing device 12 may perform sub-sampling on the feature map through a pooling block 500 that includes a plurality of pooling layers 510, 520, and 530.

Specifically, first pooling, second pooling, and third pooling may be sequentially performed on the feature map through a first pooling layer 510, a second pooling layer 520, and a third pooling layer 530. Here, the first pooling, the second pooling, and the third pooling may use the same pooling method (e.g., max pooling, average pooling, or min pooling) and may each have a different filter size. For example, the first pooling may be 1X1 max pooling, the second pooling may be 3X3 max pooling, and the third pooling may be 5X5 max pooling. The computing device 12 may perform an operation of concatenating the pooling results calculated through the first pooling layer 510, the second pooling layer 520, and the third pooling layer 530 through a concatenation layer 540. In this way, the output data of the separate layers can be collected into a single layer.

In an exemplary embodiment, the concatenation layer 540 may merge the pooling results calculated through the pooling block 500 into one expanded layer. In this case, the computing device 12 may apply padding (e.g., zero padding) along the border of each input feature map so that the first pooling, the second pooling, and the third pooling, which have different filter sizes, output data of the same size. The computing device 12 may calculate the size of the output data produced when a feature map passes through a pooling layer using the following equation.

(Equation 1)

$$H_o = \frac{H_i - K + 2P}{S} + 1, \qquad W_o = \frac{W_i - K + 2P}{S} + 1$$

$H_i$, $W_i$: size of the input feature map (H X W)

$H_o$, $W_o$: size of the output data (H X W)

$K$: size of the second filter (K X K)

$P$: padding width

$S$: stride
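As a non-limiting illustration, and assuming the usual output-size relation Ho = (Hi - K + 2P) / S + 1 for the symbols defined above, the padding width needed for each filter size may be checked as follows. The input size of 32, the use of floor division, and the helper names `pooled_size` and `same_size_padding` are assumptions made only for this sketch.

```python
def pooled_size(size_in, k, p, s):
    """Output length along one axis: (size_in - k + 2p) / s + 1.

    Floor division is used here as an assumption for sizes that do not
    divide evenly by the stride.
    """
    return (size_in - k + 2 * p) // s + 1


def same_size_padding(k):
    """Padding width that keeps the size unchanged for stride 1 and odd k."""
    return (k - 1) // 2


# With stride 1, filters of size 1, 3 and 5 need padding 0, 1 and 2
# to produce outputs of the same size as the (hypothetical) 32-pixel input.
for k in (1, 3, 5):
    p = same_size_padding(k)
    print(k, p, pooled_size(32, k, p, 1))
```

The same function also gives 2 for a 4-pixel axis with a 2X2 filter, no padding, and stride 2, which matches the 1/4 size reduction of a 4X4 feature map.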

That is, according to Equation 1, the padding width of the feature map input to each of the first pooling (1X1), the second pooling (3X3), and the third pooling (5X5), which have different filter sizes, may be adjusted (i.e., the number of zero-valued pixels added along the border of the feature map is adjusted) so that output data of the same size is produced. The computing device 12 may then perform a concatenation operation that adds the equally sized output data in the channel direction (stacks it along the channel axis), thereby collecting the output data of the separate layers into a single layer.
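As a non-limiting illustration, a pooling block with 1X1, 3X3, and 5X5 max pooling, zero padding, and channel-direction concatenation may be sketched as follows. The sketch assumes a stride of 1 and assumes that each pooling is applied to the same input feature map; the function names and the 3X3 input values are hypothetical.

```python
def zero_pad(channel, p):
    """Surround a 2-D list with p rings of zero-valued pixels."""
    width = len(channel[0]) + 2 * p
    rows = [[0] * p + list(row) + [0] * p for row in channel]
    border = [[0] * width for _ in range(p)]
    return border + rows + [list(r) for r in border]


def max_pool(channel, k, stride=1):
    """k x k max pooling over a 2-D list."""
    h, w = len(channel), len(channel[0])
    return [
        [max(channel[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(0, w - k + 1, stride)]
        for r in range(0, h - k + 1, stride)
    ]


def pooling_block(feature_map, kernel_sizes=(1, 3, 5)):
    """feature_map is a list of channels (each a 2-D list).

    Each filter size gets its own padding width so every pooling result has
    the same height and width; results are then stacked along the channel axis.
    """
    stacked = []
    for k in kernel_sizes:
        p = (k - 1) // 2  # padding width that preserves size at stride 1
        for channel in feature_map:
            stacked.append(max_pool(zero_pad(channel, p), k))
    return stacked


fmap = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one hypothetical 3x3 channel
result = pooling_block(fmap)
print(len(result), len(result[0]), len(result[0][0]))  # 3 3 3
```

One input channel yields three output channels of unchanged height and width. Note that zero padding can influence max pooling near the border when the feature map contains negative values.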

FIG. 6 is a block diagram for explaining the operation of a pooling block 600 according to another embodiment of the present invention.

Referring to FIG. 6, the computing device 12 may perform sub-sampling on the feature map through a pooling block 600 that includes a plurality of pooling layers 610 and 620.

Specifically, max pooling and average pooling may be performed on the feature map simultaneously, in parallel, through a max pooling layer 610 and an average pooling layer 620. The computing device 12 may perform an operation of concatenating the pooling results calculated through the max pooling layer 610 and the average pooling layer 620 through a concatenation layer 630. In this way, the output data of the separate layers can be collected into a single layer.

In an exemplary embodiment, the concatenation layer 630 may merge the pooling results calculated through the pooling block 600 into one expanded layer. That is, the computing device 12 may perform a concatenation operation that adds the equally sized output data of the max pooling and the average pooling in the channel direction (stacks it along the channel axis), thereby collecting the output data of the separate layers into a single layer.
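As a non-limiting illustration, a pooling block that applies max pooling and average pooling with the same filter size and concatenates the results in the channel direction may be sketched as follows. The 2X2 filter, the stride of 2, the function names, and the input values are assumptions made only for this sketch.

```python
def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pool(channel, k, stride, reduce_fn):
    """k x k pooling over a 2-D list using the given reduction."""
    h, w = len(channel), len(channel[0])
    return [
        [reduce_fn([channel[r + i][c + j] for i in range(k) for j in range(k)])
         for c in range(0, w - k + 1, stride)]
        for r in range(0, h - k + 1, stride)
    ]


def max_avg_block(feature_map, k=2, stride=2):
    """feature_map is a list of channels (each a 2-D list).

    Both branches use the same filter size, so their outputs have the same
    height and width and can be stacked along the channel axis.
    """
    max_branch = [pool(ch, k, stride, max) for ch in feature_map]
    avg_branch = [pool(ch, k, stride, average) for ch in feature_map]
    return max_branch + avg_branch


fmap = [[
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]]  # one hypothetical 4x4 channel
out = max_avg_block(fmap)
print(out[0])  # [[6, 8], [14, 16]]
print(out[1])  # [[3.5, 5.5], [11.5, 13.5]]
```

The number of channels doubles while no learnable parameter is introduced by the block itself.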

Therefore, the image learning method according to an embodiment of the present invention uses a plurality of pooling methods or different filter sizes at the same time through a pooling block including a plurality of pooling layers, thereby deriving diverse features from the input image data and improving the accuracy of image learning. That is, compared with a general convolutional neural network, images can be learned more efficiently with a smaller number of parameters.

In step 308, the computing device 12 may extract a final feature map for the image data from the results of the convolution layer 210 and the pooling layer 220. For example, the computing device 12 may repeatedly apply the convolution layer 210 and the pooling layer 220 to the image data to extract a preset number of final feature maps in the form of one-dimensional matrices.

In step 310, the computing device 12 may generate and output classification information for the image data based on the preset number of final feature maps and a preset activation function.

Specifically, through a fully connected layer, the preset number of final feature maps are each multiplied by preset weight coefficients and summed, and the sum is input to the preset activation function to generate the classification information for the image data, which is passed to the output layer. Here, a Softmax function or the like may be used as the activation function.
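As a non-limiting illustration, the weighted sum of the final feature values followed by a Softmax activation may be sketched as follows. The feature values, the weight coefficients, the two-class setup, and the function names are hypothetical; a bias term is omitted because the description mentions only weight coefficients.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights):
    """Fully connected layer: one weighted sum per class, then Softmax.

    features: flat list of final feature values.
    weights:  one list of weight coefficients per class.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


features = [0.5, 1.0, -1.0]          # hypothetical final feature values
weights = [[1.0, 0.0, 0.0],          # hypothetical weights for class 0
           [0.0, 1.0, 0.0]]          # hypothetical weights for class 1
probs = classify(features, weights)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

Here the weighted sums are 0.5 and 1.0, so the second class receives the larger probability.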

Although representative embodiments of the present invention have been described in detail above, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims set out below and their equivalents.

10: computing environment
12: computing device
14: processor
16: computer-readable storage medium
18: communication bus
20: program
22: input/output interface
24: input/output device
26: network communication interface

Claims

A method performed in a computing device having one or more processors and a memory storing one or more programs to be executed by the one or more processors, the method comprising:
generating image data by normalizing a frame of an input image;
extracting a feature map for the image data through a convolution layer with respect to the generated image data;
performing sub-sampling on the feature map through a pooling layer on the extracted feature map;
extracting one or more final feature maps for the image data based on the convolutional layer and the pooling layer; and
generating and outputting classification information for the image data based on the extracted final feature map and a preset activation function,
wherein the performing of the sub-sampling further comprises concatenating, through a concatenation layer, the pooling results calculated for the extracted feature map through a pooling block including a plurality of pooling layers,
wherein the performing of the sub-sampling further comprises sequentially performing first pooling, second pooling, and third pooling on the extracted feature map through a first pooling layer, a second pooling layer, and a third pooling layer,
wherein the first pooling layer, the second pooling layer, and the third pooling layer use the same pooling method but have second filters of different sizes, and
wherein the performing of the sub-sampling further comprises:
adjusting the padding width of each input feature map so that pooling results of the same size are calculated in the first pooling layer, the second pooling layer, and the third pooling layer having the second filters of different sizes; and
performing a concatenation operation of adding the pooling results calculated with the same size in a channel direction.

The method according to claim 1,
wherein the generating of the image data comprises:
extracting the size of each image included in the input image and resizing the images to the same pixel size; and
generating the image data by normalizing the image frames adjusted to the same size.

The method according to claim 1,
wherein the extracting of the feature map comprises
resetting each pixel value of the image data according to a weight in the pixel by sequentially applying a first filter to each pixel of the image data.

The method according to claim 1,
wherein the extracting of the feature map comprises
performing a transformation operation on a plurality of feature maps using a non-linear activation function.

(Deleted)

A method performed in a computing device having one or more processors and a memory storing one or more programs to be executed by the one or more processors, the method comprising:
generating image data by normalizing a frame of an input image;
extracting a feature map for the image data through a convolution layer with respect to the generated image data;
performing sub-sampling on the feature map through a pooling layer on the extracted feature map;
extracting one or more final feature maps for the image data based on the convolutional layer and the pooling layer; and
generating and outputting classification information for the image data based on the extracted final feature map and a preset activation function,
wherein the performing of the sub-sampling further comprises concatenating, through a concatenation layer, the pooling results calculated for the extracted feature map through a pooling block including a plurality of pooling layers,
wherein the performing of the sub-sampling further comprises simultaneously performing max pooling and average pooling on the extracted feature map through a max pooling layer and an average pooling layer,
wherein the max pooling layer and the average pooling layer have second filters of the same size, and
wherein the performing of the sub-sampling further comprises performing a concatenation operation of adding, in a channel direction, the pooling results calculated with the same size in the max pooling layer and the average pooling layer having the second filters of the same size.

The method according to claim 1,
wherein the generating and outputting of the classification information comprises
generating the classification information of the image data by applying a nonlinear transformation to the weighted sum of the feature maps through a fully connected layer.

A computing device having one or more processors and a memory storing one or more programs to be executed by the one or more processors, the one or more programs comprising:
instructions for generating image data by normalizing a frame of an input image;
instructions for extracting a feature map for the image data through a convolution layer with respect to the generated image data;
instructions for performing sub-sampling on the feature map through a pooling layer on the extracted feature map;
instructions for extracting one or more final feature maps for the image data based on the convolutional layer and the pooling layer; and
instructions for generating and outputting classification information for the image data based on the extracted final feature map and a preset activation function,
wherein the instructions for performing the sub-sampling further comprise instructions for concatenating, through a concatenation layer, the pooling results calculated for the extracted feature map through a pooling block including a plurality of pooling layers,
wherein the instructions for performing the sub-sampling further comprise instructions for sequentially performing first pooling, second pooling, and third pooling on the extracted feature map through a first pooling layer, a second pooling layer, and a third pooling layer,
wherein the first pooling layer, the second pooling layer, and the third pooling layer use the same pooling method but have second filters of different sizes, and
wherein the instructions for performing the sub-sampling further comprise:
instructions for adjusting the padding width of each input feature map so that pooling results of the same size are calculated in the first pooling layer, the second pooling layer, and the third pooling layer having the second filters of different sizes; and
instructions for performing a concatenation operation of adding the pooling results calculated with the same size in a channel direction.

The computing device according to claim 11,
wherein the instructions for generating the image data comprise:
instructions for extracting the size of each image included in the input image and resizing the images to the same pixel size; and
instructions for generating the image data by normalizing the image frames adjusted to the same size.

The computing device according to claim 11,
wherein the instructions for extracting the feature map comprise
instructions for sequentially applying a first filter to each pixel of the image data to reset each pixel value of the image data according to a weight in the pixel.

The computing device according to claim 11,
wherein the instructions for extracting the feature map comprise
instructions for performing a transformation operation on a plurality of feature maps using a non-linear activation function.

(Deleted)

A computing device having one or more processors and a memory storing one or more programs to be executed by the one or more processors, the one or more programs comprising:
instructions for generating image data by normalizing a frame of an input image;
instructions for extracting a feature map for the image data through a convolution layer with respect to the generated image data;
instructions for performing sub-sampling on the feature map through a pooling layer on the extracted feature map;
instructions for extracting one or more final feature maps for the image data based on the convolutional layer and the pooling layer; and
instructions for generating and outputting classification information for the image data based on the extracted final feature map and a preset activation function,
wherein the instructions for performing the sub-sampling further comprise instructions for concatenating, through a concatenation layer, the pooling results calculated for the extracted feature map through a pooling block including a plurality of pooling layers,
wherein the instructions for performing the sub-sampling further comprise instructions for simultaneously performing max pooling and average pooling on the extracted feature map through a max pooling layer and an average pooling layer,
wherein the max pooling layer and the average pooling layer have second filters of the same size, and
wherein the instructions for performing the sub-sampling further comprise instructions for performing a concatenation operation of adding, in a channel direction, the pooling results calculated with the same size in the max pooling layer and the average pooling layer having the second filters of the same size.

The computing device according to claim 11,
wherein the instructions for generating and outputting the classification information comprise
instructions for generating the classification information of the image data by applying a nonlinear transformation to the weighted sum of the feature maps through a fully connected layer.

A computer program stored in a non-transitory computer-readable storage medium,
the computer program comprising one or more instructions which, when executed by a computing device having one or more processors, cause the computing device to:
generate image data by normalizing a frame of an input image;
extract a feature map for the image data through a convolution layer with respect to the generated image data;
perform sub-sampling on the feature map through a pooling layer on the extracted feature map;
extract one or more final feature maps for the image data based on the convolutional layer and the pooling layer; and
generate and output classification information for the image data based on the extracted final feature map and a preset activation function,
wherein, in performing the sub-sampling, the computer program causes the computing device to concatenate, through a concatenation layer, the pooling results calculated for the extracted feature map through a pooling block including a plurality of pooling layers,
wherein, in performing the sub-sampling, the computer program causes the computing device to sequentially perform first pooling, second pooling, and third pooling on the extracted feature map through a first pooling layer, a second pooling layer, and a third pooling layer,
wherein the first pooling, the second pooling, and the third pooling use the same pooling method but have second filters of different sizes, and
wherein, in performing the sub-sampling, the computer program causes the computing device to:
adjust the padding width of each input feature map so that pooling results of the same size are calculated in the first pooling layer, the second pooling layer, and the third pooling layer having the second filters of different sizes; and
perform a concatenation operation of adding the pooling results calculated with the same size in a channel direction.

The computer program according to claim 21,
wherein, in generating the image data, the computer program causes the computing device to:
extract the size of each image included in the input image and resize the images to the same pixel size; and
generate the image data by normalizing the image frames adjusted to the same size.

The computer program according to claim 21,
wherein, in extracting the feature map, the computer program causes the computing device to
sequentially apply a first filter to each pixel of the image data to reset each pixel value of the image data according to a weight in the pixel.

The computer program according to claim 21,
wherein, in extracting the feature map, the computer program causes the computing device to
perform a transformation operation on a plurality of feature maps using a non-linear activation function.

(Deleted)

A computer program stored in a non-transitory computer-readable storage medium,
the computer program comprising one or more instructions which, when executed by a computing device having one or more processors, cause the computing device to:
generate image data by normalizing a frame of an input image;
extract a feature map for the image data through a convolution layer with respect to the generated image data;
perform sub-sampling on the feature map through a pooling layer on the extracted feature map;
extract one or more final feature maps for the image data based on the convolutional layer and the pooling layer; and
generate and output classification information for the image data based on the extracted final feature map and a preset activation function,
wherein, in performing the sub-sampling, the computer program causes the computing device to concatenate, through a concatenation layer, the pooling results calculated for the extracted feature map through a pooling block including a plurality of pooling layers,
wherein, in performing the sub-sampling, the computer program causes the computing device to simultaneously perform max pooling and average pooling on the extracted feature map through a max pooling layer and an average pooling layer,
wherein the max pooling layer and the average pooling layer have second filters of the same size, and
wherein, in performing the sub-sampling, the computer program causes the computing device to perform a concatenation operation of adding, in a channel direction, the pooling results calculated with the same size in the max pooling layer and the average pooling layer having the second filters of the same size.

The computer program according to claim 21,
wherein, in generating and outputting the classification information, the computer program causes the computing device to
generate the classification information of the image data by applying a nonlinear transformation to the weighted sum of the feature maps through a fully connected layer.