KR102140805B1

KR102140805B1 - Neural network learning method and apparatus for object detection of satellite images

Info

Publication number: KR102140805B1
Application number: KR1020200053804A
Authority: KR
Inventors: 김세엽; 김성호; 신수진
Original assignee: 국방과학연구소
Priority date: 2020-05-06
Filing date: 2020-05-06
Publication date: 2020-08-03

Abstract

The present invention provides a method and apparatus for learning a neural network for object detection of a satellite image. In the present invention, the method for learning the neural network to improve object detection performance using a weighted object candidate box and a metric learning may be provided. According to the present invention, the neural network learning method for object detection of the satellite image comprises the steps of: receiving an image including an object and extracting a feature map; generating a first object candidate box surrounding the object based on the feature map; generating a second object candidate box using a mask to which a predetermined weight is applied to a region extended N times (here, N is a real number greater than 1) around the first object candidate box; applying the second object candidate box to a region of interest (RoI) pooling and inputting the second object candidate box to a classification feature vector extractor; and inputting a classification feature vector outputted from the classification feature vector extractor into a metric learning network to learn.

Description

Neural network learning method and device for object identification of satellite images {NEURAL NETWORK LEARNING METHOD AND APPARATUS FOR OBJECT DETECTION OF SATELLITE IMAGES}

Embodiments relate to a neural network learning method and apparatus and, more particularly, to a method and apparatus for training a neural network for object identification.

Currently, in tactical situations, skilled analysts manually analyze optical satellite images to identify targets such as enemy ships and aircraft. However, the volume of acquired satellite imagery is expected to grow, for example through the operation of multiple ultra-small satellite systems, and it is difficult to analyze a large number of satellite images quickly by hand. Accordingly, there is an increasing demand to automate military surveillance and reconnaissance systems by developing deep learning models capable of automatically identifying targets in satellite images.

Meanwhile, with recent advances in deep learning, object identification deep learning models have been widely applied across the public, industrial, and military sectors. In particular, such models are expected to be highly useful in security fields in which humans manually observe and analyze data, such as CCTV and satellite imagery. As the use of object identification deep learning models grows in these fields, so does the demand to accurately locate objects in an input image and classify them into fine-grained categories. However, when an object is relatively small in the image, the performance of deep learning models tends to degrade.

In satellite images, which are acquired at high altitude, the objects to be identified are small, so relatively little information is available for distinguishing them. This makes it difficult both to locate an object and to classify it into a fine-grained category. A new method for improving object identification performance is therefore required.

Prior art 1: KR 10-2015-0108577; Prior art 2: KR 10-2019-0056009; Prior art 3: KR 10-2019-0014908

One object of the embodiments is to provide a neural network learning method and apparatus for object identification in satellite images.

Another object of the embodiments is to provide an object identification method and apparatus using a neural network.

The problems to be solved are not limited to those described above, and problems not mentioned here will be clearly understood from this specification and the accompanying drawings by a person having ordinary skill in the technical field to which the present invention pertains.

As a technical means for achieving the above technical problem, according to one aspect of the present disclosure, there may be provided a neural network learning method for object identification in satellite images, the method including: receiving an image containing an object and extracting a feature map; generating, based on the feature map, a first object candidate box surrounding the object; generating a second object candidate box by using a mask in which a predetermined weight is applied to a region expanded N times (where N is a real number greater than 1) around the first object candidate box; applying the second object candidate box to region of interest (RoI) pooling and inputting it to a classification feature vector extractor; and inputting a classification feature vector output from the classification feature vector extractor to a metric learning network for training.

In addition, the weight may decrease from the center of the mask toward its outer region.

In addition, the second object candidate box may be generated as the Hadamard product of the mask and the region expanded N times (where N is a real number greater than 1) around the first object candidate box.
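The expansion and masking described above can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the function names, the `(x1, y1, x2, y2)` box layout, the clipping to the array bounds, and the Gaussian shape of the decay are all assumptions, since the text only requires that the weight decrease from the center of the mask outward.

```python
import numpy as np

def expand_box(box, n, img_h, img_w):
    """Expand an (x1, y1, x2, y2) box about its center by a factor n > 1, clipped to the array."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = n * (x2 - x1) / 2.0, n * (y2 - y1) / 2.0
    nx1 = max(0, int(round(cx - half_w)))
    ny1 = max(0, int(round(cy - half_h)))
    nx2 = min(img_w, int(round(cx + half_w)))
    ny2 = min(img_h, int(round(cy + half_h)))
    return nx1, ny1, nx2, ny2

def center_weighted_mask(h, w, sigma_scale=0.5):
    """Mask whose weights are largest at the center and decay outward (Gaussian decay is an assumption)."""
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    return np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_scale ** 2))

def second_candidate_box(array_2d, first_box, n=2.0):
    """Crop the N-times expanded region and weight it with the mask (Hadamard product)."""
    h, w = array_2d.shape[:2]
    x1, y1, x2, y2 = expand_box(first_box, n, h, w)
    region = array_2d[y1:y2, x1:x2]
    mask = center_weighted_mask(region.shape[0], region.shape[1])
    return region * mask  # element-wise (Hadamard) product

# Hypothetical usage: a 4x4 first box expanded 2x inside a 20x20 array.
weighted = second_candidate_box(np.ones((20, 20)), (8, 8, 12, 12), n=2.0)
```

With this choice, content near the object keeps most of its magnitude while the surrounding context is retained with smaller weights. If the expanded box is clipped at the array border, the mask center no longer coincides with the box center, which a fuller implementation would need to handle.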

In addition, the height and width of the second object candidate box may be learned.

In addition, the method may further include applying the second object candidate box to RoI pooling and inputting it to a classification network.

In addition, the metric learning network may be trained so that the classification feature vectors of objects belonging to the same class become closer to one another, and the classification feature vectors of objects belonging to different classes become farther apart.

In addition, the metric learning network may output a metric loss, and the loss function may be calculated using the metric loss.
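One common way to express this "same class close, different class far" objective is a triplet margin loss. The sketch below uses it purely as a stand-in: this passage does not state which metric loss is used, and the function names and the `margin` parameter are assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the same-class pair is closer than the different-class pair by at least `margin`."""
    d_pos = euclidean(anchor, positive)   # anchor vs. vector of the same class
    d_neg = euclidean(anchor, negative)   # anchor vs. vector of a different class
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D classification feature vectors.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # 1 - 5 + 1 < 0, so the loss is 0
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # 5 - 1 + 1 = 5
```

A loss of this kind would be one term of the overall loss function; how it is combined with other terms is not specified in this passage.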

According to another aspect, there may be provided an object identification method using a neural network, the method including: inputting an image to a neural network trained by the neural network learning method for object identification in satellite images; and outputting, through the trained neural network, a classification result corresponding to the image.

According to yet another aspect, there may be provided a recording medium on which is recorded a program for executing, on a computer, the neural network learning method for object identification in satellite images or the object identification method using a neural network.

According to yet another aspect, there may be provided a neural network learning apparatus including a memory and a processor, wherein the processor receives an image and extracts a feature map, generates a first object candidate box surrounding an object based on the feature map, generates a second object candidate box by applying a predetermined weight to a region corresponding to N times the height and width of the first object candidate box (where N is a real number greater than 1), applies the second object candidate box to region of interest (RoI) pooling and inputs it to a classification feature vector extractor, and inputs a classification feature vector output from the classification feature vector extractor to a metric learning network for training.

According to yet another aspect, there may be provided an object identification apparatus using a neural network, the apparatus including a memory and a processor, wherein the processor inputs an image to a neural network trained by the neural network learning method for object identification in satellite images and outputs, through the trained neural network, a classification result corresponding to the image.

The neural network learning method and apparatus for object identification in satellite images according to the embodiments can contribute to automating military surveillance and reconnaissance systems and to improving surveillance efficiency.

In addition, training can reflect not only the object itself but also information about its surroundings. Because the surrounding information is reflected, relatively more information is available for distinguishing an object even when the object is small in the satellite image. Object identification performance may improve accordingly. As object identification performance improves, the method can support tasks in which humans manually observe and analyze images, improving analysis capability and speed.

In addition, because the neural network for object identification is trained by metric learning, the accuracy of identifying the class of an object can be improved.

The effects are not limited to those described above, and effects not mentioned here will be clearly understood from this specification and the accompanying drawings by a person having ordinary skill in the technical field to which the present invention pertains.

FIG. 1 is a diagram for explaining the architecture of a neural network according to an embodiment.
FIG. 2 is a diagram for explaining the relationship between an input feature map and an output feature map in a neural network according to an embodiment.
FIG. 3 is a block diagram showing the hardware configuration of a neural network apparatus according to an embodiment.
FIG. 4 is a diagram for explaining a convolution operation of a neural network.
FIG. 5 is a block diagram for explaining the structure of a neural network for object identification according to an embodiment.
FIG. 6 is a diagram for explaining RoI pooling according to an embodiment.
FIG. 7 is a diagram for explaining a mask.
FIG. 8 is a diagram for explaining a mask according to an embodiment.
FIG. 9 is a diagram for explaining the result of applying the mask.
FIG. 10 is a diagram for explaining metric learning.
FIG. 11 is a flowchart for explaining a neural network learning method according to an embodiment.

The terms used in the embodiments have been selected, as far as possible, from general terms that are currently in wide use, taking into account their functions in the embodiments; however, they may vary depending on the intention of those skilled in the art, precedents, the emergence of new technologies, and the like. In certain cases a term has been chosen arbitrarily, and in that case its meaning is described in detail in the description of the corresponding embodiment. Therefore, the terms used in the embodiments should be defined based on the meaning of each term and the overall content of the embodiments, not simply on the name of the term.

In the description of the embodiments, when a part is said to be connected to another part, this includes not only a direct connection but also an electrical connection with another component in between. In addition, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, the term "... unit" used in the embodiments means a unit that processes at least one function or operation, and it may be implemented in hardware, in software, or in a combination of hardware and software.

Terms such as "consists of" or "includes" used in the embodiments should not be construed as necessarily including all of the components or steps described in the specification; some of the components or steps may not be included, or additional components or steps may be further included.

The following description of the embodiments should not be construed as limiting the scope of rights, and anything that a person skilled in the art can readily infer from it should be construed as falling within the scope of rights of the embodiments. Hereinafter, embodiments given solely by way of example are described in detail with reference to the accompanying drawings.

In the embodiments, classification refers to determining the class of an object in an image given as input. For example, the MNIST data set treats the ten digits from 0 to 9 as separate classes, and a deep learning model trained on it takes an image of a single digit as input and outputs which of the digits 0 to 9 the image belongs to. Localization refers to outputting information on where in the image an object is located, most commonly as a bounding box. In this description, the term bounding box may be used interchangeably with object candidate box.

Object detection, also referred to as object identification, means that classification and localization are performed together. Depending on the training objective of the neural network, only a specific object may be detected, or multiple objects may be detected.

A neural network refers to a computational architecture that models a biological brain. With recent advances in neural network technology, research on using neural networks in various kinds of electronic systems to analyze input data and extract valid information is being actively conducted. An apparatus that processes a neural network must perform a large amount of computation on complex input data. Therefore, analyzing a large volume of input data in real time with a neural network and extracting the desired information requires technology that can process neural network operations efficiently.

FIG. 1 is a diagram for explaining the architecture of a neural network according to an embodiment.

Referring to FIG. 1, the neural network 1 may have the architecture of a deep neural network (DNN) or an n-layer neural network. The DNN or n-layer neural network may correspond to a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network, a restricted Boltzmann machine, or the like. For example, the neural network 1 may be implemented as a convolutional neural network (CNN), but is not limited thereto. Although FIG. 1 shows only some convolution layers of a convolutional neural network given as an example of the neural network 1, the convolutional neural network may further include pooling layers, fully connected layers, and the like in addition to the illustrated convolution layers.

The neural network 1 may be implemented as an architecture having multiple layers, including input data, feature maps, and an output. In the neural network 1, a convolution operation is performed between the input data and a filter called a kernel, and feature maps are output as a result. The output feature maps then serve as input feature maps and are convolved with a kernel again, producing new feature maps. As the convolution operation is repeated in this way, a recognition result for the features of the input data may finally be output by the neural network 1.

For example, when data of 24x24 pixels is input to the neural network 1 of FIG. 1, the input data may be output as four-channel feature maps of size 20x20 through a convolution operation with a kernel. The 20x20 feature maps are then reduced in size through repeated convolution operations with kernels, and features of size 1x1 may finally be output. By repeatedly performing convolution and subsampling (or pooling) operations across several layers, the neural network 1 filters out and outputs robust features that can represent the entire input data, and derives a recognition result for the input data from the final output features.
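The size reduction in this example follows the usual output-size relation for a convolution. The sketch below is a minimal helper under stated assumptions: the text gives only the 24x24 input and 20x20 output, so the 5x5 kernel, stride 1, and zero padding used here are inferred values that happen to be consistent with those sizes, and the function name is hypothetical.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

# A 24-pixel side with an assumed 5x5 kernel, stride 1, no padding gives a 20-pixel side.
side_after_first_conv = conv_output_size(24, 5)
```

Other kernel, stride, and padding combinations can produce the same 24-to-20 reduction, so this should be read as one consistent possibility rather than the configuration of FIG. 1.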

FIG. 2 is a diagram for explaining the relationship between an input feature map and an output feature map in a neural network according to an embodiment.

Referring to FIG. 2, in a layer 3 of the neural network, the first feature map FM1 may correspond to an input feature map and the second feature map FM2 may correspond to an output feature map. A feature map may mean a data set in which various features of the input data are expressed. The feature maps FM1 and FM2 may have elements of a two-dimensional matrix or of a three-dimensional matrix, and a pixel value may be defined for each element. The feature maps FM1 and FM2 have a width W (also called columns), a height H (also called rows), and a depth D, where the depth D may correspond to the number of channels.

A convolution operation may be performed on the first feature map FM1 and the weight map WM of a kernel, and the second feature map FM2 may be generated as a result. The weight map WM filters the features of the first feature map FM1 by convolving it with the weights defined in each element of WM. The weight map WM is shifted across the first feature map FM1 in a sliding-window manner and is convolved with the windows (also called tiles) of FM1. At each shift, each weight in the weight map WM is multiplied by the corresponding pixel value of the overlapped window in the first feature map FM1, and the products are summed. Convolving the first feature map FM1 with the weight map WM generates one channel of the second feature map FM2. Although FIG. 2 shows the weight map WM of a single kernel, in practice the weight maps of a plurality of kernels are each convolved with the first feature map FM1, generating a second feature map FM2 with a plurality of channels. The second feature map FM2 may correspond to the input feature map of the next layer. For example, the second feature map FM2 may be the input feature map of a pooling (or subsampling) layer.
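The sliding-window operation described above can be sketched directly with explicit loops. This is an unoptimized illustration under assumptions not stated in the text: stride 1, no padding, a height-width-depth array layout, and hypothetical names (`conv2d`, `fm1`, `kernels`).

```python
import numpy as np

def conv2d(fm1, kernels):
    """Convolve fm1 of shape (H, W, D) with kernels of shape (K, kh, kw, D).

    Returns fm2 of shape (H - kh + 1, W - kw + 1, K): one output channel per kernel.
    """
    h, w, d = fm1.shape
    k, kh, kw, kd = kernels.shape
    if d != kd:
        raise ValueError("kernel depth must match the input feature map depth")
    fm2 = np.zeros((h - kh + 1, w - kw + 1, k))
    for c in range(k):                      # each kernel yields one channel of FM2
        for i in range(fm2.shape[0]):       # slide the window down
            for j in range(fm2.shape[1]):   # slide the window across
                window = fm1[i:i + kh, j:j + kw, :]
                # multiply each weight by the overlapped pixel and sum the products
                fm2[i, j, c] = np.sum(window * kernels[c])
    return fm2

# Hypothetical shapes: a 6x6 two-channel input and three 3x3 kernels.
fm2 = conv2d(np.ones((6, 6, 2)), np.ones((3, 3, 3, 2)))
```

Here `fm2` has three channels because three kernels were applied, mirroring the statement that several weight maps produce a multi-channel second feature map.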

For convenience of explanation, FIGS. 1 and 2 show only a schematic architecture of the neural network 1. However, those of ordinary skill in the art will understand that, unlike what is illustrated, the neural network 1 may be implemented with more or fewer layers, feature maps, kernels, and the like, and that their sizes may also vary.

FIG. 3 is a block diagram showing the hardware configuration of a neural network apparatus according to an embodiment.

The neural network apparatus 300 may be implemented as various kinds of devices, such as a personal computer (PC), a server device, a mobile device, or an embedded device. Specific examples include, but are not limited to, a smartphone, a tablet device, an augmented reality (AR) device, an Internet of Things (IoT) device, an autonomous vehicle, a robotics device, or a medical device that performs voice recognition, image recognition, image classification, and the like using a neural network. Furthermore, the neural network apparatus 300 may correspond to a dedicated hardware accelerator mounted on such a device, for example a module dedicated to running neural networks such as a neural processing unit (NPU), a Tensor Processing Unit (TPU), or a Neural Engine, but is not limited thereto.

Referring to FIG. 3, the neural network apparatus 300 includes a processor 310 and a memory 320. Only the components related to the present embodiments are shown in the neural network apparatus 300 of FIG. 3. It is therefore apparent to those of ordinary skill in the art that the neural network apparatus 300 may further include other general-purpose components in addition to those shown in FIG. 3.

The processor 310 controls the overall functions for running the neural network apparatus 300. For example, the processor 310 controls the neural network apparatus 300 as a whole by executing programs stored in the memory 320 of the neural network apparatus 300. The processor 310 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), or the like provided in the neural network apparatus 300, but is not limited thereto.

The memory 320 is hardware that stores various data processed in the neural network apparatus 300. For example, the memory 320 may store data that has been processed and data to be processed by the neural network apparatus 300. The memory 320 may also store applications, drivers, and the like to be run by the neural network apparatus 300. The memory 320 may include random access memory (RAM) such as dynamic random access memory (DRAM) or static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, Blu-ray or other optical disk storage, a hard disk drive (HDD), a solid state drive (SSD), or flash memory.

The processor 310 reads and writes neural network data, such as image data, feature map data, and kernel data, from and to the memory 320, and executes the neural network using that data. When the neural network is executed, the processor 310 repeatedly performs convolution operations between an input feature map and a kernel to generate data for an output feature map. The amount of computation of the convolution operation may depend on various factors, such as the number of channels of the input feature map, the number of channels of the kernel, the size of the input feature map, the size of the kernel, and the precision of the values. Unlike the neural network 1 shown in FIG. 1, an actual neural network run on the neural network apparatus 300 may be implemented with a more complex architecture. The methods described below may be performed by the processor 310 and the memory 320 of the neural network apparatus 300.
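To make the dependence on these factors concrete, the sketch below counts multiply-accumulate (MAC) operations for one convolution layer. It is a rough estimate under assumptions not taken from the text: stride 1, no padding, and a kernel depth equal to the input channel count. The function name is hypothetical, and value precision is not modeled because it affects the cost of each operation rather than the count.

```python
def conv_mac_count(in_h, in_w, in_channels, kernel_h, kernel_w, num_kernels):
    """Multiply-accumulate operations for one convolution layer (stride 1, no padding)."""
    out_h = in_h - kernel_h + 1
    out_w = in_w - kernel_w + 1
    macs_per_output = kernel_h * kernel_w * in_channels  # one window times one kernel
    return out_h * out_w * num_kernels * macs_per_output

# Hypothetical layer: 6x6 single-channel input, one 3x3 kernel, giving 4 * 4 * 9 MACs.
small_layer_macs = conv_mac_count(6, 6, 1, 3, 3, 1)
```

The count grows multiplicatively with each factor, which is why larger feature maps, kernels, or channel counts quickly increase the computational load.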

FIG. 4 is a diagram for explaining a convolution operation of a neural network.

In the example of FIG. 4, the input feature map 410 is assumed to be 6x6, the original kernel 420 to be 3x3, and the output feature map 430 to be 4x4; however, the neural network is not limited to these sizes and may be implemented with feature maps and kernels of various sizes. In addition, the values defined in the input feature map 410, the original kernel 420, and the output feature map 430 are all merely exemplary, and the present embodiments are not limited thereto. Meanwhile, the original kernel 420 corresponds to the binary-weight kernel described above.

The original kernel 420 performs the convolution operation while sliding over the input feature map 410 in units of 3x3 windows. The convolution operation multiplies each pixel value of a window of the input feature map 410 by the weight of the element at the corresponding position in the original kernel 420 and sums all of the products to obtain one pixel value of the output feature map 430. Specifically, the original kernel 420 first performs the convolution operation with the first window 411 of the input feature map 410. That is, the pixel values 1, 2, 3, 4, 5, 6, 7, 8, 9 of the first window 411 are multiplied by the weights -1, -1, +1, +1, -1, -1, -1, +1, +1 of the corresponding elements of the original kernel 420, yielding -1, -2, 3, 4, -5, -6, -7, 8, 9. Next, these values are summed to give 3, and the pixel value 431 in row 1, column 1 of the output feature map 430 is determined to be 3. Here, the pixel value 431 in row 1, column 1 of the output feature map 430 corresponds to the first window 411. In the same way, the convolution operation between the second window 412 of the input feature map 410 and the original kernel 420 determines the pixel value 432 in row 1, column 2 of the output feature map 430 to be -3. Finally, the convolution operation between the sixteenth window 413, which is the last window of the input feature map 410, and the original kernel 420 determines the pixel value 533 in row 4, column 4 of the output feature map 430 to be -13.

That is, the convolution operation between one input feature map 410 and one original kernel 420 may be processed by repeatedly multiplying the values of corresponding elements of the input feature map 410 and the original kernel 420 and summing the products, and the output feature map 430 is generated as the result of the convolution operation.
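The sliding-window computation described for FIG. 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `conv2d_valid` is hypothetical, and only the top-left 3x3 window of the input (pixel values 1 to 9) and the kernel weights are taken from the text; the remaining input values are zeros chosen as placeholders.

```python
def conv2d_valid(feature_map, kernel):
    """Slide `kernel` over `feature_map` (stride 1, no padding) and return the output map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply each pixel of the window by the kernel weight at the
            # same position, then sum the products.
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += feature_map[i + u][j + v] * kernel[u][v]
            row.append(acc)
        output.append(row)
    return output


# Binary-weight kernel from the text, laid out row by row.
kernel = [[-1, -1, 1],
          [1, -1, -1],
          [-1, 1, 1]]

# Assumed 6x6 input: only the first window (values 1..9) comes from the text.
feature_map = [
    [1, 2, 3, 0, 0, 0],
    [4, 5, 6, 0, 0, 0],
    [7, 8, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

output = conv2d_valid(feature_map, kernel)
print(len(output), len(output[0]))  # 4 4
print(output[0][0])                 # 3
```

A 6x6 input and a 3x3 kernel give 16 window positions, which is why the output map is 4x4 and the last window is the sixteenth.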

According to an embodiment, an image may be input to a neural network trained by the neural network learning method for object identification of satellite images, and a classification result corresponding to the image may be output through the trained neural network. The classification result may include the location and type of an object. For example, an identification result indicating whether the object is a ship or an aircraft may be output. The method of training the neural network for object identification is described in detail below with reference to FIGS. 5 to 11.

FIG. 5 is a block diagram illustrating the structure of a neural network for object identification according to an embodiment. Referring to FIG. 5, the neural network may receive an image including an object and extract a feature map. An object candidate box surrounding the object may be generated based on the feature map. An object candidate box of a different size may then be generated using a mask in which weights are applied to a region expanded N times (where N is a real number greater than 1) around the object candidate box. This differently sized box may be the second object candidate box. The second object candidate box may be applied to region of interest (RoI) pooling and input to a classification feature vector extractor. The classification feature vector output from the classification feature vector extractor may be input to a metric learning network for learning.

According to an embodiment, the second object candidate box may be applied to RoI pooling and input to a classification network. Because more information about the object's surroundings is provided, classification performance may be improved.

FIG. 6 is a diagram for describing RoI pooling according to an embodiment.

Referring to FIG. 6, in the conventional approach an object of interest in the output feature map is detected by an object candidate box, and the detected box is applied to RoI pooling at its detected size. In the present invention, by contrast, the region is expanded to N times the size of the object candidate box and radial weights are applied to it. The object candidate box processed in this way is applied to RoI pooling and then input to the classification network. This provides the network with information about the object's surroundings in addition to the object itself, which may improve identification performance. The classification network may output and learn the class score and location of the final object box.
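As an illustration of why boxes of different sizes can feed the same classification network, the following is a minimal sketch of RoI max pooling, which reduces any box to a fixed-size grid. The function name `roi_max_pool`, the end-exclusive box convention, and the integer bin boundaries are assumptions made for this example; the source does not specify how its RoI pooling is implemented.

```python
def roi_max_pool(feature_map, box, out_h, out_w):
    """Max-pool the region `box` = (top, left, bottom, right), end-exclusive,
    into a fixed out_h x out_w grid."""
    top, left, bottom, right = box
    h, w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Row range of bin i; each bin covers at least one row.
        r0 = top + (i * h) // out_h
        r1 = max(top + ((i + 1) * h) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            # Column range of bin j; each bin covers at least one column.
            c0 = left + (j * w) // out_w
            c1 = max(left + ((j + 1) * w) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map holding the values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]

small_box = roi_max_pool(feature_map, (1, 1, 3, 3), 2, 2)  # a 2x2 box
large_box = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)  # the box expanded to 4x4
print(small_box)  # [[5, 6], [9, 10]]
print(large_box)  # [[5, 7], [13, 15]]
```

Both the original box and the expanded box yield a 2x2 output, so the expanded region can replace the original one without changing the input size of the downstream network.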

According to an embodiment, the weight may decrease from the center toward the outer region. The farther the surrounding information is from the object, the less important it may be. Learning can therefore rely more heavily on the more important information close to the object.

FIG. 7 is a diagram for describing a mask, and FIG. 8 is a diagram for describing a mask according to an embodiment. Referring to FIG. 7, the mask may be a matrix. The mask may consist of a first region 710 and a second region 730. The first region 710 may be the conventional RoI pooling region, which may correspond to the first object candidate box. The second region 730 may be the RoI pooling region according to the embodiment.

[symbol omitted] is the height and width of the first region 710.

[symbol omitted] is the height and width of the second region 730.

For example, the mask [symbol omitted] may be calculated by Equation 1.

[Equation 1]

[equation omitted]

Here, [symbol omitted] denotes, for the x-axis and the y-axis, the shortest Manhattan distance between [symbol omitted] and any coordinate within the first region ([expression omitted]). [symbol omitted] denotes a parameter that controls the degree of weighting. FIG. 7 is merely an example for the case where [value omitted]; other real numbers may of course be applied.

According to an embodiment, the weight may decrease from the first region of the mask toward the second region.
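The following sketch builds a mask with the properties stated in the text: a weight of 1 inside the first region, and a weight that falls off with the shortest Manhattan distance to the first region, scaled by a tunable parameter. Equation 1 is not reproduced in this text, so the exponential decay used here is an assumption, not the patent's formula. The names `build_mask`, `h1`, `w1`, `h2`, `w2`, and `alpha` are likewise hypothetical.

```python
import math


def build_mask(h1, w1, h2, w2, alpha=0.5):
    """Return an h2 x w2 mask whose centered h1 x w1 first region has weight 1."""
    top, left = (h2 - h1) // 2, (w2 - w1) // 2
    bottom, right = top + h1 - 1, left + w1 - 1  # inclusive bounds of the first region
    mask = []
    for y in range(h2):
        row = []
        for x in range(w2):
            # Shortest Manhattan distance from (x, y) to any coordinate
            # inside the first region (0 for points inside it).
            dx = max(left - x, 0, x - right)
            dy = max(top - y, 0, y - bottom)
            d = dx + dy
            # Assumed decay function; Equation 1 of the source may differ.
            row.append(math.exp(-alpha * d))
        mask.append(row)
    return mask


mask = build_mask(h1=2, w1=2, h2=6, w2=6, alpha=0.5)
for row in mask:
    print(" ".join(f"{v:.2f}" for v in row))
```

A larger `alpha` makes the weights drop faster, so less of the surrounding context reaches the classifier; `alpha = 0` would weight the whole expanded region equally.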

According to an embodiment, the height and width of the second region of the mask may be learned. Equivalently, the height and width of the second object candidate box generated by the mask may be learned. The height and width of the second region may be N times those of the first region (where N is a real number greater than 1). The neural network may learn a value of N suitable for object identification. Accordingly, an appropriate amount of information about the object's surroundings may be input to the neural network so as to maximize object identification efficiency.

FIG. 9 is a diagram for describing the second object candidate box to which the mask is applied. Referring to FIG. 9, the second object candidate box may be generated by a Hadamard product. Specifically, the second object candidate box may be generated by the Hadamard product of the mask and the region expanded N times (where N is a real number greater than 1) around the first object candidate box. The Hadamard product is an operation that multiplies the corresponding elements of two matrices of the same size.
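The two operations in this paragraph, expanding a box N times about its center and taking an element-wise product with the mask, can be sketched as below. The helper names `expand_box` and `hadamard`, the center/width/height box representation, and the sample values are assumptions for illustration only.

```python
def expand_box(cx, cy, w, h, n):
    """Return (left, top, right, bottom) of a box scaled n times about its center."""
    return (cx - n * w / 2, cy - n * h / 2, cx + n * w / 2, cy + n * h / 2)


def hadamard(a, b):
    """Element-wise product of two matrices of the same size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# A first candidate box centered at (10, 10), 4 wide and 2 high, expanded with N = 2.
print(expand_box(10, 10, 4, 2, 2))  # (6.0, 8.0, 14.0, 12.0)

# Hypothetical feature values in the expanded region and a matching mask
# whose weight is 1 at the center and 0.5 elsewhere.
region = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
mask = [[0.5, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 0.5]]

weighted = hadamard(region, mask)
print(weighted)  # [[0.5, 1.0, 1.5], [2.0, 5.0, 3.0], [3.5, 4.0, 4.5]]
```

The center value passes through unchanged while the surrounding values are attenuated, which is the effect the weighted second object candidate box is meant to have.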

FIG. 10 is a diagram for describing metric learning according to an embodiment. Referring to FIG. 10, Vector1 is the classification feature vector of Object1, which belongs to Class A. Vector2 is the classification feature vector of Object2, which belongs to Class A. Vector3 is the classification feature vector of Object3, which belongs to Class B. Before metric learning, the distance between Vector1 and Vector2, which belong to the same sub-class, is greater than the distance between Vector1 and Vector3, which belong to different sub-classes. After metric learning, Vector1 and Vector2 of the same class move closer together, and Vector3 of the other class moves farther away.

In this way, the network may be trained so that the classification feature vectors of objects belonging to the same class move closer together and those of objects belonging to different classes move farther apart. This makes it easier to distinguish sub-classes (for example, types of ship) that are highly similar overall (for example, all are ships) but differ in specific characteristics, which may improve identification performance.

According to an embodiment, the metric learning network may output a metric loss, and the method may further include calculating a loss function. Various loss functions, such as CosLoss, ArcLoss, and Triplet Loss, may be applied.
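Of the loss functions named above, the triplet loss is the simplest to illustrate. The sketch below uses the standard hinge form with Euclidean distance. The margin value and the two-dimensional sample vectors are assumptions chosen to mirror the before and after situations of FIG. 10; the source does not state which distance or margin it uses.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the same-class pair is closer than the different-class pair by `margin`."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Before metric learning: Vector2 (same class) is farther from Vector1 than Vector3.
vector1 = [0.0, 0.0]  # Class A
vector2 = [3.0, 4.0]  # Class A, distance 5 from vector1
vector3 = [1.0, 0.0]  # Class B, distance 1 from vector1
loss_before = triplet_loss(vector1, vector2, vector3)

# After metric learning: the same-class pair is close, the other class is far.
vector2_after = [0.0, 1.0]  # distance 1 from vector1
vector3_after = [3.0, 4.0]  # distance 5 from vector1
loss_after = triplet_loss(vector1, vector2_after, vector3_after)

print(loss_before)  # 5.0
print(loss_after)   # 0.0
```

Minimizing this loss pushes the feature extractor toward the arrangement shown after metric learning, where the loss has already reached zero.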

FIG. 11 is a flowchart illustrating a neural network learning method according to an embodiment. Because the neural network learning method shown in FIG. 11 relates to the embodiments described with reference to the preceding drawings, the descriptions given for those drawings also apply to the method of FIG. 11, even where they are omitted below.

Referring to FIG. 11, in step 1110, the neural network receives an image and a feature map is extracted. In step 1120, a first object candidate box is generated based on the feature map. In step 1130, a second object candidate box is generated by applying weights to a region larger than the first object candidate box. In step 1140, the second object candidate box is applied to RoI pooling and input to the classification feature vector extractor. In step 1150, learning is performed by metric learning using the classification feature vector.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to those embodiments, and a person of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description. Therefore, the scope of the present invention should not be limited to the described embodiments, and should be determined by the claims set out below and their equivalents.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. In addition, computer-readable media may include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery medium.

The foregoing description of the present invention is for illustration only, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A neural network learning method for object identification of satellite images, the method comprising:
receiving an image including an object and extracting a feature map;
generating a first object candidate box surrounding the object based on the feature map;
generating a second object candidate box using a mask in which a predetermined weight is applied to a region expanded N times (where N is a real number greater than 1) around the first object candidate box;
applying the second object candidate box to region of interest (RoI) pooling and inputting the result to a classification feature vector extractor; and
performing learning by inputting a classification feature vector output from the classification feature vector extractor into a metric learning network.

The method of claim 1,
wherein the weight decreases from the center of the mask toward the outer region.

The method of claim 1,
wherein the second object candidate box is generated by the Hadamard product of the mask and the region expanded N times (where N is a real number greater than 1) around the first object candidate box.

The method of claim 1,
wherein a height and a width of the second object candidate box are learned.

The method of claim 1,
further comprising applying the second object candidate box to RoI pooling and inputting the result to a classification network.

The method of claim 1,
wherein the metric learning network learns such that distances between classification feature vectors of objects belonging to the same class decrease and distances between classification feature vectors of objects belonging to different classes increase.

The method of claim 1,
wherein the metric learning network outputs a metric loss and calculates a loss function using the metric loss.

An object identification method using a neural network, the method comprising:
inputting an image into a neural network trained by the learning method according to any one of claims 1 to 7; and
outputting a classification result corresponding to the image through the trained neural network.

A recording medium on which a program for executing, on a computer, the method according to any one of claims 1 to 7 is recorded.

A neural network learning device comprising:
a memory; and
a processor,
wherein the processor extracts a feature map by receiving an image, generates a first object candidate box surrounding an object based on the feature map, generates a second object candidate box by applying a predetermined weight to a region N times (where N is a real number greater than 1) the height and width of the first object candidate box, applies the second object candidate box to region of interest (RoI) pooling and inputs the result to a classification feature vector extractor, and performs learning by inputting a classification feature vector output from the classification feature vector extractor into a metric learning network.

An object identification device using a neural network, the device comprising:
a memory; and
a processor,
wherein the processor inputs an image into a neural network trained by the learning method according to any one of claims 1 to 6, and outputs a classification result corresponding to the image through the trained neural network.