KR102167011B1

KR102167011B1 - An image training apparatus extracting hard negative samples used to train a neural network based on sampling and an adaptively adjusted threshold, and a method performed by the image training apparatus

Info

Publication number: KR102167011B1
Application number: KR1020180029380A
Authority: KR
Inventors: 임영철; 강민성
Original assignee: 재단법인대구경북과학기술원
Priority date: 2018-03-13
Filing date: 2018-03-13
Publication date: 2020-10-16
Also published as: KR20190107984A

Abstract

An image learning apparatus according to an embodiment may extract, from a training image, hard negative samples used to train a neural network. A hard negative sample may be used to teach the neural network an undesirable object recognition result. Hard negative samples may be determined from among search regions sampled from the training image. The image learning apparatus may determine, for each sampled search region, a class score, which is the probability that the region corresponds to an object in the training image, and then determine the hard negative samples among the search regions based on the determined class scores. The number of hard negative samples actually used to train the neural network may be determined based on at least one of the number of positive samples, a preset threshold compared against the class score, and a preset ratio between positive samples and hard negative samples.

Description

An image learning apparatus for extracting hard negative samples used to train a neural network based on sampling and an adaptively changed threshold, and a method performed by the apparatus

The present invention relates to an apparatus and method for identifying an object included in an image using a neural network.

A neural network is a mathematical model of the characteristics of biological human neurons and can be used to extract or identify objects included in a training image. Supervised machine learning is a method of training a neural network using a training image and ground-truth data, which contains information about the objects included in the training image. That is, as the neural network is trained with the ground-truth data, the result of the neural network identifying an object in the training image may converge to the ground-truth data corresponding to that training image.

In training a neural network, positive samples and negative samples, each of which is a part of a training image, may be used. A positive sample is a part of the training image that includes an object, and the neural network may be trained with positive samples to identify the regions of the training image in which an object exists. A negative sample is a part of the training image that includes no object or only a part of an object, and the neural network may be trained with negative samples to identify the parts of the training image in which no object exists. In general, the number of negative samples obtained from a training image may be far greater than the number of positive samples obtained from it. Having more negative samples than positive samples may cause an imbalance in the data used to train the neural network.

The present invention proposes an image learning apparatus and method for extracting hard negative samples used to train a neural network, based on sampling and an adaptively changed threshold.

The present invention proposes an image learning apparatus and method for determining hard negative samples based on search regions obtained by sampling a training image.

The present invention proposes an image learning apparatus and method for selecting, based on an adaptively changed threshold, the search regions used to determine hard negative samples.

According to an embodiment, there is provided an image learning method using a neural network, the method including: sampling a plurality of search regions from a training image; determining, for each of the plurality of search regions, a class score that is the probability that the search region corresponds to an object included in the training image; identifying, among the plurality of search regions, hard negative samples having a class score greater than a preset threshold; and training the neural network based on the identified hard negative samples, wherein the threshold is adjusted based on the number of hard negative samples used to train the neural network.

According to an embodiment, there is provided an image learning method in which the identifying of the hard negative samples includes identifying the hard negative samples from among those of the plurality of search regions that do not correspond to the positive samples used to train the neural network.

According to an embodiment, there is provided an image learning method in which the training of the neural network includes: identifying positive samples determined by comparing the location of the object with all regions of the training image; and selecting, from among the identified hard negative samples, the hard negative samples used to train the neural network in descending order of class score, wherein the number of hard negative samples used to train the neural network is determined based on at least one of the number of identified positive samples and a preset ratio between positive samples and hard negative samples.

According to an embodiment, there is provided an image learning method further including determining whether to change the threshold based on the number of positive samples, determined by comparing the location of the object with all regions of the training image, and the number of identified hard negative samples.

According to an embodiment, there is provided an image learning method in which the determining of whether to change the threshold includes deciding to change the threshold when the value obtained by applying the number of positive samples to the preset ratio between positive samples and hard negative samples is greater than the number of identified hard negative samples.

According to an embodiment, there is provided an image learning method in which, when the threshold is changed, the threshold takes, for a training image different from the current training image, a value smaller than the value used for the current training image.
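The threshold adjustment rule described in the embodiments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `update_threshold`, the multiplicative `decay` factor, and the `floor` lower bound are hypothetical, since the text only states that the threshold takes a smaller value for the next training image.

```python
def update_threshold(theta, num_positive, num_hard_negative, ratio_m,
                     decay=0.9, floor=0.0):
    """Return the class-score threshold to use for the next training image.

    ratio_m is the m in the preset positive-to-hard-negative ratio 1:m, so
    the value compared against the hard negative count is ratio_m * num_positive.
    decay and floor are hypothetical: the source does not say by how much
    the threshold is lowered.
    """
    target = ratio_m * num_positive
    if target > num_hard_negative:
        # Fewer hard negatives were identified than the ratio calls for,
        # so a smaller threshold is used for the next training image.
        return max(floor, theta * decay)
    # Enough hard negatives were identified: keep the threshold as it is.
    return theta
```

With a ratio of 1:3 and four positive samples, the comparison value is 12, so finding only five hard negatives lowers the threshold while finding twelve leaves it unchanged.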

According to an embodiment, there is provided an image learning method in which the sampling includes sampling the plurality of search regions from the training image based on a preset probability.

According to an embodiment, there is provided an image learning method including: determining positive samples by comparing all regions of a training image with the location of an object in the training image; determining hard negative samples based on a preset threshold and a class score of each of a plurality of regions sampled from the training image, the class score being the probability that the region corresponds to the object; and training a neural network based on the determined hard negative samples and the determined positive samples, wherein the threshold is adjusted according to a result of comparing the number of hard negative samples with the number of positive samples.

According to an embodiment, there is provided an image learning method in which the determining of the positive samples includes: identifying the location of the object based on ground-truth data corresponding to the training image; and selecting the positive samples from among all regions of the training image based on whether the degree to which each region overlaps the identified location of the object is greater than or equal to a threshold.
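The overlap test for positive samples can be illustrated with a short sketch. The text does not name the overlap measure, so this example assumes intersection over union (IoU) on axis-aligned boxes given as `(x1, y1, x2, y2)`; the function names and the default `overlap_threshold` of 0.5 are likewise hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the intersection are clamped at zero for
    # boxes that do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_positives(regions, object_box, overlap_threshold=0.5):
    """Keep the regions whose overlap with the object box meets the threshold.

    IoU as the overlap measure and 0.5 as the default are assumptions.
    """
    return [r for r in regions if iou(r, object_box) >= overlap_threshold]
```

The same `iou` helper could serve the complementary test for negative samples, where a region qualifies when its overlap is at or below a threshold.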

According to an embodiment, there is provided an image learning method in which the determining of the hard negative samples includes: identifying negative samples among the plurality of regions sampled from the training image based on whether the degree of overlap with the location of the object is less than or equal to a threshold; and determining, among the identified negative samples, the negative samples having a class score greater than the threshold to be the hard negative samples.

According to an embodiment, there is provided an image learning method in which the plurality of regions sampled from the training image include soft negative samples having a class score less than or equal to the threshold and hard negative samples having a class score greater than the threshold.

According to an embodiment, there is provided an image learning method further including determining whether to change the threshold by comparing (1) the number of regions, among the plurality of regions, having a class score greater than the threshold with (2) a target number of hard negative samples calculated based on the number of positive samples and a preset ratio between positive samples and hard negative samples.

According to an embodiment, there is provided an image learning method in which the determining of whether to change the threshold includes deciding to change the threshold to a smaller value when the target number is greater than the number of regions having a class score greater than the threshold.

According to an embodiment, there is provided an image learning method in which the determining of the hard negative samples includes, when the target number is greater than the number of regions having a class score greater than the threshold, determining the one or more regions, among the plurality of regions, having a class score greater than the threshold to be the hard negative samples.

According to an embodiment, there is provided an image learning method in which the determining of the hard negative samples includes, when the target number is smaller than the number of regions having a class score greater than the threshold: extracting as many regions as the target number from the plurality of regions, in descending order starting from the region with the largest class score; and determining the extracted regions to be the hard negative samples.
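The two selection cases described above (target larger than, or smaller than, the number of regions above the threshold) can be sketched together. This is an illustrative sketch only: the function name and the `(region, class_score)` pair representation are assumptions, and the input is assumed to already contain negative regions only.

```python
def select_hard_negatives(scored_regions, theta, target):
    """Pick hard negatives from (region, class_score) pairs.

    scored_regions is assumed to hold negative regions only.
    """
    above = [rs for rs in scored_regions if rs[1] > theta]
    if target > len(above):
        # Fewer regions exceed the threshold than the target asks for:
        # every region above the threshold becomes a hard negative.
        return above
    # Otherwise take `target` regions in descending order of class score.
    # When target equals len(above), this yields the same set as `above`.
    ranked = sorted(scored_regions, key=lambda rs: rs[1], reverse=True)
    return ranked[:target]
```

For example, with scores 0.9, 0.4, 0.7, and 0.95, a threshold of 0.5, and a target of 2, the regions scoring 0.95 and 0.9 are selected.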

According to an embodiment, there is provided an image recognition method using a neural network, the method including: identifying an input image; inputting the input image to the neural network; and recognizing an object included in the input image based on an output of the neural network, wherein the neural network has been trained in advance on one or more hard negative samples selected, based on a preset threshold, from among a plurality of search regions sampled from a training image, and the threshold is adjusted based on at least one of the number of positive samples, determined by comparing all regions of the training image with the location of the object, and the number of selected hard negative samples.

According to an embodiment, there is provided an image recognition method in which the hard negative samples are search regions that have a class score greater than the threshold among negative samples, the negative samples being the search regions that remain after the search regions corresponding to the positive samples are excluded from the plurality of search regions, and the class score is the probability that each of the plurality of search regions corresponds to the object.

According to an embodiment, hard negative samples used to train a neural network may be extracted based on sampling and an adaptively changed threshold.

According to an embodiment, hard negative samples may be determined based on search regions obtained by sampling a training image.

According to an embodiment, the search regions used to determine hard negative samples may be selected based on an adaptively changed threshold.

FIG. 1 is a flowchart illustrating an operation in which an image learning apparatus according to an embodiment trains a neural network using a training image.
FIG. 2 is an exemplary diagram illustrating the positive samples and hard negative samples that an image learning apparatus according to an embodiment uses to train a neural network.
FIG. 3 is an exemplary diagram illustrating an operation in which an image learning apparatus according to an embodiment sorts a plurality of search regions.
FIG. 4 is an exemplary diagram illustrating an operation in which an image learning apparatus according to an embodiment selects hard negative samples from among a plurality of search regions.
FIG. 5 is a flowchart illustrating an operation of recognizing an object present in an input image using a neural network trained by an image learning apparatus according to an embodiment.
FIG. 6 is a diagram illustrating the structure of an image learning apparatus according to an embodiment.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of describing those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended, however, to limit the embodiments according to the concept of the present invention to the specific forms disclosed; the embodiments include the modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When a component is described as being “connected” or “coupled” to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is described as being “directly connected” or “directly coupled” to another component, it should be understood that no intervening component is present. Expressions that describe the relationship between components, such as “between” and “directly between,” or “adjacent to” and “directly adjacent to,” should be interpreted in the same way.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as “include” or “have” are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this specification.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The scope of the patent application, however, is not limited or restricted by these embodiments. The same reference numerals in the drawings denote the same members.

FIG. 1 is a flowchart illustrating an operation in which an image learning apparatus according to an embodiment trains a neural network using a training image. The neural network trained by the image learning apparatus may be used to recognize an object included in the training image or in an image other than the training image.

Referring to FIG. 1, in step 101, the image learning apparatus according to an embodiment may identify a training image to be used to train the neural network. The training image may be transmitted from a network connected to the image learning apparatus (for example, the Internet) or from another electronic device connected to the image learning apparatus (for example, a camera including an image sensor, or a smartphone). The image learning apparatus may identify a training image that was transmitted from the network or the other electronic device and stored in the memory of the image learning apparatus. The training image may be divided into a region in which a subject (that is, an object) exists and a region in which the subject does not exist (for example, the background or another subject distinct from that subject).

Referring to FIG. 1, in step 102, the image learning apparatus according to an embodiment may sample search regions, each of which is a part of the training image. The image learning apparatus may determine a search region by segmenting a part of the training image. The image learning apparatus may sample search regions from the training image based on a preset probability (for example, 1/k, where k is a real number greater than 1). One or more search regions may be selected from the training image based on this probability. When a plurality of search regions are sampled, the search regions may differ in size.
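One possible reading of sampling with a preset probability of 1/k is sketched below. The text does not specify how candidate regions are enumerated or how the probability is applied, so this example assumes a precomputed list of candidate regions and keeps each one independently with probability 1/k; the function name and the optional `rng` parameter are hypothetical.

```python
import random


def sample_search_regions(candidate_regions, k, rng=None):
    """Keep each candidate region independently with probability 1/k.

    Independent per-region sampling is an assumption; the source only
    states that sampling is based on a preset probability such as 1/k.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1")
    rng = rng or random.Random()
    return [r for r in candidate_regions if rng.random() < 1.0 / k]
```

Under this reading, roughly one in k candidate regions is scored by the network, which is where the reduction in class score computation comes from.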

Referring to FIG. 1, in step 103, the image learning apparatus according to an embodiment may determine a class score, which is the probability that each of the one or more selected search regions corresponds to an object included in the training image. That is, the class score may be a value indicating the probability that the corresponding search region corresponds to the object. When a plurality of search regions are sampled, the image learning apparatus may determine a class score for each of the search regions. Because the search regions are a sample of parts of the training image, the image learning apparatus need not compute class scores for every region of the training image. The amount of computation required for the image learning apparatus to calculate class scores may therefore be reduced.

Referring to FIG. 1, in step 104, the image learning apparatus according to an embodiment may select and store, from among the one or more search regions, the search regions having a class score greater than a preset threshold θ. The threshold θ is a real number between 0 and 1 inclusive; for example, its initial value may be set to a real number between 0.5 and 0.9. The threshold θ may be changed adaptively each time the neural network is trained with a different training image. Among the search regions having a class score greater than the threshold θ, the image learning apparatus may select only negative samples. A negative sample is a part of the training image that includes no object or only a part of an object, and may be used to teach the neural network that the object is not included in the negative sample. When the image learning apparatus has selected a plurality of search regions in step 102, one or more search regions having a class score greater than the threshold θ may be selected.

Negative samples may be divided into hard negative samples, which have a class score greater than the threshold θ, and soft negative samples, which have a class score less than or equal to the threshold θ. Accordingly, the search regions that the image learning apparatus stores in step 104 may be the search regions corresponding to hard negative samples.
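The split into hard and soft negative samples can be written as a short sketch. The function name and the `(region, class_score)` pair representation are assumptions made for illustration.

```python
def split_negatives(scored_negatives, theta):
    """Split (region, class_score) pairs into hard and soft negatives.

    Hard negatives score strictly above theta; soft negatives score at or
    below theta, matching the definitions in the text.
    """
    hard = [(r, s) for r, s in scored_negatives if s > theta]
    soft = [(r, s) for r, s in scored_negatives if s <= theta]
    return hard, soft
```

Only the `hard` list corresponds to what is stored in step 104; a region whose score equals θ exactly is treated as a soft negative.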

Referring to FIG. 1, in step 105, the image learning apparatus according to an embodiment may group the stored search regions. Furthermore, when a plurality of search regions have been selected and stored, the image learning apparatus may sort the stored search regions by their class scores. For example, the search regions stored in the image learning apparatus may be sorted in descending order of class score.

Referring to FIG. 1, in step 106, the image learning apparatus according to an embodiment may identify positive samples, which are parts of the training image used to teach the neural network the regions of the training image in which an object exists. In other words, a positive sample is a part of the training image that includes an object, and may be used to teach the neural network that the object is included in the positive sample. The image learning apparatus may search all regions of the training image to identify one or more positive samples. More specifically, the image learning apparatus may determine one or more positive samples by comparing the location of the object with all regions of the training image. In the following, it is assumed that the image learning apparatus has identified N_p positive samples.

The search regions with a class score greater than the threshold θ that are stored in step 104 may be selected from the search regions that remain after the search regions corresponding to positive samples are excluded from the plurality of search regions. That is, the search regions sampled from the training image may be used only to extract hard negative samples. Whereas positive samples are determined by searching all regions of the training image, negative samples may be determined based on the search regions sampled from the training image. The time and amount of computation required to determine negative samples may therefore be reduced.

Referring to FIG. 1, in step 107, the image learning apparatus according to an embodiment may determine the number N_n of hard negative samples to be used for training the neural network, based on the number of positive samples. The ratio between the number of positive samples N_p and the number of hard negative samples N_n may be preset. In this case, N_n may be determined based on the preset ratio and the number N_p of positive samples identified in step 106. For example, when the ratio between N_p and N_n is preset to 1:m, N_n may be m × N_p. In this case, N_n may be proportional to N_p.

Referring to FIG. 1, in step 108, the image learning apparatus according to an embodiment may compare the number of search regions grouped in step 105 with the number N_n of hard negative samples determined in step 107. The image learning apparatus may select, from among the grouped search regions, the hard negative samples to be used for training the neural network, based on the class scores and N_n. That is, N_n may be the target number of hard negative samples to be used for training the neural network.

More specifically, when the number of grouped search regions is greater than the target number N_n of hard negative samples, in step 109, the image learning apparatus according to an embodiment may select hard negative samples from among the grouped search regions based on the class score corresponding to each grouped search region. The number of search regions selected as hard negative samples may correspond to the target number N_n.

For example, the image learning apparatus may select the N_n search regions having relatively high class scores as hard negative samples. When the stored search regions are sorted by class score, for example, when they are stored in descending order of class score, the image learning apparatus may select a total of N_n search regions, from the first stored search region to the N_n-th, as hard negative samples.

When the number of grouped search regions is less than the target number N_n of hard negative samples, in step 110, the image learning apparatus according to an embodiment may adjust the threshold θ used to select search regions. The threshold θ may be changed to α × θ, where α is a preset real number. For example, the image learning apparatus may reduce the threshold θ so that a larger number of hard negative samples can be selected when the next training image is learned. In this case, α may be a positive real number less than 1 (for example, 0.99). Accordingly, for the next training image, the image learning apparatus may use a larger number of hard negative samples to train the neural network.

When the number of grouped search regions is less than the target number N_n of hard negative samples, in step 111, the image learning apparatus according to an embodiment may select all of the search regions grouped in step 105 as hard negative samples. The order of steps 110 and 111 may differ from that shown in FIG. 1, and the image learning apparatus may perform steps 110 and 111 independently.
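
Steps 108 to 111 can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: the names `Region`, `select_hard_negatives`, `n_target`, and `alpha` are hypothetical, and the handling of the case in which the number of grouped search regions exactly equals N_n (keep all regions, leave θ unchanged) is an assumption, since the description above covers only the greater-than and less-than cases.

```python
from dataclasses import dataclass


@dataclass
class Region:
    box: tuple    # (x1, y1, x2, y2), hypothetical representation of a search region
    score: float  # class score of the region


def select_hard_negatives(candidates, n_target, theta, alpha=0.99):
    """Return (selected_regions, threshold_for_next_image).

    candidates: grouped search regions whose class score already exceeds theta.
    n_target:   target number N_n of hard negative samples.
    """
    ranked = sorted(candidates, key=lambda r: r.score, reverse=True)
    if len(ranked) > n_target:
        # Step 109: keep the N_n regions with the highest class scores.
        return ranked[:n_target], theta
    if len(ranked) < n_target:
        # Steps 110 and 111: use every region and lower the threshold.
        return ranked, alpha * theta
    # Assumption: exactly N_n candidates, so use all and keep the threshold.
    return ranked, theta
```

For instance, with three candidates and `n_target=2` the two highest-scoring regions are returned and θ is unchanged, whereas with `n_target=5` all three are returned and θ is scaled by `alpha` for the next training image.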

Referring to FIG. 1, in step 112, the image learning apparatus according to an embodiment may train the neural network using the selected hard negative samples and the identified positive samples. Accordingly, the neural network may be trained, based on the positive samples, to identify regions of the training image in which an object exists. In addition, the neural network may identify, based on the hard negative samples, regions of the training image in which no object exists. As described above, because a hard negative sample is a negative sample having a class score greater than the threshold θ, it may be used to teach the neural network undesirable object recognition results.

A hard negative sample used to train the neural network may be a portion of the training image having a class score equal to or greater than the threshold θ. That is, among the plurality of negative samples that can be extracted from the training image, the hard negative samples used to train the neural network may be those that are relatively difficult for the neural network to judge correctly. Because the image learning apparatus according to an embodiment trains the neural network by selecting only hard negative samples, the neural network can be used to identify objects more accurately.

In addition, with respect to the imbalance arising from the difference between the number of positive samples and the number of negative samples, the number of hard negative samples used for training the neural network is limited to at most N_n, so the time required to train the neural network can be reduced. Furthermore, instead of computing class scores for all regions of the training image, hard negative samples are determined by computing class scores only for portions of the training image sampled based on a preset probability (for example, the search regions sampled in step 102), so the time and amount of computation required to determine hard negative samples can be reduced.

FIG. 2 is an exemplary diagram illustrating positive samples 220 and 230 and a hard negative sample 210 that the image learning apparatus according to an embodiment uses to train a neural network.

The image learning apparatus may determine the hard negative sample 210 to be used for training the neural network from the search regions obtained by sampling the training image 200 based on a preset probability. Referring to FIG. 2, the hard negative sample 210 may include the background of the training image 200. Alternatively, the hard negative sample 210 may include an object other than the object to be recognized through the neural network. Because the hard negative sample 210 is determined as a portion of the training image having a class score that is equal to or greater than the threshold θ, or greater than the threshold θ, it may be a portion that the neural network is relatively likely to determine to be a region in which the object exists.

The image learning apparatus may determine the positive samples 220 and 230 based on class scores determined for all regions of the training image 200. When the image learning apparatus identifies ground-truth data 240 corresponding to the training image 200, the positive samples 220 and 230 may be determined in consideration of the ground-truth data 240. The ground-truth data 240 may include information (for example, coordinate information) indicating the location, in the training image 200, of the object to be recognized through the neural network.

For example, the image learning apparatus may compare all regions of the training image 200 with the location of the object identified from the ground-truth data 240, and determine a specific region whose degree of overlap with the object is equal to or greater than a preset threshold to be a positive sample 220, 230. The degree of overlap with the object may be determined based on, for example, Intersection-over-Union (IoU). For example, if the IoU between the specific region and the object is 0.5 or more, the image learning apparatus may determine the specific region to be a positive sample 220, 230. In summary, among all regions of the training image 200, regions that overlap relatively heavily with the location of the object identified from the ground-truth data 240 may be determined to be the positive samples 220 and 230. Alternatively, the image learning apparatus may determine the single region that overlaps most with the location of the object indicated by the ground-truth data 240 to be the positive sample. Alternatively, the image learning apparatus may select positive samples based on whether the degree to which each region of the training image 200 overlaps the location of the identified object (which may be evaluated, for example, by the IoU) is equal to or greater than a threshold (for example, 0.5).
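
The overlap test described above can be illustrated with a short sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` tuples; the function names `iou` and `positive_samples` and the parameter `min_iou` are hypothetical and are used only for illustration.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_samples(regions, gt_box, min_iou=0.5):
    """Keep the regions whose IoU with the ground-truth box is at least min_iou."""
    return [r for r in regions if iou(r, gt_box) >= min_iou]
```

With a ground-truth box of `(0, 0, 10, 10)`, a region `(0, 0, 10, 8)` has an IoU of 0.8 and would be kept, while a region `(5, 0, 15, 10)` has an IoU of one third and would not.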

When the neural network is trained to identify the region of the training image 200 in which an object exists, the hard negative sample 210 may represent an incorrect identification result that is not desired from the neural network. For example, comparing the positive samples 220 and 230 with the hard negative sample 210, the area over which the hard negative sample 210 overlaps the object may be relatively smaller than the area over which the positive samples 220 and 230 overlap the object. That is, the hard negative sample 210 may represent a result that identifies the region in which the object exists less accurately than the positive samples 220 and 230 do. By learning the hard negative sample 210 as well as the positive samples 220 and 230, the neural network can more accurately identify the region of the training image 200 in which the object exists.

FIG. 3 is an exemplary diagram illustrating an operation in which the image learning apparatus according to an embodiment sorts a plurality of search regions.

Referring to FIG. 3, a plurality of search regions (search region 1 (310) to search region 4 (340)) obtained by the image learning apparatus are shown. The search regions may be obtained by sampling the training image 300. The image learning apparatus may determine a class score for each search region, which is the probability that the search region corresponds to an object included in the training image 300. For example, assume that the class score of search region 1 (310) is 0.21, the class score of search region 2 (320) is 0.32, the class score of search region 3 (330) is 0.12, and the class score of search region 4 (340) is 0.57.

The image learning apparatus may remove, from among the plurality of search regions, any search region having a class score equal to or less than a preset threshold (for example, the threshold θ of FIG. 1), that is, a soft negative sample. For example, when the threshold is 0.2, search region 3 (330), whose class score is 0.12, may be removed. The image learning apparatus may perform clustering on the search regions having class scores exceeding the threshold. While the image learning apparatus performs clustering, search regions having relatively low class scores may be removed. After performing clustering, the image learning apparatus may sort the remaining search regions by class score and store them. Accordingly, search regions having class scores greater than the threshold θ (that is, hard negative samples) may be stored in the image learning apparatus.

Assume that search region 1 (310), search region 2 (320), and search region 4 (340) remain after clustering. The image learning apparatus may sort these search regions in descending order of class score, that is, in the order of search region 4 (340), search region 2 (320), and search region 1 (310). The image learning apparatus may select one or more of the sorted search regions, based on the number of positive samples identified from the training image 300, and use them for training the neural network. The number of search regions (that is, hard negative samples) selected by the image learning apparatus may be proportional to the number of positive samples.
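
The FIG. 3 example can be reproduced with a few lines of code. This is only a sketch: the function name `rank_hard_negative_candidates` is hypothetical, and the clustering step is omitted on the assumption, stated in the example itself, that search regions 1, 2, and 4 all survive clustering.

```python
# Class scores of search regions 1 to 4 from the FIG. 3 example.
scores = {1: 0.21, 2: 0.32, 3: 0.12, 4: 0.57}
theta = 0.2  # example threshold from the description


def rank_hard_negative_candidates(scores, theta):
    """Drop regions scoring at or below theta, then sort by descending score."""
    kept = {region_id: s for region_id, s in scores.items() if s > theta}
    return sorted(kept, key=kept.get, reverse=True)


order = rank_hard_negative_candidates(scores, theta)
print(order)  # [4, 2, 1]
```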

FIG. 4 is an exemplary diagram illustrating an operation in which the image learning apparatus according to an embodiment selects hard negative samples from among a plurality of search regions.

Referring to FIG. 4, a plurality of search regions extracted from the training image 400 by the image learning apparatus are shown. The plurality of search regions may be regions sampled from the training image 400 based on a preset probability. The image learning apparatus may determine the class scores of the sampled search regions. In other words, because class scores are determined for the sampled search regions rather than for all regions of the training image 400, the amount of computation required to determine class scores can be reduced.
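
The description does not spell out how regions are sampled "based on a preset probability". One possible reading, shown here purely as an assumption, is that each candidate region is kept independently with probability `keep_prob`; the names `sample_search_regions`, `keep_prob`, and `rng` are hypothetical.

```python
import random


def sample_search_regions(all_regions, keep_prob, rng=None):
    """Keep each candidate region independently with probability keep_prob.

    Assumption: independent per-region sampling; the source only states that
    sampling is based on a preset probability.
    """
    rng = rng or random.Random()
    return [r for r in all_regions if rng.random() < keep_prob]
```

Under this reading, class scores would then be computed only for the returned subset, so the scoring cost scales roughly with `keep_prob` times the number of candidate regions.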

Once the class score of each of the plurality of search regions is determined, the image learning apparatus may extract, from the plurality of search regions, one or more search regions having a class score exceeding the preset threshold θ. Referring to FIG. 4, the search regions 420, 430, 440, 450, and 460 may have class scores exceeding the preset threshold θ. The image learning apparatus may discard the search regions other than the search regions 420, 430, 440, 450, and 460. The image learning apparatus may perform clustering or sorting on the search regions 420, 430, 440, 450, and 460. The clustering or sorting that the image learning apparatus performs on the search regions 420, 430, 440, 450, and 460 is similar to that described with reference to FIG. 3, so a detailed description is omitted.

The image learning apparatus may perform the above-described clustering or sorting on the search regions that remain after the search regions corresponding to positive samples are excluded from the plurality of search regions. Accordingly, the search regions 420, 430, 440, 450, and 460 may be hard negative samples having class scores exceeding the preset threshold θ.

The image learning apparatus may use at least one of the search regions 420, 430, 440, 450, and 460, which have class scores exceeding the preset threshold θ, to train the neural network. Before determining at least one of the search regions 420, 430, 440, 450, and 460 to be a hard negative sample to be used for training the neural network, the image learning apparatus may determine one or more positive samples 410 in the training image 400. The operation in which the image learning apparatus determines one or more positive samples 410 in the training image 400 is similar to that described with reference to FIG. 2, so a detailed description is omitted.

Referring to FIG. 4, assume that the image learning apparatus has extracted one positive sample 410 from the training image 400. The number of hard negative samples used for training the neural network, from among the search regions 420, 430, 440, 450, and 460, may be determined based on the number of positive samples 410. For example, the image learning apparatus may select the hard negative samples used for training the neural network from among the search regions 420, 430, 440, 450, and 460 so that the ratio between the number of hard negative samples used for training and the number of positive samples 410 satisfies a preset ratio.

For example, when the ratio is 1:3, because the number of positive samples 410 is 1, the image learning apparatus may select three of the five search regions 420, 430, 440, 450, and 460 having class scores exceeding the threshold θ and use them for training the neural network. Based on the class score of each of the search regions 420, 430, 440, 450, and 460, the image learning apparatus may select three search regions in descending order, starting from the search region with the highest class score, and use them for training the neural network.

When the number of hard negative samples determined based on the number of positive samples 410 or the preset ratio is equal to or greater than the number of search regions 420, 430, 440, 450, and 460 having class scores exceeding the threshold θ, the image learning apparatus may use all of the search regions 420, 430, 440, 450, and 460 for training the neural network. For example, when the ratio is 1:6, because the number of positive samples 410 is 1, the image learning apparatus may set the number of hard negative samples to be used for training the neural network to six. Because the number of search regions 420, 430, 440, 450, and 460 is smaller than the determined number of hard negative samples, the image learning apparatus may determine all of the search regions 420, 430, 440, 450, and 460 to be hard negative samples to be used for training the neural network. Accordingly, even though the number of hard negative samples was determined to be 6, the final number of search regions, or hard negative samples, used for training the neural network may be 5.
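
The two FIG. 4 cases can be summarized in one expression. The symbols K (the number of candidate search regions whose class score exceeds θ) and N_used (the number of hard negative samples actually used) are introduced here only for illustration and do not appear in the original description.

```latex
N_n = m \cdot N_p, \qquad N_{\text{used}} = \min(N_n, K)
% FIG. 4 example: N_p = 1 and K = 5 (search regions 420 to 460)
% Ratio 1:3 (m = 3): N_n = 3, so N_used = min(3, 5) = 3
% Ratio 1:6 (m = 6): N_n = 6, so N_used = min(6, 5) = 5
```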

When the number of hard negative samples determined based on the number of positive samples 410 or the preset ratio is equal to or greater than the number of search regions 420, 430, 440, 450, and 460 having class scores exceeding the threshold θ, the image learning apparatus may change the threshold θ. For example, when a new training image is learned after learning of the training image 400 is completed, the threshold θ may be changed to a smaller value so that the number of hard negative samples corresponding to the new training image increases. In other words, if the threshold applied to the training image 400 is θ and the threshold applied to the new training image is θ', then θ' = α × θ for a preset real number α less than 1 (for example, 0.99).

The positive sample 410 and the one or more hard negative samples selected by the above-described operation may be used, together with the training image 400, for training the neural network. As the image learning apparatus trains the neural network using a plurality of training images including the training image 400, the threshold θ applied sequentially to the plurality of training images may be gradually reduced. Accordingly, the class scores of the hard negative samples determined from the plurality of training images may also gradually decrease with the threshold θ. As a result, the accuracy of the neural network may converge quickly.
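
The gradual reduction of θ over successive training images can be sketched as follows. The function name `threshold_schedule` and the boolean list `shortfalls` are hypothetical; the sketch assumes that θ is multiplied by α only after an image that yields too few candidates for the hard negative target, as in the description above.

```python
def threshold_schedule(theta0, shortfalls, alpha=0.99):
    """Return the threshold applied to each training image in order.

    shortfalls[i] is True when image i yields too few hard negative
    candidates, in which case the threshold for the following image
    becomes alpha * theta.
    """
    thetas = []
    theta = theta0
    for short in shortfalls:
        thetas.append(theta)
        if short:
            theta *= alpha
    return thetas
```

If every image falls short, the threshold applied to the k-th image (counting from zero) is α^k × θ, so with α = 0.99 the threshold decays slowly and geometrically.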

FIG. 5 is a flowchart illustrating an operation of recognizing an object present in an input image using a neural network trained by the image learning apparatus according to an embodiment. The neural network trained by the image learning apparatus according to an embodiment may be used to recognize input images in various application fields such as intelligent vehicles, video security devices, games, and robots. Hereinafter, an operation in which an image recognition apparatus according to an embodiment recognizes an input image using the neural network trained by the image learning apparatus according to an embodiment is described. The image recognition apparatus may be applied to intelligent vehicles, video security devices, games, robots, and the like. The image learning apparatus may also recognize an object included in an input image based on the neural network trained by the operations of FIGS. 1 to 4.

Referring to FIG. 5, in step 510, the image recognition apparatus according to an embodiment may identify an input image. The input image may be received through a network connected to the image recognition apparatus and stored in a memory of the image recognition apparatus. Alternatively, the input image may be transmitted from another electronic device connected to the image recognition apparatus (for example, a camera including an image sensor, or a smartphone) and stored in the memory of the image recognition apparatus. The image recognition apparatus may identify the input image stored in the memory.

Referring to FIG. 5, in step 520, the image recognition apparatus according to an embodiment may preprocess the input image in order to feed it to the neural network. The intensity, size, and other properties of the input image may be changed in consideration of the input of the neural network, and a feature vector related to the input image may be extracted.

Referring to FIG. 5, in step 530, the image recognition apparatus according to an embodiment may input the preprocessed input image to the previously trained neural network. The neural network may be trained by the operation of the image learning apparatus described with reference to FIGS. 1 to 4. The neural network may include an input layer including a plurality of nodes, one or more hidden layers, and an output layer. Information related to the input image (for example, the feature vector) may be input to the input layer.

Referring to FIG. 5, in step 540, the image recognition apparatus according to an embodiment may determine the region of the input image in which an object exists, based on the output of the neural network. In other words, the image recognition apparatus may recognize an object included in the input image based on the output of the neural network. The neural network may include an output layer including a plurality of nodes. The image recognition apparatus may obtain the output values of the nodes included in the output layer. The image recognition apparatus may determine the region of the input image in which the object exists, based on the obtained output values.

Because the neural network has learned positive samples and hard negative samples through the operation of the image learning apparatus described with reference to FIGS. 1 to 4, the image recognition apparatus may obtain, from the neural network, information related to the region of the input image in which an object exists. Based on the output of the neural network, the image recognition apparatus may output the input image with the region in which the object exists marked by a bounding box. While doing so, the image recognition apparatus may also output the probability that the object exists in the bounding box. Alternatively, the image recognition apparatus may extract and output the regions of the input image in which an object is highly likely to exist.
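
The last alternative, extracting only the regions with a high probability of containing an object, might look like the sketch below. The data layout (a list of `(box, probability)` pairs), the function name `confident_detections`, and the cutoff `min_prob` are all assumptions made for illustration; the description does not specify the output format of the neural network.

```python
def confident_detections(detections, min_prob=0.5):
    """Return (box, probability) pairs with probability >= min_prob, highest first.

    detections: list of (box, probability) pairs, where box is (x1, y1, x2, y2).
    """
    kept = [d for d in detections if d[1] >= min_prob]
    return sorted(kept, key=lambda d: d[1], reverse=True)
```

Each returned pair carries both the bounding box and its probability, which matches the description of outputting a bounding box together with the probability that the object exists in it.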

To apply a neural-network-based image recognition apparatus to intelligent vehicles, video security devices, games, robots, and the like, a neural network optimized for the environment of each field must be developed and verified while various parameters of the neural network are adjusted. Because the image recognition apparatus according to an embodiment uses a neural network trained by the image learning apparatus, the time required to develop and verify the neural network may be reduced. Accordingly, the time required to develop the image recognition apparatus may be shortened.

FIG. 6 is a diagram illustrating the structure of an image learning apparatus according to an embodiment.

Referring to FIG. 6, the image learning apparatus according to an embodiment may include a memory 610 and a processor 620. The memory 610 and the processor 620 may communicate with each other through a bus 630.

The memory 610 may store computer-readable instructions. The memory 610 may also store the training image used to train the neural network, the search regions sampled from the training image, and the class score corresponding to each search region. When the image learning apparatus sorts the search regions whose class scores are equal to or greater than the threshold, those search regions may be stored in the memory 610 in descending order of class score. In addition, parameters related to the neural network may be stored in the memory 610.

The processor 620 may perform the operations described above by executing the instructions stored in the memory 610. The memory 610 may be a volatile memory such as a random access memory (RAM), or a nonvolatile memory such as a hard disk drive (HDD) or a solid state drive (SSD).

The processor 620 is a device that executes instructions or programs, or that controls the image learning apparatus, and may include, for example, a central processing unit (CPU) and a graphics processing unit (GPU). The image learning apparatus may be connected to an external device (for example, an image capturing device, a personal computer, or a network) through an input/output device (not shown) and may exchange data with it. For example, the image learning apparatus may receive a training image through an image sensor. The image learning apparatus may be implemented as at least part of a computing device such as a personal computer, tablet computer, or netbook; a mobile device such as a mobile phone, smartphone, PDA, tablet computer, or laptop computer; or an electronic product such as a smart television or a security device for gate control.

The processor 620 may sample a plurality of search regions from the training image based on a preset probability, determine for each search region a class score indicating whether the search region corresponds to an object included in the training image, select from the plurality of search regions those whose class score is greater than a preset threshold, and sort the selected search regions by class score and store them in the memory 610. The processor 620 may identify positive samples, which are portions of the training image used to train the neural network on regions in which an object exists. Based on the number of identified positive samples and on the class scores, the processor 620 may select, from the search regions stored in the memory 610, hard negative samples used to train the neural network on regions in which no object exists. The processor 620 may train the neural network based on the selected hard negative samples and the positive samples.
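The selection performed by the processor 620 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the apparatus's actual implementation: the `SearchRegion` type, its `overlaps_object` flag (marking regions that coincide with the object and are therefore not negatives), and the parameter name `negatives_per_positive` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SearchRegion:
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2), assumed layout
    class_score: float               # probability that the region matches an object
    overlaps_object: bool = False    # hypothetical flag: True for positive regions


def select_hard_negatives(regions: List[SearchRegion],
                          threshold: float,
                          num_positives: int,
                          negatives_per_positive: float) -> List[SearchRegion]:
    """Pick hard negatives: high-scoring regions that contain no object."""
    # 1. Keep only non-object regions whose class score exceeds the threshold.
    candidates = [r for r in regions
                  if not r.overlaps_object and r.class_score > threshold]
    # 2. Sort the (already reduced) candidate list by descending class score.
    candidates.sort(key=lambda r: r.class_score, reverse=True)
    # 3. Cap the count using the positive count and the preset ratio.
    target = int(num_positives * negatives_per_positive)
    return candidates[:target]
```

Sorting happens only after thresholding, so the sort operates on the reduced candidate list. If fewer candidates exceed the threshold than the target count, all of them are returned.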

The matters described above with reference to FIGS. 1 to 5 apply as they are to each of the components shown in FIG. 6, so a more detailed description is omitted.

In summary, the image learning apparatus according to an embodiment may extract, from a training image, hard negative samples used to train a neural network. Hard negative samples may be used to teach the neural network undesirable object-recognition results. The hard negative samples may be determined from among search regions sampled from the training image according to a preset probability. The image learning apparatus may determine a class score, which is the probability that a sampled search region corresponds to an object, and then, based on the determined class scores, determine which of the search regions are hard negative samples to be used for training the neural network. The number of hard negative samples used for training may be determined based on at least one of the number of positive samples, a preset threshold that is compared with the class score, and a preset ratio between positive samples and hard negative samples. Accordingly, the time and amount of computation required to sort the hard negative samples may be reduced.
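The saving in sorting cost can be made concrete with a simple cost model. The model is an assumption made here for illustration and is not stated in the source: let $n$ be the number of candidate search regions in a training image, $p$ the preset sampling probability, and $q$ the fraction of sampled regions whose class score exceeds the threshold. If a comparison sort is used, the expected number of regions to sort and the approximate relative cost are

```latex
m = p \, q \, n,
\qquad
\frac{T_{\text{sort}}(m)}{T_{\text{sort}}(n)} \approx \frac{m \log m}{n \log n}.
```

For example, with the hypothetical values $n = 10000$, $p = 0.25$, and $q = 0.1$, only $m = 250$ regions are sorted, and the ratio is about $250 \ln 250 / (10000 \ln 10000) \approx 0.015$, that is, roughly 1.5% of the cost of sorting every candidate region.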

Because search regions whose class score is equal to or greater than the threshold are determined to be hard negative samples, the hard negative samples used for training the neural network may correspond to search regions in which it is relatively difficult to judge whether an object exists. Because the image learning apparatus determines the hard negative samples from among search regions sampled from the training image according to a preset probability, the time required to determine the hard negative samples may be reduced.
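One simple way to sample search regions according to a preset probability is to keep each candidate region independently with that probability. The following Python sketch assumes this Bernoulli-style sampling, which the text above does not specify; `sample_search_regions`, `keep_probability`, and the optional `rng` argument are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_search_regions(candidates: Sequence[T],
                          keep_probability: float,
                          rng: Optional[random.Random] = None) -> List[T]:
    """Keep each candidate region independently with keep_probability.

    Independent per-region sampling is an assumption of this sketch.
    """
    if not 0.0 <= keep_probability <= 1.0:
        raise ValueError("keep_probability must be in [0, 1]")
    rng = rng or random.Random()
    # random() returns a float in [0.0, 1.0), so probability 1.0 keeps everything
    # and probability 0.0 keeps nothing.
    return [c for c in candidates if rng.random() < keep_probability]
```

With `keep_probability = 0.25`, about one quarter of the candidate regions are scored on average, so class scores are computed for fewer regions per training image.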

The number of search regions determined to be hard negative samples may not exceed the number of search regions whose class score exceeds the threshold. When the number of search regions determined to be hard negative samples is judged to be insufficient, the threshold may be changed to a smaller value. The image learning apparatus may judge whether the number is insufficient based on the number of positive samples and the preset ratio between positive samples and hard negative samples. When the image learning apparatus uses a plurality of training images sequentially to train the neural network, the threshold may be changed adaptively each time hard negative samples are extracted from a training image. By using the adaptively changed threshold together with search regions sampled according to the preset probability, the image learning apparatus according to an embodiment may train the neural network faster without degrading performance.
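The threshold adaptation can be sketched in Python as follows. The text says only that the threshold is changed to a smaller value when too few hard negatives are found; the multiplicative `decay` factor and the lower bound `floor` used here are assumptions made for illustration, as are all function and parameter names.

```python
def update_threshold(threshold: float,
                     num_positives: int,
                     num_above_threshold: int,
                     negatives_per_positive: float,
                     decay: float = 0.9,
                     floor: float = 0.0) -> float:
    """Return the threshold to use for the next training image.

    The threshold is lowered only when the number of regions above it is
    smaller than the target number of hard negatives.
    decay and floor are hypothetical tuning parameters.
    """
    target = num_positives * negatives_per_positive
    if num_above_threshold < target:
        # Too few hard negatives: lower the threshold, but not below floor.
        return max(floor, threshold * decay)
    return threshold
```

For instance, with 4 positive samples and a ratio of 3 hard negatives per positive, the target is 12. If only 5 regions exceed a threshold of 0.5, the next training image is processed with a threshold of about 0.45; if 12 or more regions exceed it, the threshold is left unchanged.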

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, so as to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims below.

Claims

An image learning method using a neural network, the method comprising:
sampling, by an image learning apparatus, a plurality of search regions from a training image;
determining, by the image learning apparatus, for each of the plurality of search regions, a class score that is a probability that the search region corresponds to an object included in the training image, by comparing whether the search region corresponds to a position of the object included in the training image;
extracting, by the image learning apparatus, one or more search regions having a class score exceeding a preset threshold from among the plurality of search regions;
clustering, by the image learning apparatus, the extracted one or more search regions;
sorting, by the image learning apparatus, the one or more search regions in order of the class score determined for each of the clustered one or more search regions, in order to select a search region used for training the neural network;
identifying, by the image learning apparatus, one or more positive samples used to train the neural network on a region in which the object exists in the training image, by comparing the position of the object in the training image with all regions of the training image;
selecting, by the image learning apparatus, from among the sorted one or more search regions, one or more search regions used to train the neural network on a region in which the object does not exist in the training image, based on the number of the identified one or more positive samples and the class score determined for each of the sorted one or more search regions;
identifying, by the image learning apparatus, a hard negative sample for each of the selected one or more search regions; and
training, by the image learning apparatus, the neural network based on the identified one or more hard negative samples,
wherein the threshold is adjusted based on the number of hard negative samples used for training the neural network.

The method of claim 1,
wherein the identifying of the hard negative sample comprises
identifying the hard negative sample from among the remaining search regions of the plurality of search regions that do not include the object, excluding search regions corresponding to positive samples used to train the neural network.

The method of claim 1,
wherein the training of the neural network comprises:
identifying a positive sample determined by comparing the position of the object with all regions of the training image; and
selecting, from among the identified hard negative samples, the hard negative samples used for training the neural network sequentially, starting from the hard negative sample having the largest class score,
wherein the number of hard negative samples used for training the neural network is determined based on at least one of the number of identified positive samples and a preset ratio between positive samples and hard negative samples.

The method of claim 1, further comprising:
determining whether to change the threshold based on the number of positive samples determined by comparing the position of the object with all regions of the training image and on the number of identified hard negative samples.

The method of claim 4,
wherein the determining of whether to change the threshold comprises
determining to change the threshold when a value obtained by applying the preset ratio between positive samples and hard negative samples to the number of positive samples is greater than the number of identified hard negative samples.

The method of claim 5,
wherein, when the threshold is changed,
the threshold has, for a training image different from the training image, a value smaller than the value used for the training image.

The method of claim 1,
wherein the sampling comprises
sampling a plurality of search regions of different sizes that divide a portion of the training image, based on a preset probability determined as 1/k (where k is a real number greater than 1).

An image learning method comprising:
determining, by an image learning apparatus, a positive sample by comparing all regions of a training image with a position of an object in the training image;
determining, by the image learning apparatus, a hard negative sample based on a preset threshold and a class score of each of a plurality of search regions sampled from the training image, the class score being, for each of the plurality of search regions, a probability that the search region corresponds to the object included in the training image, obtained by comparing whether the search region corresponds to the position of the object included in the training image; and
training, by the image learning apparatus, a neural network based on the determined hard negative sample and the determined positive sample,
wherein the threshold is adjusted according to a result of comparing the number of hard negative samples with the number of positive samples, and
wherein, in the determining of the hard negative sample:
the image learning apparatus extracts one or more search regions having a class score exceeding the preset threshold from among the plurality of search regions;
the image learning apparatus clusters the extracted one or more search regions;
the image learning apparatus sorts the one or more search regions in order of the class score determined for each of the clustered one or more search regions, in order to select a search region used for training the neural network;
the image learning apparatus identifies one or more positive samples used to train the neural network on a region in which the object exists in the training image, by comparing the position of the object in the training image with all regions of the training image;
the image learning apparatus selects, from among the sorted one or more search regions, one or more search regions used to train the neural network on a region in which the object does not exist in the training image, based on the number of the identified one or more positive samples and the class score determined for each of the sorted one or more search regions; and
the image learning apparatus identifies a hard negative sample for each of the selected one or more search regions.

The method of claim 8,
wherein the determining of the positive sample comprises:
identifying the position of the object based on ground-truth data corresponding to the training image; and
selecting the positive sample from among all regions of the training image based on whether each region overlaps the identified position of the object by a threshold or more.

The method of claim 8,
wherein the determining of the hard negative sample comprises:
identifying negative samples, from among the plurality of regions sampled from the training image, based on whether a degree of overlap with the position of the object is less than or equal to a threshold; and
determining, from among the identified negative samples, a negative sample having a class score greater than the threshold as the hard negative sample.

The method of claim 8,
wherein the plurality of regions sampled from the training image comprise:
a soft negative sample having a class score less than or equal to the threshold; and
a hard negative sample having a class score greater than the threshold.

The method of claim 8, further comprising:
determining whether to change the threshold by comparing (1) the number of regions, among the plurality of regions, having a class score greater than the threshold with (2) a target value of hard negative samples calculated based on the number of positive samples and a preset ratio between positive samples and hard negative samples.

The method of claim 12,
wherein the determining of whether to change the threshold comprises
determining to change the threshold to a smaller value when the target value is greater than the number of regions having a class score greater than the threshold.

The method of claim 12,
wherein the determining of the hard negative sample comprises,
when the target value is greater than the number of regions having a class score greater than the threshold, determining the one or more regions having a class score greater than the threshold, from among the plurality of regions, as the hard negative samples.

The method of claim 12,
wherein the determining of the hard negative sample comprises:
when the target value is smaller than the number of regions having a class score greater than the threshold, extracting, from among the plurality of regions, as many regions as the target value in descending order starting from the region having the largest class score; and
determining the extracted regions as the hard negative samples.

An image recognition method using a neural network, the method comprising:
identifying, by an image learning apparatus, an input image;
inputting, by the image learning apparatus, the input image to the neural network; and
recognizing, by the image learning apparatus, an object included in the input image based on an output of the neural network,
wherein the neural network is trained in advance such that:
the image learning apparatus determines, for each of a plurality of search regions sampled from a training image, a class score that is a probability that the search region corresponds to an object included in the training image, by comparing whether the search region corresponds to a position of the object included in the training image;
the image learning apparatus extracts one or more search regions having a class score exceeding a preset threshold from among the plurality of search regions;
the image learning apparatus clusters the extracted one or more search regions;
the image learning apparatus sorts the one or more search regions in order of the class score determined for each of the clustered one or more search regions, in order to select a search region used for training the neural network;
the image learning apparatus identifies one or more positive samples used to train the neural network on a region in which the object exists in the training image, by comparing the position of the object in the training image with all regions of the training image;
the image learning apparatus selects, from among the sorted one or more search regions, one or more search regions used to train the neural network on a region in which the object does not exist in the training image, based on the number of the identified one or more positive samples and the class score determined for each of the sorted one or more search regions;
the image learning apparatus identifies a hard negative sample for each of the selected one or more search regions; and
the image learning apparatus trains the neural network in advance on the identified one or more hard negative samples,
wherein the threshold is adjusted
based on at least one of the number of positive samples determined by comparing all regions of the training image with the position of the object and the number of the selected hard negative samples.

The method of claim 16,
wherein the hard negative sample is
a search region having a class score greater than the threshold from among negative samples, the negative samples being the search regions of the plurality of search regions other than search regions corresponding to the positive sample, and
wherein the class score is, for each of the plurality of search regions, a probability that the search region corresponds to the object included in the training image, obtained by comparing whether the search region corresponds to the position of the object included in the training image.