KR102615510B1

KR102615510B1 - Device for detecting the localization of object, method for detecting the localization of object and computer program stored in a recording medium to execute the method

Info

Publication number: KR102615510B1
Application number: KR1020210085271A
Authority: KR
Inventors: 변혜란; 기민송; 어영정; 이원영; 고성필
Original assignee: 연세대학교 산학협력단
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2023-12-19
Also published as: KR20230003769A

Abstract

An unsupervised learning-based object location detection method according to an embodiment of the disclosed invention may include: (a) generating, by a transformed image generation unit, a transformed image by transforming a training image; (b) extracting, by an object extraction unit, a first object region from the training image and a second object region from the transformed image; and (c) training, by an artificial intelligence learning unit, an artificial intelligence model by comparing the first object region with the second object region.

Description

Object location detection device, object location detection method, and computer program stored in a recording medium to execute the method

The present invention relates to an unsupervised learning-based object location detection method and object location detection device that can improve object location detection performance by using additional location-related information about an object in an image.

Supervised learning, one approach to machine learning, trains a convolutional neural network (CNN) on human annotations such as bounding boxes and segmentation in order to detect the location of an object in an image. However, obtaining accurate annotations requires a large amount of human labor.

Unsupervised learning, another approach to machine learning, applies self-supervised learning, in which an artificial intelligence model is trained with a classification loss function to classify artificial labels. However, because the artificial intelligence model learns information related to the target, the background may also be activated, which can degrade learning performance.

In addition, conventional contrastive representation learning encodes an image into a feature vector through a neural network. Because the feature vector contains no spatial information, the loss function does not cover only the object; it either focuses on discriminative parts or covers too much of the background. A method is therefore needed that can detect the location of an object more efficiently by using a new contrastive loss function.

The present invention aims to provide an object recognition system, an object recognition method, and a computer program that can accurately detect the location of an object. Based on a transformed image that additionally carries location-related information about the object in a training image, the entire region of the object can be covered without focusing only on the discriminative parts of the training image and without excessively covering the background region.

The present invention also aims to provide an unsupervised learning-based object location detection device, object location detection method, and computer program that can detect the location of an object in an image more accurately than conventional object location detection techniques, even when the artificial intelligence model has been trained only a small number of times.

An unsupervised learning-based object location detection method according to one aspect of the disclosed invention may include: (a) generating, by a transformed image generation unit, a transformed image by transforming a training image; (b) extracting, by an object extraction unit, a first object region from the training image and a second object region from the transformed image; and (c) training, by an artificial intelligence learning unit, an artificial intelligence model by comparing the first object region with the second object region.

In addition, step (c) may include training, by the artificial intelligence learning unit, the artificial intelligence model by comparing a first background region, which is the training image excluding the first object region, with a second background region, which is the transformed image excluding the second object region.

In addition, step (c) may include: computing a loss function by analyzing the similarity between the first object region and the second object region and analyzing the difference between the first background region and the first object region; and training the artificial intelligence model so that the loss function decreases.
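By way of illustration only, the loss computation described above can be sketched as follows. This passage does not give a formula, so the cosine-similarity form, the use of feature vectors to represent regions, and the names `cosine_similarity` and `region_loss` are all assumptions introduced for this sketch rather than the claimed loss function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def region_loss(obj1: Sequence[float],
                obj2: Sequence[float],
                bg1: Sequence[float]) -> float:
    """Hypothetical loss: small when the two object regions are similar
    and the first object region differs from the first background region."""
    # Similarity term: penalize dissimilar first/second object regions.
    similarity_term = 1.0 - cosine_similarity(obj1, obj2)
    # Difference term: penalize an object region that resembles the background.
    difference_term = max(0.0, cosine_similarity(obj1, bg1))
    return similarity_term + difference_term


# Assumed toy feature vectors for the regions.
good = region_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
bad = region_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Under these assumptions, `good` is lower than `bad`, so training that reduces the loss pulls the two object regions together and pushes the object region away from the background.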

In addition, step (b) may include: (b1) extracting the first object region based on a first channel, which is a channel that contains relatively more information about the recognized object among the plurality of channels constituting the training image; and (b2) extracting the second object region based on a second channel, which is a channel that contains relatively more information about the recognized object among the plurality of channels constituting the transformed image.

In addition, step (b1) may include: determining whether a pixel constituting a channel of the training image is a pixel that contains information about the recognized object; determining, for the channel of the training image, the number of pixels that contain information about the object; and determining, based on the number of pixels, a channel that contains relatively more information about the recognized object among the plurality of channels of the training image as the first channel. Step (b2) may include: determining whether a pixel constituting a channel of the transformed image is a pixel that contains information about the recognized object; determining, for the channel of the transformed image, the number of pixels that contain information about the object; and determining, based on the number of pixels, a channel that contains relatively more information about the recognized object among the plurality of channels of the transformed image as the second channel.
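As a non-limiting sketch, the channel selection of steps (b1) and (b2) can be illustrated as counting, per channel, the pixels judged to contain object information and choosing the channel with the largest count. The fixed activation threshold used to decide whether a pixel contains object information, the choice of a single best channel, and the names `count_object_pixels` and `select_channel` are assumptions made for this example.

```python
from typing import List

Channel = List[List[float]]  # one channel as rows of activation values


def count_object_pixels(channel: Channel, threshold: float) -> int:
    """Count pixels assumed to contain object information (value > threshold)."""
    return sum(1 for row in channel for value in row if value > threshold)


def select_channel(channels: List[Channel], threshold: float = 0.5) -> int:
    """Return the index of the channel with the most object pixels."""
    counts = [count_object_pixels(c, threshold) for c in channels]
    return max(range(len(channels)), key=lambda i: counts[i])


# Toy channels for a 2x2 image (values are made up for illustration).
channels = [
    [[0.1, 0.2], [0.0, 0.9]],  # one pixel above the threshold
    [[0.8, 0.7], [0.6, 0.1]],  # three pixels above the threshold
    [[0.0, 0.0], [0.0, 0.0]],  # no pixel above the threshold
]
first_channel_index = select_channel(channels)
```

The same routine would be applied separately to the channels of the transformed image to obtain the second channel.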

In addition, the method may further include: extracting, by the object extraction unit, the first object region from an input image using the artificial intelligence model; and determining the location of the object in the input image based on the extracted first object region.
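For illustration, one way to determine an object location from an extracted object region is to take the tightest axis-aligned box around the region. Representing the region as a binary mask and reporting the location as a bounding box are assumptions of this sketch; the passage above states only that the location is determined based on the extracted region.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # 1 = pixel belongs to the object region, 0 = otherwise


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the object region, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Hypothetical object region extracted from a 4x3 input image.
object_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
location = bounding_box(object_mask)
```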

In addition, step (a) may include generating, by the transformed image generation unit, the transformed image by rotating the training image by a predetermined angle.

In addition, step (a) may include generating, by the transformed image generation unit, the transformed image by enlarging or reducing the training image.

In addition, step (a) may include generating, by the transformed image generation unit, the transformed image by applying a symmetric (mirror) transformation to the training image.

A computer program according to one aspect of the disclosed invention may be stored in a computer-readable recording medium to execute the unsupervised learning-based object location detection method.

An unsupervised learning-based object location detection device according to one aspect of the disclosed invention may include: a transformed image generation unit configured to generate a transformed image by transforming a training image; an object extraction unit configured to extract a first object region from the training image and a second object region from the transformed image; and an artificial intelligence learning unit configured to train an artificial intelligence model by comparing the first object region with the second object region.

In addition, the artificial intelligence learning unit may be configured to train the artificial intelligence model by comparing a first background region, which is the training image excluding the first object region, with a second background region, which is the transformed image excluding the second object region.

In addition, the artificial intelligence learning unit may be configured to: compute a loss function by analyzing the similarity between the first object region and the second object region and analyzing the difference between the first background region and the first object region; and train the artificial intelligence model so that the loss function decreases.

In addition, the object extraction unit may be configured to: extract the first object region based on a first channel, which is a channel that contains relatively more information about the recognized object among the plurality of channels constituting the training image; and extract the second object region based on a second channel, which is a channel that contains relatively more information about the recognized object among the plurality of channels constituting the transformed image.

In addition, the object extraction unit may be configured to: determine whether a pixel constituting a channel of the training image is a pixel that contains information about the recognized object; determine, for the channel of the training image, the number of pixels that contain information about the object; determine, based on the number of pixels, a channel that contains relatively more information about the recognized object among the plurality of channels of the training image as the first channel; determine whether a pixel constituting a channel of the transformed image is a pixel that contains information about the recognized object; determine, for the channel of the transformed image, the number of pixels that contain information about the object; and determine, based on the number of pixels, a channel that contains relatively more information about the recognized object among the plurality of channels of the transformed image as the second channel.

In addition, the device may further include a processor, wherein the object extraction unit is configured to extract the first object region from an input image using the artificial intelligence model, and the processor is configured to determine the location of the object in the input image based on the extracted first object region.

In addition, the transformed image generation unit may be configured to generate the transformed image by rotating the training image by a predetermined angle.

In addition, the transformed image generation unit may be configured to generate the transformed image by enlarging or reducing the training image.

In addition, the transformed image generation unit may be configured to generate the transformed image by applying a symmetric (mirror) transformation to the training image.

According to one aspect of the disclosed invention, the artificial intelligence model is trained on the training image and the transformed image so that it covers the entire region of the object in the image, does not focus only on the discriminative parts of the training image, and does not excessively cover the background region. The location of the object can therefore be detected accurately.

In addition, according to an embodiment of the present invention, object location detection performance can be improved because the location of an object in an image is detected more accurately than with conventional object location detection techniques, even when the artificial intelligence model has been trained only a small number of times.

FIG. 1 is a configuration diagram of an object location detection device according to an embodiment.
FIG. 2 is a diagram for explaining an unsupervised learning method according to an embodiment.
FIG. 3 is a diagram illustrating a method of computing a loss function according to an embodiment.
FIG. 4 is a diagram for explaining extraction of an object region based on some of a plurality of channels according to an embodiment.
FIG. 5 is another diagram for explaining extraction of an object region based on some of a plurality of channels according to an embodiment.
FIG. 6 is a diagram illustrating a process of determining the location of an object in an input image according to an embodiment.
FIG. 7 is a diagram showing an object location detection result according to an embodiment and a conventional object location detection result.
FIG. 8 is a flowchart of an object location detection method according to an embodiment.
FIG. 9 is a table showing the degree to which the object location detection method according to an embodiment improves on a conventional object location detection method.
FIG. 10 is a diagram for explaining criteria for determining some channels among a plurality of channels according to an embodiment.
FIG. 11 is another diagram for explaining criteria for determining some channels among a plurality of channels according to an embodiment.
FIG. 12 is a table showing, for various input image transformations, the degree to which the location detection method according to an embodiment improves on a conventional location detection method.

Like reference numerals refer to like elements throughout the specification. This specification does not describe every element of the embodiments, and content that is general in the technical field to which the disclosed invention pertains, or that overlaps between embodiments, is omitted. The term "unit" as used in the specification may be implemented in software or hardware, and depending on the embodiment, a plurality of "units" may be implemented as a single component, or a single "unit" may include a plurality of components.

In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Terms such as "first" and "second" are used to distinguish one component from another, and the components are not limited by these terms.

A singular expression includes the plural unless the context clearly indicates otherwise.

The identification codes attached to the steps are used for convenience of explanation and do not describe the order of the steps. Unless a specific order is clearly stated in the context, the steps may be performed in an order different from the one given.

Hereinafter, the operating principle and embodiments of the disclosed invention are described with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of an object location detection device according to an embodiment, and FIG. 2 is a diagram for explaining an unsupervised learning method according to an embodiment.

Referring to FIG. 1, the object location detection device 100 according to an embodiment of the present invention may include a transformed image generation unit 110, an object extraction unit 120, an artificial intelligence learning unit 130, a processor 140, and a memory 150.

The object location detection device 100 may recognize an object in a training image 200 or an input image 300 that contains the object, and may determine the location of the recognized object in the image.

The training image 200 may be input to the object location detection device 100 in the form of image data. The training image 200 may be an image used for the unsupervised learning of the object location detection method according to an embodiment.

Unsupervised learning is a type of machine learning that discovers how data is structured. Unlike supervised learning or reinforcement learning, unsupervised learning may not be given a target value for each input value.

The transformed image generation unit 110 may be configured to generate a transformed image 400 by transforming the training image 200.

Here, the transformed image generation unit 110 may be configured to generate the transformed image 400 by rotating the training image 200 by a predetermined angle.

For example, the transformed image generation unit 110 may generate the transformed image 400 by rotating the training image 200 by 90 degrees clockwise or by 90 degrees counterclockwise. However, the rotation angle is not limited to 90 degrees.

The transformed image generation unit 110 may be configured to generate the transformed image 400 by enlarging or reducing the training image 200.

For example, the transformed image generation unit 110 may generate the transformed image 400 by enlarging the training image 200 to twice its size or by reducing it to half its size. However, the scale by which the training image 200 is enlarged or reduced is not limited thereto.

The transformed image generation unit 110 may be configured to generate the transformed image 400 by applying a symmetric (mirror) transformation to the training image 200.

As described above, the transformed image generation unit 110 may generate the transformed image 400 by rotating, enlarging, reducing, or mirroring the training image 200. However, the method of transforming the training image 200 to generate the transformed image 400 is not limited to these, and various transformation methods may be used.
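The transformations listed above can be sketched as follows, purely as an illustration. The list-of-rows image representation, the nearest-neighbor scaling, and the function names are assumptions of this example and are not taken from the specification.

```python
from typing import List

Image = List[List[int]]  # hypothetical grayscale image as rows of pixel values


def rotate_cw_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_ccw_90(img: Image) -> Image:
    """Rotate the image 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def scale_up_2x(img: Image) -> Image:
    """Enlarge the image to twice its size by repeating each pixel."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def scale_down_half(img: Image) -> Image:
    """Reduce the image to half its size by keeping every second pixel."""
    return [row[::2] for row in img[::2]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image about its vertical axis."""
    return [row[::-1] for row in img]


training_image = [[1, 2],
                  [3, 4]]
transformed_image = rotate_cw_90(training_image)
```

Each transformation here has a simple inverse (for example, a clockwise rotation is undone by a counterclockwise one), which is what allows regions extracted from the two images to be compared.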

The object extraction unit 120 may be configured to extract a first object region 201 from the training image 200 and a second object region 401 from the transformed image 400.

The object extraction unit 120 may be an encoder that recognizes an object region in an image based on deep learning-based object detection technology.

Deep learning-based object detection technology learns, from data, the features of object regions extracted from images. A convolutional neural network (CNN) structure, in which multiple convolution layers are stacked, may be used to learn how to extract features from an image, but the technology is not limited thereto.

Referring to FIG. 2, the artificial intelligence learning unit 130 may be configured to train the artificial intelligence model by comparing the first object region 201 with the second object region 401.

That is, the artificial intelligence learning unit 130 may train the artificial intelligence model by comparing the object region extracted from the training image 200 before transformation with the object region extracted from the transformed image 400 after transformation.

The object extraction unit 120 may extract objects from the training image 200 and from the transformed image 400 independently of each other. The first object region 201 and the second object region 401 therefore do not necessarily coincide. However, because the second object region 401 is extracted from the transformed image 400, which is generated by applying a fixed transformation to the training image 200, it may tend to match the first object region 201 to some extent. Specifically, the region obtained by applying the inverse transformation to the second object region 401 may be an object region whose shape matches the first object region 201 to some extent.

For example, if the transformed image 400 is generated by rotating the training image 200 by 90 degrees clockwise, the region obtained by rotating the second object region 401 by 90 degrees counterclockwise may be an object region whose shape matches the first object region 201 to some extent.
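The inverse-transformation comparison in this example can be sketched as follows. The binary-mask representation of a region and the use of intersection over union (IoU) as the measure of agreement are assumptions made for illustration; the specification says only that the regions match to some extent.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = non-object pixel


def rotate_ccw_90(mask: Mask) -> Mask:
    """Rotate a mask 90 degrees counterclockwise (inverse of a clockwise turn)."""
    return [list(row) for row in zip(*mask)][::-1]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two masks of equal size."""
    pairs = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    intersection = sum(1 for x, y in pairs if x and y)
    union = sum(1 for x, y in pairs if x or y)
    return intersection / union if union else 1.0


# Hypothetical first object region from the training image.
first_object_mask = [[0, 1, 1],
                     [0, 1, 0],
                     [0, 0, 0]]
# Hypothetical second object region from the image rotated 90 degrees
# clockwise; one object pixel is missed to mimic imperfect extraction.
second_object_mask = [[0, 0, 0],
                      [0, 1, 1],
                      [0, 0, 0]]

aligned = rotate_ccw_90(second_object_mask)  # undo the clockwise rotation
agreement = iou(aligned, first_object_mask)
```

In this toy case the aligned second region overlaps two of the three object pixels, giving an agreement of 2/3; a better-trained extractor would push this value toward 1.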

If the object location detection device 100 performs well, it will extract nearly identical regions as the first object region 201 and the second object region 401, whether it recognizes the object in the training image 200 (the image before transformation) or in the transformed image 400 (the image after transformation). Accordingly, the artificial intelligence learning unit 130 according to an embodiment may train the artificial intelligence model so that the first object region 201 and the second object region 401 become more similar to each other as training is repeated.

The artificial intelligence learning unit 130 may be configured to train the artificial intelligence model by comparing a first background region 202 with a second background region 402.

The first background region 202 may be the region of the training image 200 that excludes the first object region 201. Likewise, the second background region 402 may be the region of the transformed image 400 that excludes the second object region 401.
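As a minimal sketch, assuming each region is represented as a binary mask over the image, the background region is simply the complement of the object region. The mask representation and the name `background_mask` are assumptions of this example.

```python
from typing import List

Mask = List[List[int]]  # 1 = pixel in the region, 0 = pixel outside it


def background_mask(object_mask: Mask) -> Mask:
    """Return the mask of every pixel that is not in the object region."""
    return [[0 if value else 1 for value in row] for row in object_mask]


first_object_mask = [[0, 1, 1],
                     [0, 1, 0],
                     [0, 0, 0]]
first_background_mask = background_mask(first_object_mask)
```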

That is, the artificial intelligence learning unit 130 may train the artificial intelligence model by comparing the background region extracted from the training image 200 before transformation with the background region extracted from the transformed image 400 after transformation.

Because the second background region 402 is extracted from the transformed image 400, which is generated by applying a fixed transformation to the training image 200, it may tend to match the first background region 202 to some extent. Specifically, the region obtained by applying the inverse transformation to the second background region 402 may be a region whose shape matches the first background region 202 to some extent.

For example, if the transformed image 400 is generated by rotating the training image 200 by 90 degrees clockwise, the region obtained by rotating the second background region 402 by 90 degrees counterclockwise may be a region whose shape matches the first background region 202 to some extent.

If the object location detection device 100 performs well, it will extract nearly identical regions as the first background region 202 and the second background region 402. Accordingly, the artificial intelligence learning unit 130 according to an embodiment may train the artificial intelligence model so that the first background region 202 and the second background region 402 become more similar to each other as training is repeated.

The artificial intelligence learning unit 130 may be configured to train the artificial intelligence model on training data, with the features of the training image 200 set as input variables and the location of the object in the image set as the output variable.

The features of the training image 200 may be information representing various characteristics of that image. For example, the features of the training image 200 may be information about the color, brightness, and the like of each pixel of the training image 200.

The artificial intelligence model may be trained by machine learning. Machine learning may mean using a model composed of a number of parameters and optimizing those parameters with given data. The artificial intelligence learning unit 130 may train the artificial intelligence model using the outputs that the model finally produces in response to its inputs. The artificial intelligence model may be stored in the memory 150.
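The general notion of optimizing model parameters with given data can be illustrated with a deliberately small example. The one-parameter linear model, the mean squared error, and plain gradient descent are assumptions chosen for brevity; they are unrelated to the network or loss of the embodiment.

```python
from typing import Sequence


def train(xs: Sequence[float], ys: Sequence[float],
          learning_rate: float = 0.05, steps: int = 200) -> float:
    """Fit w in the model y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # the single model parameter, initialized arbitrarily
    for _ in range(steps):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # step in the direction that reduces the loss
    return w


# Toy data generated by y = 2 * x.
learned_w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Each step uses the model's current outputs to adjust the parameter, so the loss decreases as training is repeated.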

The converted image generator 110, the object extractor 120, and the artificial intelligence learning unit 130 may each include any one of the plurality of processors 140 included in the object location detection device 100. In addition, the object location detection method according to the embodiments described so far and those described below may be implemented as a program that can be run by the processor 140.

Here, the program may include program instructions, data files, data structures, and the like, alone or in combination. The program may be designed and written in machine code or in a high-level language. The program may be specially designed to implement the method described above, or may be implemented using functions or definitions already known and available to those skilled in the field of computer software. The program for implementing the method described above may be recorded on a recording medium readable by the processor 140. The recording medium may be the memory 150.

The memory 150 may store a program that performs the operations described above and below, and the stored program may be executed. When there are a plurality of processors 140 and memories 150, they may be integrated on a single chip or provided at physically separate locations. The memory 150 may include volatile memory such as static random access memory (S-RAM) or dynamic random access memory (D-RAM) for temporarily storing data. In addition, the memory 150 may include non-volatile memory such as read-only memory (ROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM) for long-term storage of control programs and control data.

The processor 140 may include various logic circuits and arithmetic circuits, may process data according to a program provided from the memory 150, and may generate a control signal according to the processing result.

FIG. 3 is a diagram illustrating a method of calculating a loss function according to one embodiment.

Referring to FIG. 3, the artificial intelligence learning unit 130 may calculate the loss function by analyzing the similarity between the first object area 201 and the second object area 401 and by analyzing the difference between the first background area 202 and the first object area 201.

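[Equation 1]

$$M_{fg}(i,j) = \begin{cases} 1, & 1 - A(i,j) > \theta_{bg} \\ 0, & 1 - A(i,j) \le \theta_{bg} \end{cases}$$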

Specifically, referring to [Equation 1], Mfg is a factor used to mark the object area in order to compute the background area of the image, and A is an attention map, which may be an embedding value indicating, for each pixel of the image, the degree to which that pixel is recognized as part of the object area. For each pixel, Mfg may be set to 1 if the value obtained by subtracting A from 1 (1-A) exceeds a predetermined threshold (θbg), and may be set to 0 if (1-A) is less than or equal to the threshold (θbg).

[Equation 2]

$$A_{bg}(i,j) = \bigl(1 - A(i,j)\bigr) \cdot M_{fg}(i,j)$$

Referring to [Equation 2], Abg is a background attention map and may be an embedding factor corresponding to the image with the object area marked. Abg may be computed, for each pixel, by multiplying the value obtained by subtracting A from 1 (1-A) by that pixel's Mfg value. As a result, Abg may be a map that represents the background area, with the object area excluded from the image.
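The thresholding and masking described for [Equation 1] and [Equation 2] can be illustrated with a minimal Python sketch. It assumes the attention map A is a 2D grid of values in [0, 1]; the function names and the example threshold of 0.5 are hypothetical and do not come from the specification.

```python
from typing import List

Map = List[List[float]]


def foreground_mask(attention: Map, theta_bg: float) -> List[List[int]]:
    """M_fg as described in the text: 1 where (1 - A) exceeds theta_bg, else 0."""
    return [[1 if (1.0 - a) > theta_bg else 0 for a in row] for row in attention]


def background_attention(attention: Map, theta_bg: float) -> Map:
    """A_bg = (1 - A) * M_fg, computed pixel by pixel."""
    mask = foreground_mask(attention, theta_bg)
    return [
        [(1.0 - a) * m for a, m in zip(a_row, m_row)]
        for a_row, m_row in zip(attention, mask)
    ]


# Toy attention map: high values mean "recognized as object" (illustrative data).
A = [[0.9, 0.8, 0.1],
     [0.7, 0.2, 0.0]]
M_fg = foreground_mask(A, theta_bg=0.5)       # assumed threshold value
A_bg = background_attention(A, theta_bg=0.5)
```

In this toy example only the pixels with low attention survive in A_bg, which matches the description of Abg as a map of the background area.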

[Equation 3]

Referring to [Equation 3], the function defined there may be a loss function according to one embodiment. Specifically, it may be a function representing a triplet loss.

A triplet loss may be a loss function for an artificial neural network that compares an anchor with a positive input and a negative input. It may be desirable for the distance between the anchor input and the positive input to be minimized and for the distance between the anchor input and the negative input to be maximized. The triplet loss may be used to learn similarity for the purpose of embedding learning, such as word embeddings, vector learning, and matrix learning. The triplet loss function may take the form of a Euclidean distance function, as in [Equation 3].

Aorig, the embedding value for the learning image 200, and Atf, the embedding value for the converted image 400, may be embedding values corresponding to the positive input.

Aorig2tf may be the embedding value generated when the same transformation that was applied to the learning image 200 is applied to Aorig. Atf2orig may be the embedding value generated when the inverse of the transformation applied to the learning image 200 is applied to Atf. Atf2orig and Aorig2tf may be embedding values corresponding to the anchor input. Abg may be an embedding value corresponding to the negative input.
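The relationship between the four embeddings can be sketched in Python for one concrete transformation. The sketch assumes a 90-degree clockwise rotation as the transformation and represents embeddings as small 2D grids; the helper names are hypothetical, and A_tf is set to the output an ideal model would produce, purely for illustration.

```python
from typing import List

Map = List[List[float]]


def hflip(m: Map) -> Map:
    """Horizontal flip (mirror each row); this transform is its own inverse."""
    return [row[::-1] for row in m]


def rot90(m: Map) -> Map:
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(col) for col in zip(*m[::-1])]


def rot90_inv(m: Map) -> Map:
    """Rotate a 2D grid 90 degrees counter-clockwise (inverse of rot90)."""
    return [list(col) for col in zip(*m)][::-1]


A_orig = [[0.1, 0.9],
          [0.2, 0.8]]
# Stand-in for the attention map of the transformed image from an ideal model
# (assumption made only so the example is self-contained).
A_tf = rot90(A_orig)

A_orig2tf = rot90(A_orig)      # same transformation applied to A_orig
A_tf2orig = rot90_inv(A_tf)    # inverse transformation applied to A_tf
```

For a well-trained model, A_orig2tf should be close to A_tf and A_tf2orig close to A_orig, which is the consistency the anchor and positive inputs are meant to express.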

The artificial intelligence learning unit 130 may train the artificial intelligence model so that the loss function decreases. That is, the artificial intelligence learning unit 130 may perform training so that the triplet loss function according to one embodiment is minimized.

That is, the artificial intelligence learning unit 130 may repeat training so that the distance between Atf2orig and Aorig is minimized and the distance between Atf2orig and Abg is maximized. In addition, the artificial intelligence learning unit 130 may repeat training so that the distance between Aorig2tf and Atf is minimized and the distance between Aorig2tf and Abg is maximized.
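The two anchor/positive/negative triplets described above can be written as a short Python sketch. Since [Equation 3] is not reproduced in this text, the sketch uses the commonly used hinge form of the triplet loss with Euclidean distance; the margin value, the flattening of each map into a vector, and all function names are assumptions rather than the patent's exact formula.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Hinge-style triplet loss: pull the positive in, push the negative away."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def total_triplet_loss(a_orig, a_tf, a_orig2tf, a_tf2orig, a_bg, margin: float = 1.0) -> float:
    """Sum of the two triplets named in the description (assumed combination)."""
    return (triplet_loss(a_tf2orig, a_orig, a_bg, margin)
            + triplet_loss(a_orig2tf, a_tf, a_bg, margin))


# Flattened toy maps (illustrative values only).
a_orig = [1.0, 0.0, 0.0, 1.0]
a_tf = [0.0, 1.0, 1.0, 0.0]
a_tf2orig = [1.0, 0.0, 0.0, 1.0]   # matches a_orig exactly
a_orig2tf = [0.0, 1.0, 1.0, 0.0]   # matches a_tf exactly
a_bg = [0.0, 0.0, 0.0, 0.0]
loss = total_triplet_loss(a_orig, a_tf, a_orig2tf, a_tf2orig, a_bg, margin=1.0)
```

The loss is zero here because each anchor coincides with its positive and lies farther than the margin from the background map; swapping positive and negative would make it positive.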

As a result, the artificial intelligence learning unit 130 may train the artificial intelligence model so that the loss function decreases, by minimizing the distance between the anchor input and the positive input and maximizing the distance between the anchor input and the negative input.

FIG. 4 is a diagram for explaining the extraction of an object area based on some of a plurality of channels according to one embodiment, and FIG. 5 is another diagram for explaining the extraction of an object area based on some of a plurality of channels according to one embodiment.

Referring to FIG. 4, the existing object location detection method extracted the object area by performing pooling over all of the channels constituting the learning image 200. That is, the conventional method simply averaged the object extraction results of the individual channels to obtain the final object area, regardless of how much object information each channel contained.

Because this existing approach gives channels that contain less object information the same weight as the other channels when extracting the object area, it has the problem that the background surrounding the object may be misrecognized as well.

Referring to FIGS. 4 and 5, the object extractor 120 according to one embodiment may extract the first object area 201 based on a first channel, which is a subset of the plurality of channels constituting the learning image 200. The first channel may be a channel that contains relatively more information about the recognized object among the plurality of channels.

In addition, the object extractor 120 may extract the second object area 401 based on a second channel, which is a subset of the plurality of channels constituting the converted image 400. The second channel may be a channel that contains relatively more information about the recognized object among the plurality of channels.

That is, the object extractor 120 may extract the object area by averaging the channels with a relatively large weight assigned to channels that contain relatively more information about the recognized object and a relatively small weight assigned to channels that contain relatively less information about the recognized object.
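The difference between plain averaging and the weighted averaging described above can be sketched in Python. The sketch assumes each channel is a 2D grid of activations of the same size and that the weights are supplied by the caller; the function name and the example weights are hypothetical.

```python
from typing import List, Sequence

Map = List[List[float]]


def weighted_channel_average(channels: Sequence[Map], weights: Sequence[float]) -> Map:
    """Average the channel maps pixel by pixel, using one weight per channel."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            sum(w * ch[i][j] for ch, w in zip(channels, weights)) / total
            for j in range(width)
        ]
        for i in range(height)
    ]


# Two toy 1x2 channels: the first is assumed to carry more object information.
channels = [[[1.0, 0.0]],
            [[0.0, 1.0]]]
plain = weighted_channel_average(channels, [1.0, 1.0])      # conventional averaging
weighted = weighted_channel_average(channels, [3.0, 1.0])   # informative channel favored
```

With equal weights both pixels end up at 0.5, whereas the weighted version keeps the response of the more informative channel dominant.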

As a result, the object location detection method according to one embodiment can extract the object area with a higher weight on channels that contain relatively more object information, and therefore has the effect that errors in which the surrounding background is recognized as the object occur less often than in the conventional technique.

The learning image 200 and the converted image 400 may each be composed of a plurality of pixels. Some of these pixels may contain information about the object included in the image, while others may be pixels of the surrounding background that contain no information about the object at all.

The object extractor 120 may determine whether each pixel constituting a channel of the learning image 200 is a pixel containing information about the recognized object, and may determine, for that channel, the number of pixels containing object information.

Based on the number of pixels, the object extractor 120 may determine, among the plurality of channels of the learning image 200, the channels containing relatively more information about the recognized object as the first channel.

Here, the first channel may be determined by selecting, among the plurality of channels, a preset top percentage of channels ranked by the number of pixels containing object information.

For example, if there are 4 channels and the preset ratio is 50%, the object extractor 120 may determine as the first channel the 2 channels with the largest and second-largest numbers of pixels containing object information.
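The 4-channel, 50% example can be traced with a minimal Python sketch. It assumes a pixel "contains object information" when its activation exceeds a per-pixel threshold; that threshold, the rounding of the channel count, and the function names are illustrative assumptions, not details given in the specification.

```python
from typing import List, Sequence

Map = List[List[float]]


def count_object_pixels(channel: Map, pixel_threshold: float) -> int:
    """Count pixels whose activation exceeds the (assumed) object threshold."""
    return sum(1 for row in channel for v in row if v > pixel_threshold)


def select_top_channels(channels: Sequence[Map], ratio: float,
                        pixel_threshold: float = 0.5) -> List[int]:
    """Return indices of the top `ratio` fraction of channels by object-pixel count."""
    counts = [count_object_pixels(ch, pixel_threshold) for ch in channels]
    # Assumed rounding policy: truncate, but always keep at least one channel.
    k = max(1, int(len(channels) * ratio))
    ranked = sorted(range(len(channels)), key=lambda i: counts[i], reverse=True)
    return sorted(ranked[:k])


channels = [
    [[0.9, 0.8], [0.7, 0.1]],   # 3 object pixels
    [[0.1, 0.2], [0.0, 0.3]],   # 0 object pixels
    [[0.9, 0.9], [0.9, 0.9]],   # 4 object pixels
    [[0.6, 0.1], [0.1, 0.1]],   # 1 object pixel
]
first_channel = select_top_channels(channels, ratio=0.5)
```

Here channels 2 and 0 have the most object pixels, so they are the two channels kept when the ratio is 50%.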

Meanwhile, the approach described above may be used not only when extracting the object from the learning image 200 but also when extracting the object from the converted image 400.

The object extractor 120 may determine whether each pixel constituting a channel of the converted image 400 is a pixel containing information about the recognized object, and may determine, for that channel, the number of pixels containing object information.

In addition, based on the number of pixels, the object extractor 120 may determine, among the plurality of channels of the converted image 400, the channels containing relatively more information about the recognized object as the second channel.

Meanwhile, the way of determining the first channel and the second channel is not necessarily limited to selecting a top percentage of channels ranked by the number of pixels containing object information. Any approach may be used as long as the object area can be extracted by averaging with a relatively large weight assigned to channels that contain relatively more information about the recognized object.

FIG. 6 is a diagram illustrating a process of determining an object location in an input image according to one embodiment.

Referring to FIG. 6, the object extractor 120 may be configured to extract the first object area 201 from the input image 300 using the artificial intelligence model. That is, using the artificial intelligence model trained on the learning image 200, the object extractor 120 may extract the object area from the input image 300 in which the user actually wants to locate the object.

The processor 140 may determine the object location in the input image 300 based on the extracted first object area 201. Here, the processor 140 may generate a bounding box on the input image 300 based on the first object area 201 and determine the object location in the input image 300 based on the coordinates of the bounding box.
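Deriving bounding box coordinates from an extracted object area can be sketched as follows in Python. The sketch assumes the object area is available as a 2D grid in which values above a threshold belong to the object, and that the box is the tightest axis-aligned rectangle around those pixels; the specification does not state how the box is generated, so this is only one plausible reading.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def bounding_box(region: List[List[float]], threshold: float = 0.5) -> Optional[Box]:
    """Return the tightest box around pixels above `threshold`, or None if there are none."""
    coords = [
        (x, y)
        for y, row in enumerate(region)
        for x, value in enumerate(row)
        if value > threshold
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy object area: 1.0 marks pixels recognized as the object (illustrative data).
object_area = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
box = bounding_box(object_area)
```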

FIG. 7 is a diagram showing object location detection results according to one embodiment alongside conventional object location detection results.

Referring to FIG. 7, the object location detection results of the conventional object location detection method (Baseline) and of the object location detection method of the present invention (Ours) can be compared for three states: before the artificial intelligence model is trained (0 epoch), after it has been trained 10 times (10 epoch), and after it has been trained 40 times (40 epoch).

The red box may be a bounding box indicating the predetermined ground truth of the object location.

The green box may be a bounding box indicating the object location detected by the conventional object location detection method or by the object location detection method of the present invention.

Referring to FIG. 7, even before the artificial intelligence model is trained, the object location detection method of the present invention (Ours) detects an object location closer to the ground truth than the conventional method (Baseline). In other words, the method of the present invention (Ours) makes fewer errors in which the background surrounding the object is misrecognized as the object.

This may be because, unlike the conventional method, the object location detection method of the present invention does not obtain the final object area by simply averaging the object extraction results of the individual channels, but instead extracts the object area with a higher weight on channels that contain relatively more object information.

After the artificial intelligence model has been trained 10 times and 40 times, the object location detection method of the present invention (Ours) detects the object location even more accurately, closer to the ground truth.

At least one component may be added or deleted depending on the performance of the components described above. In addition, those of ordinary skill in the art will readily understand that the relative positions of the components may be changed according to the performance or structure of the system.

FIG. 8 is a flowchart of an object location detection method according to one embodiment. This is merely a preferred embodiment for achieving the object of the present invention, and some steps may of course be added or deleted as needed.

Referring to FIG. 8, the converted image generator 110 may generate the converted image 400 by transforming the learning image 200 (1001). Here, the converted image generator 110 may generate the converted image 400 in one of the following ways: rotating the learning image 200 by a predetermined angle, enlarging it, reducing it, or applying a symmetric (flip) transformation.

The object extractor 120 may extract the first object area 201 based on the first channel, which is a channel containing relatively more information about the recognized object among the plurality of channels constituting the learning image 200 (1002).

Here, the object extractor 120 may determine whether each pixel constituting a channel of the learning image 200 is a pixel containing information about the recognized object, determine the number of pixels containing object information for that channel, and, based on the number of pixels, determine as the first channel the channels of the learning image 200 that contain relatively more information about the recognized object.

The object extractor 120 may extract the second object area 401 based on the second channel, which is a channel containing relatively more information about the recognized object among the plurality of channels constituting the converted image 400 (1003).

Here, the object extractor 120 may determine whether each pixel constituting a channel of the converted image 400 is a pixel containing information about the recognized object, determine the number of pixels containing object information for that channel, and, based on the number of pixels, determine as the second channel the channels of the converted image 400 that contain relatively more information about the recognized object.

The artificial intelligence learning unit 130 may analyze the similarity between the first object area 201 and the second object area 401 (1004). In addition, the artificial intelligence learning unit 130 may analyze the difference between the first background area 202 and the first object area 201 (1005).

The artificial intelligence learning unit 130 may calculate the loss function based on the results of these analyses (1006). The loss function calculated by the artificial intelligence learning unit 130 may be a triplet loss function.

The artificial intelligence learning unit 130 may train the artificial intelligence model so that the loss function decreases (1007). Here, the artificial intelligence learning unit 130 may repeatedly train the artificial intelligence model so that the triplet loss function decreases.

To verify the performance of the unsupervised learning-based object location detection method according to an embodiment of the present invention, an object location detection experiment was conducted.

FIG. 9 is a table showing how much the object location detection method according to one embodiment improves on the conventional object location detection method.

Referring to FIG. 9, when the object location is detected with a rotation transformation applied to the input image 300, the method that uses both the triplet loss function according to the present invention and the per-channel weighting of the learning image 200 (Ours(full)) detects the object location better than the method that computes and reduces only the loss function used in the conventional method (Baseline).

Specifically, the cases in which the ground-truth bounding box and the bounding box of the detected object overlap by 30% or more account for 96.75% of all trials with the conventional method (Baseline), compared with 97.30% of all trials with the present invention.

In addition, the cases in which the ground-truth bounding box and the bounding box of the detected object overlap by 70% or more account for only 28.66% of all trials with the conventional method (Baseline), compared with 38.64% of all trials with the present invention.
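The kind of measurement reported above can be illustrated with a short Python sketch. The specification only speaks of an "overlap ratio" between the two bounding boxes; the sketch assumes this means intersection over union (IoU), which is a common choice but is not confirmed by the text. Box coordinates, function names, and the sample boxes are all illustrative.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(preds: Sequence[Box], truths: Sequence[Box],
                          threshold: float) -> float:
    """Fraction of trials whose overlap ratio is at least `threshold`."""
    hits = sum(1 for p, t in zip(preds, truths) if iou(p, t) >= threshold)
    return hits / len(truths)


# Three toy trials against the same ground-truth box (made-up numbers).
truths = [(0, 0, 10, 10)] * 3
preds = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
acc_30 = localization_accuracy(preds, truths, threshold=0.3)
acc_70 = localization_accuracy(preds, truths, threshold=0.7)
```

Raising the threshold from 30% to 70% can only lower the fraction of trials that count as correct, which is consistent with the lower percentages reported at the stricter threshold.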

Referring to FIG. 9, it can also be seen that an object location detection method that uses, in combination, the loss function used in the conventional method and the per-channel weighting of the learning image 200 performs better than an object location detection method that uses only the triplet loss function according to the present invention.

FIG. 10 is a diagram for explaining the criterion for determining some of the plurality of channels according to one embodiment, and FIG. 11 is another diagram for explaining the criterion for determining some of the plurality of channels according to one embodiment.

Referring to FIGS. 10 and 11, the top percentage of channels, ranked by the number of pixels containing object information, that is to be determined as the first channel and the second channel can be set in advance.

According to the experimental results, the object location is detected most effectively when the top 70% of channels, ranked by the number of pixels containing object information, are determined as the first channel and the second channel.

FIG. 12 is a table showing how much the location detection method according to one embodiment improves on the conventional location detection method for various input image transformations.

Referring to FIG. 12, regardless of the transformation used to generate the converted image 400, the object location detection method according to the present invention (Ours) detects the object location better than the conventional method (Base).

Specifically, whichever transformation is used to generate the converted image 400, whether rotation (Rotation), enlargement or reduction (Scale), pixel-wise shifting (Translation), horizontal flip (Hflip), or vertical flip (Vflip), the object location detection method according to the present invention (Ours) detects the object location better than the conventional method (Base).

The disclosed embodiments have been described above with reference to the accompanying drawings. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in forms different from the disclosed embodiments without changing its technical idea or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

100: Object location detection device
110: Converted image generator
120: Object extractor
130: Artificial intelligence learning unit
140: Processor
150: Memory
200: Learning image
201: First object area
202: First background area
300: Input image
400: Converted image
401: Second object area
402: Second background area

Claims

An unsupervised learning-based object location detection method comprising:
(a) generating, by a converted image generator, a converted image by converting a learning image;
(b) extracting, by an object extractor, a first object area from the learning image and a second object area from the converted image; and
(c) training, by an artificial intelligence learning unit, an artificial intelligence model by comparing the first object area and the second object area,
wherein step (c) comprises:
training, by the artificial intelligence learning unit, the artificial intelligence model by comparing a first background area, which excludes the first object area in the learning image, with a second background area, which excludes the second object area in the converted image;
calculating a loss function by analyzing the similarity between the first object area and the second object area and analyzing the difference between the first background area and the first object area; and
training the artificial intelligence model so that the loss function decreases,
wherein training the artificial intelligence model comprises:
training the artificial intelligence model so that the loss function decreases by maximizing the distance between (i) the embedding value generated by applying, to the embedding value for the learning image, the same transformation as that applied to the learning image and (ii) the embedding value corresponding to the image of the first background area; and
training the artificial intelligence model so that the loss function decreases by maximizing the distance between (i) the embedding value generated by applying, to the embedding value for the converted image, the inverse of the transformation applied to the learning image and (ii) the embedding value corresponding to the image of the second background area.
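For illustration only, the loss terms recited above can be sketched as follows. The claim does not fix a distance metric or say how the terms are combined, so the Euclidean distance, the plain subtraction of the background terms, and the names `distance` and `sketch_loss` are assumptions; `transform` and `inverse` stand for the transformation applied to the learning image and its inverse, acting on embedding arrays.

```python
import numpy as np


def distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two embeddings of equal shape (assumed metric)."""
    return float(np.linalg.norm(a - b))


def sketch_loss(emb_learning, emb_converted, emb_bg1, emb_bg2, transform, inverse) -> float:
    """Hypothetical loss: small when object embeddings agree and backgrounds are far away."""
    # Apply the image-level transformation to the learning-image embedding.
    transformed = transform(emb_learning)
    # Undo the transformation on the converted-image embedding.
    restored = inverse(emb_converted)
    # Similarity term: the two object embeddings should coincide after alignment.
    similarity_term = distance(transformed, emb_converted)
    # Background terms: these distances are to be maximized, so they enter with a minus sign.
    background_term = distance(transformed, emb_bg1) + distance(restored, emb_bg2)
    return similarity_term - background_term
```

With this sign convention, pushing either background embedding further from the aligned object embedding lowers the loss, which matches the claim language that the loss decreases as those distances are maximized.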

(Deleted)

The method of claim 1, wherein step (b) comprises:
(b1) extracting the first object area based on a first channel, which is a channel containing relatively more information about the recognized object among the plurality of channels constituting the learning image; and
(b2) extracting the second object area based on a second channel, which is a channel containing relatively more information about the recognized object among the plurality of channels constituting the converted image.

The method of claim 4, wherein step (b1) comprises:
determining whether a pixel constituting a channel of the learning image is a pixel containing information about the recognized object;
determining, for a channel of the learning image, the number of pixels containing information about the object; and
determining, as the first channel, based on the number of pixels, a channel containing relatively more information about the recognized object among the plurality of channels of the learning image,
and wherein step (b2) comprises:
determining whether a pixel constituting a channel of the converted image is a pixel containing information about the recognized object;
determining, for a channel of the converted image, the number of pixels containing information about the object; and
determining, as the second channel, based on the number of pixels, a channel containing relatively more information about the recognized object among the plurality of channels of the converted image.
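The pixel-counting channel selection recited above can be sketched as follows. The claim does not say how a pixel is judged to contain object information, so the fixed `threshold`, the C x H x W array layout, the choice of the single channel with the highest count, and the function names are assumptions made for illustration.

```python
import numpy as np


def select_object_channel(channels: np.ndarray, threshold: float = 0.5) -> int:
    """Pick the channel of a C x H x W array with the most object pixels."""
    # Step 1: decide per pixel whether it carries object information (assumed rule).
    is_object_pixel = channels > threshold
    # Step 2: count the object pixels in each channel.
    counts = is_object_pixel.reshape(channels.shape[0], -1).sum(axis=1)
    # Step 3: choose the channel with the largest count.
    return int(np.argmax(counts))


def extract_object_area(channels: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a boolean H x W mask of the object area from the selected channel."""
    selected = select_object_channel(channels, threshold)
    return channels[selected] > threshold
```

The same two functions would be applied once to the channels of the learning image, giving the first channel and first object area, and once to the channels of the converted image, giving the second channel and second object area.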

The method of claim 1, further comprising:
extracting, by the object extractor, the first object area from an input image using the artificial intelligence model; and
determining an object location in the input image based on the extracted first object area.
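One common way to turn an extracted object area into an object location is a tight bounding box around the area. The sketch below assumes the object area is available as a boolean H x W mask and reports the location as `(x_min, y_min, x_max, y_max)` in pixel indices; both the representation and the name `object_location` are illustrative assumptions, not taken from the claims.

```python
import numpy as np


def object_location(mask: np.ndarray):
    """Return the tight bounding box (x_min, y_min, x_max, y_max) of a boolean mask.

    Returns None when the mask contains no object pixels.
    """
    # Rows and columns that contain at least one object pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return (int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1]))
```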

The method of claim 1, wherein step (a) comprises:
generating, by the converted image generator, the converted image by rotating the learning image by a predetermined angle.

The method of claim 1, wherein step (a) comprises:
generating, by the converted image generator, the converted image by enlarging or reducing the learning image.

The method of claim 1, wherein step (a) comprises:
generating, by the converted image generator, the converted image by symmetrically transforming the learning image.

(Deleted)

A computer program stored in a computer-readable recording medium to execute the unsupervised learning-based object location detection method of any one of claims 1 and 4 to 9.

An unsupervised learning-based object location detection device comprising:
a converted image generator configured to generate a converted image by converting a learning image;
an object extractor configured to extract a first object area from the learning image and a second object area from the converted image; and
an artificial intelligence learning unit configured to train an artificial intelligence model by comparing the first object area and the second object area,
wherein the artificial intelligence learning unit is configured to:
train the artificial intelligence model by comparing a first background area, which excludes the first object area in the learning image, with a second background area, which excludes the second object area in the converted image;
calculate a loss function by analyzing the similarity between the first object area and the second object area and analyzing the difference between the first background area and the first object area;
train the artificial intelligence model so that the loss function decreases;
train the artificial intelligence model so that the loss function decreases by maximizing the distance between (i) the embedding value generated by applying, to the embedding value for the learning image, the same transformation as that applied to the learning image and (ii) the embedding value corresponding to the image of the first background area; and
train the artificial intelligence model so that the loss function decreases by maximizing the distance between (i) the embedding value generated by applying, to the embedding value for the converted image, the inverse of the transformation applied to the learning image and (ii) the embedding value corresponding to the image of the second background area.

(Deleted)

The device of claim 12, wherein the object extractor is configured to:
extract the first object area based on a first channel, which is a channel containing relatively more information about the recognized object among the plurality of channels constituting the learning image; and
extract the second object area based on a second channel, which is a channel containing relatively more information about the recognized object among the plurality of channels constituting the converted image.

The device of claim 15, wherein the object extractor is configured to:
determine whether a pixel constituting a channel of the learning image is a pixel containing information about the recognized object;
determine, for a channel of the learning image, the number of pixels containing information about the object;
determine, as the first channel, based on the number of pixels, a channel containing relatively more information about the recognized object among the plurality of channels of the learning image;
determine whether a pixel constituting a channel of the converted image is a pixel containing information about the recognized object;
determine, for a channel of the converted image, the number of pixels containing information about the object; and
determine, as the second channel, based on the number of pixels, a channel containing relatively more information about the recognized object among the plurality of channels of the converted image.

The device of claim 12, further comprising a processor,
wherein the object extractor is configured to extract the first object area from an input image using the artificial intelligence model, and
wherein the processor is configured to determine an object location in the input image based on the extracted first object area.

The device of claim 12, wherein the converted image generator is configured to generate the converted image by rotating the learning image by a predetermined angle.

The device of claim 12, wherein the converted image generator is configured to generate the converted image by enlarging or reducing the learning image.

The device of claim 12, wherein the converted image generator is configured to generate the converted image by symmetrically transforming the learning image.