KR102462733B1

KR102462733B1 - Robust Multi-Object Detection Apparatus and Method Using Siamese Network

Info

Publication number: KR102462733B1
Application number: KR1020200026298A
Authority: KR
Inventors: 김강건; 김진원
Original assignee: 한국과학기술연구원
Priority date: 2020-03-03
Filing date: 2020-03-03
Publication date: 2022-11-04
Also published as: KR20210111417A

Abstract

A robust multi-object detection apparatus and method using a Siamese network are provided. A multi-object detection apparatus according to an embodiment includes: an object detection network that detects at least one object in an input image through a machine learning model based on a convolutional neural network (CNN) and generates object detection data; an object tracking network that generates object tracking data by tracking the presence and location of an object corresponding to a template, based on a feature of the object detection network and the template, which contains information on the object to be tracked; a Siamese network that holds the template, matches a feature provided by a layer of the object detection network with the template as a pair, and provides the pair as an input to the object tracking network; and a result comparison unit that compares the object detection data with the object tracking data, outputs an object detection result, and updates the template held in the Siamese network.

Description

Robust Multi-Object Detection Apparatus and Method Using Siamese Network

The present invention relates to a robust multi-object detection apparatus and method, and more particularly to a robust multi-object detection apparatus and method using a Siamese network.

Deep learning can be defined as a set of machine learning algorithms that attempt high-level abstraction (summarizing the features, core content, or functions found in large amounts of data or complex material) through a combination of several nonlinear transformation methods. It is a branch of machine learning that teaches computers the way humans think.

Deep learning-based methods, and in particular object detection using convolutional neural networks (CNNs), have advanced greatly in recent years alongside the Fourth Industrial Revolution. In current object detection technology, one-stage detection methods, which can detect objects quickly, are preferred over the two-stage detection methods of the R-CNN family, which are highly accurate but have a complicated pipeline. Detection speed and accuracy are competing factors in object detection, and further research is needed to improve the accuracy of one-stage detection methods.

The present invention is intended to solve the problem described above. To compensate for the shortcomings of one-stage detection methods and to improve object detection performance while keeping real-time detection possible, it provides a more robust multi-object detection apparatus and method using a Siamese network.

A multi-object detection apparatus according to an embodiment of the present invention includes: an object detection network that detects at least one object in an input image through a machine learning model based on a convolutional neural network (CNN) and generates object detection data; an object tracking network that generates object tracking data by tracking the presence and location of an object corresponding to a template, based on a feature of the object detection network and the template, which contains information on the object to be tracked; a Siamese network that holds the template, matches a feature provided by a layer of the object detection network with the template as a pair, and provides the pair as an input to the object tracking network; and a result comparison unit that compares the object detection data with the object tracking data, outputs an object detection result, and updates the template held in the Siamese network.

A multi-object detection method according to another embodiment of the present invention includes: generating object detection data by detecting at least one object in an input image through a machine learning model based on a convolutional neural network (CNN); generating object tracking data by tracking the presence and location of an object corresponding to a template, based on an input in which a feature of the CNN-based machine learning model and the template, which contains information on the object to be tracked, are matched as a pair; and comparing the object detection data with the object tracking data to output an object detection result and updating the template.

A computer program according to yet another embodiment of the present invention is combined with hardware and stored in a medium in order to execute the multi-object detection method described above.

The multi-object detection apparatus according to an embodiment of the present invention can detect each of a plurality of objects included in an input image, track the detected objects, and supplement and correct the detected object information. Accordingly, when objects are detected in consecutive images, the case in which an object detected in a previous frame goes undetected can be compensated for, which improves the accuracy of multi-object detection. That is, a more robust multi-object detection apparatus can be provided.

FIG. 1 is a block diagram illustrating the configuration of a multi-object detection apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram for explaining the final object detection process performed by the result comparison unit according to an embodiment of the present invention.
FIG. 3 is a flowchart of a multi-object detection method according to another embodiment of the present invention.

The detailed description of the present invention that follows refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. The various embodiments of the invention differ from one another but need not be mutually exclusive. For example, a particular shape, structure, or characteristic described here in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. Likewise, the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention is limited only by the appended claims, together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.

The terms used in this specification were chosen, with their functions in mind, from general terms that are currently in wide use wherever possible, but they may vary with the intention or practice of those working in the field or with the emergence of new technology. In certain cases a term has been chosen arbitrarily by the applicant, and in that case its meaning is given in the relevant part of the description. The terms used in this specification should therefore be interpreted on the basis of their actual meaning and the content of the specification as a whole, not merely by their names.

FIG. 1 is a block diagram illustrating the configuration of a multi-object detection apparatus according to an embodiment of the present invention.

The multi-object detection apparatus 10 can detect a plurality of objects included in an input image. Here, the plurality of objects includes objects of different kinds (people, vehicles, ships, traffic lights, and so on); the multi-object detection apparatus 10 can distinguish the type of each object and define the area in which each object is located. The area in which an object is located is defined as a bounding box, and the type of each object may be displayed around its bounding box. The multi-object detection apparatus 10 can also detect each of the objects included in the input image, track the detected objects, and supplement and correct the detected object information. The image input to the multi-object detection apparatus 10 may be a video, in which case multi-object detection is performed on the input video as a sequence of images divided frame by frame.

Referring to FIG. 1, the multi-object detection apparatus 10 according to an embodiment of the present invention includes an object detection network 100, an object tracking network 110, a Siamese network 120, and a result comparison unit 130.

The multi-object detection apparatus according to the embodiments, and each network or unit that constitutes it, may be entirely hardware, or partly hardware and partly software. For example, each component of the multi-object detection apparatus is intended to refer to a combination of hardware and the software run by that hardware. The hardware may be a data processing device that includes a central processing unit (CPU) or another processor. The software run by the hardware may refer to a running process, an object, an executable, a thread of execution, a program, and the like. For example, the object detection network 100 may refer to a combination of hardware and software for receiving an image and analyzing it to detect objects.

The object detection network 100 can detect at least one object in an input image through a machine learning model based on a convolutional neural network (CNN) and generate object detection data.

A CNN is a type of deep neural network (DNN) composed of one or more convolutional layers, pooling layers, and fully connected layers. A CNN has a structure well suited to learning two-dimensional data and can be trained with the backpropagation algorithm. It is one of the representative deep neural network models, widely used in applications such as classifying and detecting objects in images.

Such a CNN is a type of machine learning in which the model itself extracts the features of an image. CNNs are particularly useful for finding patterns in order to recognize objects, faces, and scenes in images; they learn directly from data and use the learned patterns to classify images. Depending on the application, a CNN can be built from scratch, or a model pre-trained on a dataset can be used. Because a CNN learns features directly, there is no need to extract features manually, a high level of recognition performance can be obtained, and an existing network can be retrained for a new recognition task.

In a CNN, tens or hundreds of layers can each learn to detect different features of an image. Filters are applied to each training image at different resolutions, and the output of each filter is used as the input to the next layer. Filters can start with very simple features, such as brightness and edges, and grow more complex until they capture features unique to an object. Like other neural networks, a CNN consists of an input layer, an output layer, and several hidden layers between the two. Each layer performs computations that transform the data in order to learn features specific to that data. The most commonly used layers are convolution, activation (ReLU), and pooling.

Convolution passes the input image through a set of convolutional filters, each of which activates particular features in the image. The rectified linear unit (ReLU) maps negative values to zero and keeps positive values, which enables faster and more effective training. This step is called activation because only the activated features are passed on to the next layer. Pooling performs nonlinear downsampling and simplifies the output by reducing the number of parameters the network has to learn. These operations are repeated over tens or hundreds of layers, and each layer learns to detect different features.
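The three layer operations described above can be illustrated with a minimal pure-Python sketch. The function names are hypothetical, the "convolution" is the unflipped sliding-window form commonly used in CNNs, and max pooling over 2x2 blocks is assumed as one example of nonlinear downsampling; none of this is taken from the patent itself.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products.

    As in most CNN frameworks, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(grid):
    """Map negative values to 0 and keep positive values unchanged."""
    return [[max(0, v) for v in row] for row in grid]


def max_pool_2x2(grid):
    """Downsample by keeping the maximum of each 2x2 block.

    Assumes an even number of rows and columns (an illustrative restriction).
    """
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, len(grid[r]), 2)
        ]
        for r in range(0, len(grid), 2)
    ]
```

Chaining `conv2d_valid`, `relu`, and `max_pool_2x2` gives one convolution-activation-pooling stage; a real network stacks many such stages with learned kernels.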

The CNN architecture learns features across multiple layers and then moves on to classification. The second-to-last layer is a fully connected layer that outputs a K-dimensional vector, where K is the number of classes the network can predict. This vector contains the probability of each class for the image being classified. The final layer of the CNN architecture uses a classification layer such as softmax to provide the classification output.
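As an illustration of the classification layer mentioned above, the following sketch applies softmax to a K-dimensional score vector. The function name and the example scores are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn K raw class scores into K probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for K = 3 classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The predicted class is the index with the highest probability, here the first class.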

CNNs are trained on hundreds, thousands, and sometimes millions of images. When large amounts of data and complex network architectures are used, applying GPUs can greatly reduce the processing time needed to train the model. A CNN can be trained from scratch, or transfer learning can be performed using a pre-trained model. Which approach to use depends on the available resources and the type of program to be built.

To train a network from scratch, the designer needs to define the number of layers and filters as well as other tunable parameters. Training a model from scratch to produce accurate results may require a large amount of data, on the order of millions of samples.

The object detection network 100 according to an embodiment of the present invention may be configured as a one-stage detection network, which among CNN-based machine learning models is suited to real-time object detection. Here, a one-stage detection network means a detection network in which region proposal and classification are performed simultaneously. That is, finding the object bounding boxes and classifying the objects can be performed at the same time at the final output stage of the network. For example, the object detection network 100 may be configured as a one-stage detection network such as a YOLO (You Only Look Once) network or CenterNet, but is not limited to these.

The object detection data generated by the object detection network 100 includes the type (class) of each object detected in the input image and a bounding box corresponding to the area of the object. The object type may be indicated by rendering each bounding box differently according to its type, but is not limited to this. In some embodiments, the object type may be shown as a caption outside the bounding box giving the name of the object contained in the box.

As described above, the object detection network 100 is composed of a plurality of layers, and each layer generates features. The features generated in any one of the layers of the object detection network 100 may be provided to the Siamese network 120.

The Siamese network 120 is the network that supplies input data to the object tracking network 110, and it may hold features from a layer of the object detection network 100 together with templates. The Siamese network 120 can match a feature provided by a layer of the object detection network 100 with a template stored in the Siamese network 120 as a pair, and provide the matched feature-template pair to the object tracking network 110 as input. Here, a template is defined as data containing information on an object to be tracked by the object tracking network 110, and the object tracking network 110 may be configured with one object tracking sub-network for each template. The templates stored in the Siamese network 120 are in the state finally determined for the previous frame and can be updated according to the object detection result for the current frame. This update process is described in more detail below. The Siamese network 120 can match the feature that the object detection network 100 generates by analyzing the image of the current frame with the template information determined in the previous frame, and provide the result to the object tracking network 110 as an input.
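The pairing role of the Siamese network can be sketched as a small container that stores templates and pairs each one with the feature of the current frame. The class name `TemplateStore`, its methods, and the use of strings as stand-ins for features and templates are all hypothetical; the patent does not specify a data layout.

```python
class TemplateStore:
    """Holds templates from the previous frame and builds feature-template pairs."""

    def __init__(self):
        # Hypothetical layout: template id -> template data.
        self.templates = {}

    def pairs_for(self, frame_feature):
        """Pair the current frame's feature with every stored template."""
        return [
            (template_id, frame_feature, template)
            for template_id, template in self.templates.items()
        ]

    def update(self, new_templates):
        """Replace the stored templates with those decided for the current frame."""
        self.templates = dict(new_templates)


store = TemplateStore()
store.update({1: "person-template", 2: "car-template"})
pairs = store.pairs_for("frame-feature")
```

Each resulting tuple would be handed to the tracking sub-network that corresponds to that template.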

The object tracking network 110 can perform object tracking for each template input from the Siamese network 120. The object tracking network 110 may be composed of at least one object tracking sub-network that performs object tracking for a template input from the Siamese network 120, and the number of sub-networks may correspond to the number of templates. Each object tracking sub-network compares the feature generated by the object detection network 100 with its template and performs two analyses: classification, which determines whether an object corresponding to the template is present, and regression, which estimates the bounding box of that object. Through classification, each sub-network calculates a probability value for the presence of the object; through regression, it calculates the object's bounding box information. Each object tracking sub-network may be configured as a region proposal network (RPN) and can estimate the location of the object by tracking, in the feature corresponding to the image, the bounding box and presence of the object corresponding to the input template. The object tracking network 110 can collect the data analyzed and the object locations estimated by each sub-network to generate object tracking data, which estimates the locations of the objects corresponding to each template.

The result comparison unit 130 can compare the object detection data generated by the object detection network 100 with the object tracking data generated by the object tracking network 110 and output an object detection result. By comparing and analyzing the two, the result comparison unit 130 can make the final determination of the objects in the image corresponding to the current frame. In other words, the objects in the object detection data generated by the object detection network 100 can be supplemented and corrected using the object tracking data, which is generated using templates based on the object information determined in the previous frame. The result comparison unit 130 can output the final object detection result from this comparative analysis. The result comparison unit 130 can also provide the finally determined object information for the image of the current frame to the Siamese network 120, so that it can be used for tracking objects in the image of the next frame. That is, the templates held in the Siamese network 120 can be updated with the object information for the image of the current frame.
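The per-frame data flow between the four components can be sketched as a loop in which the templates decided for one frame feed the tracking of the next. The function `run_pipeline` and its three callable parameters are hypothetical placeholders for the detection network, the tracking network, and the result comparison unit; the stand-in lambdas exist only to make the sketch executable.

```python
def run_pipeline(frames, detect, track, compare):
    """Process frames in order, carrying templates from one frame to the next.

    detect(frame) -> (detections, feature)
    track(feature, templates) -> tracks
    compare(detections, tracks) -> (final_result, updated_templates)
    """
    templates = []  # nothing to track before the first frame
    outputs = []
    for frame in frames:
        detections, feature = detect(frame)
        tracks = track(feature, templates)
        final_result, templates = compare(detections, tracks)
        outputs.append(final_result)
    return outputs


# Toy stand-ins: each frame is a number, "detected" as itself.
results = run_pipeline(
    [1, 2, 3],
    detect=lambda frame: ([frame], frame),
    track=lambda feature, templates: list(templates),
    compare=lambda detections, tracks: (detections + tracks, detections),
)
```

With these stand-ins, every frame after the first reports its own detection plus the object carried over from the previous frame.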

FIG. 2 is an exemplary diagram for explaining the final object detection process performed by the result comparison unit according to an embodiment of the present invention.

Referring to FIG. 2, as described above, object detection data (M bounding boxes) and object tracking data (N bounding boxes) can be generated for the image of the current frame by the object detection network 100 and the object tracking network 110, respectively. The result comparison unit 130 can determine whether an object included in the object detection data (M bounding boxes) and an object included in the object tracking data (N bounding boxes) correspond to the same object. The result comparison unit 130 can determine whether a bounding box included in the object detection data and a bounding box included in the object tracking data have the same class information (type information). In addition, the result comparison unit 130 determines the similarity between a bounding box included in the object detection data and a bounding box included in the object tracking data according to the L2 norm, as in Equation 1 below.

[Equation 1]

(Here, x and y are the spatial position coordinates of the bounding box, w and h are the width and height of the bounding box, respectively, and l is a predetermined value that may be determined by learning or empirically.)
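The equation itself is not reproduced in this text, so the following is only a sketch of one plausible reading: two boxes are treated as the same object when their classes agree and the L2 norm of the difference between their (x, y, w, h) vectors is below l. The function names, the tuple layout, and the strict inequality are assumptions.

```python
import math


def l2_distance(box_a, box_b):
    """L2 norm of the difference between two (x, y, w, h) boxes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(box_a, box_b)))


def is_same_object(detection, track, l):
    """Assumed test: same class and box distance below the threshold l.

    detection and track are (label, (x, y, w, h)) tuples.
    """
    det_label, det_box = detection
    trk_label, trk_box = track
    return det_label == trk_label and l2_distance(det_box, trk_box) < l
```

For example, a detected car at (10, 10, 5, 5) and a tracked car at (11, 10, 5, 5) are at distance 1 and would match for any l greater than 1.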

When the result comparison unit 130 determines through the above process that a bounding box included in the object detection data and a bounding box included in the object tracking data are the same bounding box, it can update the value of the bounding box included in the object tracking data using Equation 2 below.

[Equation 2]

In addition, when a bounding box included in the object tracking data has no matching box (object) among the bounding boxes included in the object detection data, this may mean that a previously existing object has disappeared from the image. However, an object that is still present in the image may also go undetected because of an object tracking error, so the result comparison unit 130 does not delete the box (object) immediately. Instead, it uses an object presence threshold (k) and a failure count threshold (c) to decide whether the unmatched object is treated as undetected or deleted. Specifically, the result comparison unit 130 determines whether the probability value for the presence of the object, provided in the object tracking data, exceeds the object presence threshold (k), and counts the cases in which it does not. The result comparison unit 130 keeps counting, over consecutive frame images, whether the probability value is at or below the object presence threshold (k), and when the count exceeds the failure count threshold (c), it decides to delete the corresponding template.
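The deletion rule with the two thresholds can be sketched as a per-template failure counter. The class name `MissCounter` and its method are hypothetical, and resetting the count when the presence probability rises above k again is an assumption: the text speaks of counting over consecutive frames but does not state what happens after a success.

```python
class MissCounter:
    """Counts frames in which a template's presence probability is at or below k."""

    def __init__(self, k, c):
        self.k = k        # object presence threshold
        self.c = c        # failure count threshold
        self.misses = {}  # template id -> current failure count

    def observe(self, template_id, presence_prob):
        """Record one frame; return True when the template should be deleted."""
        if presence_prob <= self.k:
            self.misses[template_id] = self.misses.get(template_id, 0) + 1
        else:
            self.misses[template_id] = 0  # assumed reset, not stated in the text
        if self.misses[template_id] > self.c:
            del self.misses[template_id]
            return True
        return False
```

With k = 0.5 and c = 2, a template is deleted on the third consecutive frame in which its presence probability stays at or below 0.5.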

In addition, when a bounding box included in the object detection data has no matching bounding box in the object tracking data, this may mean that a new object has been detected. The result comparison unit 130 then decides to add the newly detected object as a new template.

Information on the detected objects finally determined according to the comparison result of the result comparison unit 130 may be output together with the image. In addition, the result comparison unit 130 passes the currently determined templates to the Siamese network 120 so that the templates stored in the Siamese network 120 are kept up to date.
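The three cases handled by the result comparison unit 130 (matched boxes, tracked boxes with no detection, and detections with no tracked box) can be sketched as a partition of the two box lists. The function name `partition_boxes`, the tuple layout `(x, y, w, h)`, and the greedy first-match strategy are assumptions made for illustration; the matching predicate is passed in rather than fixed, because the similarity test itself is defined by Equation 1.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed layout


def partition_boxes(
    detected: List[Box],
    tracked: List[Box],
    same_box: Callable[[Box, Box], bool],
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Split boxes into matched pairs, unmatched tracks, and new detections.

    Returns (matches, unmatched_tracked, new_detected) as index lists.
    Greedy first-match assignment is an illustrative simplification.
    """
    matches: List[Tuple[int, int]] = []
    used_tracked = set()
    new_detected: List[int] = []
    for d_idx, d_box in enumerate(detected):
        for t_idx, t_box in enumerate(tracked):
            if t_idx not in used_tracked and same_box(d_box, t_box):
                matches.append((d_idx, t_idx))
                used_tracked.add(t_idx)
                break
        else:
            # No tracked box matched: candidate for a new template.
            new_detected.append(d_idx)
    # Tracked boxes left over are candidates for deletion.
    unmatched_tracked = [i for i in range(len(tracked)) if i not in used_tracked]
    return matches, unmatched_tracked, new_detected


# Toy predicate for the demo only; not the similarity test of the patent.
close = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) < 5
example = partition_boxes(
    [(10, 10, 4, 4), (50, 50, 4, 4)],
    [(11, 10, 4, 4), (90, 90, 4, 4)],
    close,
)
print(example)  # ([(0, 0)], [1], [1])
```

Matched pairs would feed the bounding box update, unmatched tracked boxes would feed the failure counting, and new detections would become new templates.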

The multi-object detection apparatus 10 according to an embodiment of the present invention detects each of a plurality of objects included in an input image, and can supplement and correct the detected object information based on information obtained by tracking the detected objects. Accordingly, when objects are detected in successive images, the situation in which an object detected in a previous frame goes undetected can be compensated for, which improves the accuracy of multi-object detection. That is, a more robust multi-object detection apparatus can be provided.

Hereinafter, a multi-object detection method according to another embodiment of the present invention will be described.

FIG. 3 is a flowchart of a multi-object detection method according to another embodiment of the present invention. The method may be performed by the multi-object detection apparatus 10 described above, and FIGS. 1 and 2 may be referred to for the description.

Referring to FIG. 3, a multi-object detection method according to another embodiment of the present invention includes: generating object detection data by detecting at least one object in an input image through a convolutional neural network (CNN)-based machine learning model (S100); generating object tracking data by tracking the presence and location of an object corresponding to a template, based on an input value in which a feature of the CNN-based machine learning model and the template, which contains information on the object to be tracked, are matched as a pair (S110); and comparing the object detection data with the object tracking data, outputting an object detection result, and updating the template (S120).
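The per-frame flow of steps S100 to S120 can be sketched as follows. Every callable here (`detect`, `track`, `compare`) is a hypothetical stand-in for the corresponding network or unit, and the return shapes are assumptions; the sketch only shows the order of operations and how the template set is carried from one frame to the next.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_multi_object_detection(
    frames: Iterable[Any],
    detect: Callable[[Any], Tuple[Any, Any]],
    track: Callable[[Any, List[Any]], Any],
    compare: Callable[[Any, Any, List[Any]], Tuple[Any, List[Any]]],
) -> List[Any]:
    """Run S100 -> S110 -> S120 on each frame and collect the results."""
    # Assumption: no templates exist before the first frame is processed.
    templates: List[Any] = []
    results = []
    for frame in frames:
        # S100: detection yields detection data plus an intermediate feature.
        detection_data, feature = detect(frame)
        # S110: tracking uses the feature together with the stored templates.
        tracking_data = track(feature, templates)
        # S120: comparison outputs the result and the updated template set.
        result, templates = compare(detection_data, tracking_data, templates)
        results.append(result)
    return results
```

Passing the three stages in as callables keeps the loop independent of any particular detector or tracker, which matches the description's statement that the detection network is not limited to a specific model.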

First, object detection data is generated by detecting at least one object in the input image (S100).

The object detection network 100 may generate object detection data by detecting at least one object in the input image through a convolutional neural network-based machine learning model. The object detection network 100 according to an embodiment of the present invention may be configured as a one-stage detection network, which among such models is suited to real-time object detection. Here, a one-stage detection network is a detection network in which region proposal and classification are performed simultaneously. That is, finding object bounding boxes and classifying objects may be performed together at the final output stage of the network. For example, the object detection network 100 may be configured as a one-stage detection network such as a YOLO (You Only Look Once) network or CenterNet, but is not limited thereto.
The object detection data generated by the object detection network 100 includes the class of each object detected in the input image and a bounding box corresponding to the region of the object. Here, the class of an object may be indicated by rendering each bounding box differently, but is not limited thereto. In some embodiments, the class of an object may be indicated by displaying the name of the object contained in the bounding box as a caption outside the bounding box. As described above, the object detection network 100 is composed of a plurality of layers, each of which generates a feature. A feature generated by any one of the plurality of layers of the object detection network 100 may be provided to the Siamese network 120.

Next, object tracking data is generated by tracking the presence and location of an object corresponding to a template, based on an input value in which a feature of the object detection network and the template, which contains information on the object to be tracked, are matched as a pair (S110).

The Siamese network 120 is the network that supplies input data to the object tracking network 110, and may hold the feature from a layer of the object detection network 100 together with the template. The Siamese network 120 may match the feature provided by the layer of the object detection network 100 with a template stored in the Siamese network 120 as a pair, and may provide the matched feature-template pair as input to the object tracking network 110.
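The pairing step can be sketched as below. The function name `make_pairs` and the use of a dictionary keyed by a template identifier are hypothetical; the description only states that the feature from a detection-network layer is matched with a stored template and that the pair is passed to the object tracking network 110.

```python
from typing import Any, Dict, List, Tuple


def make_pairs(feature: Any, templates: Dict[int, Any]) -> List[Tuple[int, Any, Any]]:
    """Pair one shared feature with every stored template.

    Returns (template_id, feature, template) tuples, one per template.
    The same feature object is reused in every pair, not copied.
    """
    return [(tid, feature, tpl) for tid, tpl in templates.items()]


pairs = make_pairs("feat", {1: "tplA", 2: "tplB"})
print(pairs)  # [(1, 'feat', 'tplA'), (2, 'feat', 'tplB')]
```

Reusing one feature for all templates reflects the idea that the detection network's computation is shared, so tracking several objects does not require recomputing the image feature for each one.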

Next, the object detection data is compared with the object tracking data, an object detection result is output, and the template is updated (S120).

The result comparison unit 130 may compare the object detection data generated by the object detection network 100 with the object tracking data generated by the object tracking network 110 and output an object detection result. The result comparison unit 130 determines the similarity between a bounding box included in the object detection data and a bounding box included in the object tracking data according to an L2 norm, as in Equation 1 below.

[Equation 1]
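As an illustration of the L2-norm test, the sketch below treats each bounding box as a vector (x, y, w, h) and declares two boxes the same when the Euclidean distance between the vectors is below l. The exact form of Equation 1 is not reproduced in this text, so the unweighted sum over all four components, the strict inequality, and the names `l2_distance` and `is_same_box` are assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def l2_distance(a: Box, b: Box) -> float:
    """Euclidean distance between two boxes viewed as 4-vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_same_box(detected: Box, tracked: Box, l: float) -> bool:
    """Assumed matching rule: same box when the L2 distance is below l."""
    return l2_distance(detected, tracked) < l


print(l2_distance((0, 0, 10, 10), (3, 4, 10, 10)))  # 5.0
print(is_same_box((0, 0, 10, 10), (3, 4, 10, 10), l=6.0))  # True
```

Because position and size enter the same distance, a box that stays in place but changes size sharply can fail the test just as a box that moves can; whether the original equation weights the terms differently is not recoverable from the text.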

When, through the above-described process, a bounding box included in the object detection data is determined to be the same bounding box as a bounding box included in the object tracking data, the result comparison unit 130 may update the value of the bounding box included in the object tracking data using Equation 2 below.

[Equation 2]

In addition, when a bounding box included in the object tracking data has no matching box (object) among the bounding boxes included in the object detection data, this may mean that a previously present object has disappeared from the image. However, because an error in object tracking may cause an object that is still present in the image to be missed, the result comparison unit 130 does not delete the box (object) immediately. Instead, it uses an object presence threshold (k) and a failure count threshold (c) to decide whether the unmatched object is merely undetected or should be deleted. Specifically, the result comparison unit 130 determines whether the probability value for the presence of the object, provided as part of the object tracking data, exceeds the object presence threshold (k), and counts the cases in which it does not. The result comparison unit 130 keeps counting, over successive frame images, whether the probability value is equal to or less than the object presence threshold (k), and when the count exceeds the failure count threshold (c), it determines that the object has disappeared from the image. That is, it may decide to delete the corresponding template.

In addition, when a bounding box included in the object detection data has no matching box among the bounding boxes included in the object tracking data, this may mean that a new object has been detected. In that case, the result comparison unit 130 determines that a new object has been detected in the image. That is, it may decide to create a corresponding template. Information on the detected objects finally determined according to the comparison result of the result comparison unit 130 may be output together with the image. In addition, the result comparison unit 130 passes the currently determined templates to the Siamese network 120 so that the templates stored in the Siamese network 120 are kept up to date.

The operations of the multi-object detection method according to the embodiments described above may be implemented, at least in part, as a computer program and recorded on a computer-readable recording medium. The computer-readable recording medium on which a program for implementing these operations is recorded includes all types of recording devices that store data readable by a computer. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner. In addition, functional programs, code, and code segments for implementing the present embodiment will be readily understood by those of ordinary skill in the art to which the present embodiment belongs.

Although the present invention has been described above with reference to embodiments, it should not be construed as being limited by these embodiments or the drawings, and those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the invention set forth in the claims below.

10: multi-object detection apparatus
100: object detection network
110: object tracking network
120: Siamese network
130: result comparison unit

Claims

1. A multi-object detection apparatus comprising:
an object detection network that generates object detection data by detecting at least one object in an input image through a convolutional neural network-based machine learning model;
an object tracking network that generates object tracking data by tracking the presence and location of an object corresponding to a template, based on a feature of the object detection network and the template containing information on the object to be tracked, the object tracking network comprising an object tracking sub-network configured to correspond to the template;
a Siamese network that includes the template, matches a feature provided by a layer of the object detection network with the template as a pair, and provides the pair as an input value to the object tracking network; and
a result comparison unit that compares the object detection data with the object tracking data, outputs an object detection result, and updates the template included in the Siamese network,
wherein the object tracking sub-network generates the object tracking data by calculating a probability value for the presence of the object through classification of the object corresponding to the template, and calculating bounding box information of the object through regression of the bounding box of the object corresponding to the template.

2. The apparatus of claim 1,
wherein the input image is a video, and multiple objects are detected from images divided into frame units.

3. The apparatus of claim 2,
wherein the object detection data includes the class of each object detected in the input image and a bounding box corresponding to the region of the object.

4. (Deleted)

5. The apparatus of claim 1,
wherein the result comparison unit determines the similarity between a bounding box included in the object detection data and a bounding box included in the object tracking data through Equation 1 below, and
when the bounding box included in the object detection data and the bounding box included in the object tracking data are determined to be the same bounding box, updates the value of the bounding box included in the object tracking data through Equation 2 below.

[Equation 1]

(Here, x and y are the spatial coordinates of the bounding box, w and h are the width and height of the bounding box, respectively, and l is a predetermined value that may be determined by learning or empirically.)

[Equation 2]

6. The apparatus of claim 5,
wherein, when a bounding box included in the object tracking data has no matching box among the bounding boxes included in the object detection data, the result comparison unit continues to count, over successive frame images, whether the probability value for the presence of the object corresponding to that box is equal to or less than an object presence threshold (k), and determines that the object has disappeared from the image when the counted value exceeds a failure count threshold (c), and
when a bounding box included in the object detection data has no matching box among the bounding boxes included in the object tracking data, the result comparison unit determines that a new object has been detected.

7. A multi-object detection method comprising:
generating object detection data by detecting at least one object in an input image through a convolutional neural network-based machine learning model;
generating object tracking data by tracking the presence and location of an object corresponding to a template, based on an input value in which a feature of the convolutional neural network-based machine learning model and the template, which contains information on the object to be tracked, are matched as a pair; and
comparing the object detection data with the object tracking data, outputting an object detection result, and updating the template,
wherein the generating of the object tracking data is performed in an object tracking sub-network configured to correspond to the template, and
the object tracking sub-network generates the object tracking data by calculating a probability value for the presence of the object through classification of the object corresponding to the template, and calculating bounding box information of the object through regression of the bounding box of the object corresponding to the template.

8. The method of claim 7,
wherein the input image is a video, and multiple objects are detected from images divided into frame units.

9. The method of claim 8,
wherein the object detection data includes the class of each object detected in the input image and a bounding box corresponding to the region of the object.

10. (Deleted)

11. The method of claim 7,
wherein outputting the object detection result and updating the template includes:
when a bounding box included in the object detection data and a bounding box included in the object tracking data are determined to be the same bounding box, updating the value of the bounding box included in the object tracking data.

[Equation 1]

(Here, x and y are spatial coordinates of the bounding box, w and h are the width and height of the bounding box, respectively, and l is a predetermined value and may be determined by learning or empirically.)

[Equation 2]

12. The method of claim 11,
wherein outputting the object detection result and updating the template includes:
when a bounding box included in the object tracking data has no matching box among the bounding boxes included in the object detection data, continuing to count, over successive frame images, whether the probability value for the presence of the object corresponding to that box is equal to or less than an object presence threshold (k), and determining that the object has disappeared from the image when the counted value exceeds a failure count threshold (c); and
when a bounding box included in the object detection data has no matching box among the bounding boxes included in the object tracking data, determining that a new object has been detected.

13. A computer program stored on a medium, in combination with hardware, to execute the multi-object detection method according to any one of claims 7 to 9, 11, and 12.