KR20210067710A

KR20210067710A - Method and apparatus for tracking objects in real time

Info

Publication number: KR20210067710A
Application number: KR1020190157603A
Authority: KR
Inventors: 권기룡
Original assignee: 부경대학교 산학협력단
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2021-06-08

Abstract

The present invention relates to a method and apparatus for tracking an object in real time. The apparatus includes: a collection unit for collecting a data set through one or more Internet resources; a learning unit for acquiring an object detection model by performing learning based on a Faster R-CNN algorithm on the collected data set; a camera unit for acquiring a real-time video sequence; a detection unit for detecting an object by inputting the real-time video sequence into the object detection model; and a tracking unit for tracking the object based on the output from the detection unit. This makes it possible to track the object more quickly and accurately in real time.

Description

Real-time object detection method and apparatus {METHOD AND APPARATUS FOR TRACKING OBJECTS IN REAL TIME}

The present invention relates to a real-time object detection method and apparatus, and more particularly, to a real-time object detection method and apparatus that enable an object to be tracked more quickly and accurately in real time.

Today, object detection has become one of the most challenging tasks in computer vision. The CSR-DCF tracking algorithm, proposed in a recent tracking benchmark, introduces the concept of channel and spatial reliability to DCF tracking and provides a new learning algorithm for efficient and seamless integration of the filter update and tracking processes, using only two simple standard features, HoG and color names.

Object detection is a harder problem than general image classification because it must solve both the localization problem of finding where a specific object is located and the recognition problem of determining which class the object belongs to.

Moreover, real-time object detection is a much more demanding task. There is a pressing need to apply a CNN-based architecture that detects and identifies the correct object in real-time frames while also reporting the position of the object in the real-time video sequence during tracking.

A real-time video sequence may contain overlapping objects, occlusion, motion blur, appearance changes, illumination changes, cluttered backgrounds, and environmental changes. In such cases, tracking filters, algorithms, and methods of all kinds can fail and may be unable to recover the object when it appears in subsequent frames.

Accordingly, there is a need to develop a technology that overcomes the problems described above and enables an object to be tracked more quickly and accurately in real time.

Accordingly, the present invention has been proposed to solve the problems described above. An object of the present invention is to provide a real-time object detection method and apparatus that track an object in real time more quickly and accurately by detecting and tracking the object with an OpenCV-based CSRT tracker and by using an object detection model trained with the Faster R-CNN algorithm.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention belongs from the following description.

To achieve the above object, a real-time object detection apparatus according to the present invention includes: a collection unit for collecting a data set through one or more Internet resources; a learning unit for acquiring an object detection model by performing learning based on a Faster R-CNN algorithm on the collected data set; a camera unit for acquiring a real-time video sequence; a detection unit for detecting an object by inputting the real-time video sequence into the object detection model; and a tracking unit for tracking the object based on the output from the detection unit.

In addition, to achieve the above object, a real-time object detection method according to the present invention includes: collecting a data set through one or more Internet resources; acquiring an object detection model by performing learning based on a Faster R-CNN algorithm on the collected data set; acquiring a real-time video sequence; detecting an object by inputting the real-time video sequence into the object detection model; and tracking the object based on the output from the detection unit.
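The five units and the five method steps map onto a simple processing chain. The following is a minimal sketch of that chain, not the implementation described in the specification; the names `Detection`, `run_pipeline`, `collect`, `train`, and `track` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Detection:
    label: str    # predicted class name
    score: float  # detector confidence
    box: Box      # bounding box of the object


def run_pipeline(
    collect: Callable[[], list],
    train: Callable[[list], Callable[[object], List[Detection]]],
    frames: Iterable[object],
    track: Callable[[object, List[Detection]], List[Detection]],
) -> Iterator[List[Detection]]:
    """Chain the stages: collect -> train -> acquire -> detect -> track."""
    dataset = collect()      # collection unit: gather the data set
    detect = train(dataset)  # learning unit: returns the detection model
    for frame in frames:     # camera unit: supplies the video sequence
        detections = detect(frame)      # detection unit
        yield track(frame, detections)  # tracking unit consumes detector output
```

Passing each stage as a callable keeps the sketch independent of any particular detector or tracker library; real components can be substituted without changing the control flow.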

According to the present invention, an object is detected and tracked using an OpenCV-based CSRT tracker, and an object detection model trained with the Faster R-CNN algorithm is used, so that the object can be tracked more quickly and accurately in real time.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention belongs from the following description.

FIG. 1 is a block diagram showing an object detection apparatus according to an embodiment of the present invention;
FIG. 2 is a diagram schematically showing an object detection method according to an embodiment of the present invention;
FIG. 3 is a diagram showing the structure of an object tracking method according to an embodiment of the present invention;
FIG. 4 is a diagram showing the main configuration of the Faster R-CNN applied to an embodiment of the present invention;
FIG. 5 is a diagram showing an example of calculating a feature map according to an embodiment of the present invention;
FIG. 6 is a diagram showing an example of reducing the size of the representation space of an image by reducing the number of parameters and the amount of computation in the network according to an embodiment of the present invention;
FIG. 7 is a diagram showing the concept of an RPN network according to an embodiment of the present invention.

The objects and effects of the present invention, and the technical configurations for achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. In describing the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary depending on the intention or practice of users or operators.

However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention belongs are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims. Therefore, the definitions of the terms should be based on the content of this specification as a whole.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member interposed between them. In addition, when a part is said to "include" a component, this means that other components may further be included, not excluded, unless specifically stated otherwise.

The present invention aims to focus on the exact object to be tracked, and also takes into account several parameters that arise in a real-time tracking environment, such as camera movement and the distance between the object and the camera. To this end, object detection methods such as the Viola-Jones algorithm (the first efficient face detector), Histograms of Oriented Gradients, and the deep learning approach known as Convolutional Neural Networks (CNN) were compared as candidate object detectors, and the Faster R-CNN algorithm was selected.

The Faster R-CNN algorithm uses several innovations to improve training and testing speed while also increasing detection accuracy.

Accordingly, the present invention detects and tracks an object using an OpenCV-based CSRT tracker to which a trained detection model is applied. The model covers all object features, candidate regions, object boundaries, the object/no-object score at each position, and correctly labeled objects. The tracker is combined with an object detector based on the Faster R-CNN algorithm to obtain a trained object detector model.

The CNN-based object detector is integrated with the OpenCV-based CSRT tracker, and the Faster R-CNN based object detector model is applied so that the tracking algorithm is supported by a stable object identifier in the real-time frame sequence.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a block diagram showing an object detection apparatus according to an embodiment of the present invention.

Referring to FIG. 1, an object detection apparatus 1 according to an embodiment of the present invention includes a collection unit 10, a learning unit 30, a camera unit 50, a detection unit 70, and a tracking unit 90.

First, the collection unit 10 collects a data set through various kinds of Internet resources such as post blocks, media sites, and Google resources.

The learning unit 30 pre-trains on the data set collected by the collection unit 10, performing learning based on the Faster R-CNN algorithm. An object detection model is obtained in this way. The object detection model is generated by learning the targets to be tracked from the data set collected through various kinds of Internet resources such as post blocks, media sites, and Google resources.

Because the Faster R-CNN algorithm is very fast and uses a one-stage pipeline, only one network needs to be trained and executed.

For example, to track two kinds of objects, pedestrians and vehicles, a data set containing the two image classes, pedestrian and vehicle, is collected, and the object detection model is obtained by performing learning with the Faster R-CNN algorithm.
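Preparing the two-class data set can be sketched as below. This is only an illustration under stated assumptions: the annotation layout, the `(x1, y1, x2, y2)` box format, and the function names are hypothetical, and reserving id 0 for the background is a common convention in detection frameworks rather than something stated in the specification.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) corner coordinates in pixels

# Hypothetical class list for the pedestrian/vehicle example.
CLASSES: List[str] = ["pedestrian", "vehicle"]


def build_label_map(classes: List[str]) -> Dict[str, int]:
    """Map class names to integer ids, reserving 0 for the background."""
    label_map = {"background": 0}
    for index, name in enumerate(classes, start=1):
        label_map[name] = index
    return label_map


def encode_annotations(
    annotations: List[Tuple[str, Box]], label_map: Dict[str, int]
) -> List[Tuple[int, Box]]:
    """Convert (class_name, box) pairs into (class_id, box) training targets."""
    encoded = []
    for name, box in annotations:
        if name not in label_map:
            raise ValueError(f"unknown class: {name}")
        x1, y1, x2, y2 = box
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"degenerate box: {box}")
        encoded.append((label_map[name], box))
    return encoded
```

Rejecting unknown class names and zero-area boxes at this stage is a design choice that keeps malformed samples collected from the Internet out of training.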

The camera unit 50 acquires images, that is, a real-time video sequence.

The detection unit 70 detects an object by inputting the real-time video sequence into the object detection model. Classification accuracy is high because the object is detected using an object detection model obtained by training on the collected data set with the Faster R-CNN algorithm.

The tracking unit 90 tracks the object based on the output from the detection unit 70.

Here, the tracking unit 90 integrates a CNN-based object detector with an OpenCV tracker and supports the tracking algorithm with a stable object identifier in real-time frames.

FIG. 2 is a diagram schematically showing the object detection steps according to an embodiment of the present invention.

Referring to FIG. 2, the learning unit 30 first trains on the data set collected by the collection unit 10 using the Faster R-CNN algorithm to obtain an object detection model.

Then, when the detection unit 70 detects an object by inputting the real-time video sequence into the object detection model, the tracking unit 90 tracks the object based on the CSR-DCF (Discriminative Correlation Filter with Channel and Spatial Reliability) algorithm, which is applied by integrating deep-learning-based object detection with the CSRT tracker implemented in the OpenCV library. The structure of this object tracking method is shown in detail in FIG. 3.
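One way to combine a detector with a correlation-filter tracker is sketched below. It assumes a tracker object exposing `init(frame, box)` and `update(frame)` returning `(ok, box)`, which mirrors the interface style of OpenCV trackers such as CSRT. The policy of re-running the detector whenever the tracker reports failure is an assumption made for illustration; the specification states only that the tracker works from the detector's output. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Protocol, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


class Tracker(Protocol):
    """Minimal tracker interface assumed by this sketch."""

    def init(self, frame: object, box: Box) -> None: ...

    def update(self, frame: object) -> Tuple[bool, Box]: ...


def track_sequence(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],
    make_tracker: Callable[[], Tracker],
) -> List[Optional[Box]]:
    """Track one object, re-running the detector when the tracker loses it."""
    tracker: Optional[Tracker] = None
    results: List[Optional[Box]] = []
    for frame in frames:
        box: Optional[Box] = None
        if tracker is not None:
            ok, tracked = tracker.update(frame)
            if ok:
                box = tracked
            else:
                tracker = None  # target lost: fall back to detection below
        if tracker is None:
            box = detect(frame)  # detector proposes a fresh bounding box
            if box is not None:
                tracker = make_tracker()
                tracker.init(frame, box)
        results.append(box)
    return results
```

The detector runs only on the first frame and after a tracking failure, so the cheaper tracker handles most frames while the detector provides recovery.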

FIG. 4 is a diagram showing the main configuration of the Faster R-CNN applied to an embodiment of the present invention.

Object detection is an automated technique that identifies objects of interest in an image by distinguishing them from the background, and is a subset of computer vision technology. For correct object detection, a bounding box must be set and associated with the category of the object it encloses. Deep learning, an advanced technique, is used for this purpose; in the present invention, deep learning is performed based on the Faster R-CNN algorithm.

Referring to FIG. 4, the Faster R-CNN consists of three main parts: convolutional layers, a Region Proposal Network (RPN), and class and bounding box prediction.

First, the convolutional layers help extract relevant features from the images in the data set by means of filters that are learned from the shapes and colors present in the training images. The target to be detected and tracked may be, for example, a person or pedestrian. The training speed of the object detector depends on the number of images and on the combination of layers in the CNN architecture.

In general, as shown in FIG. 5, a convolutional network consists of convolutional layers, pooling layers, activation layers, and a final component that is either a fully-connected layer or another extension suited to the task at hand, such as classification or detection.

The convolution is computed by sliding a filter across the entire input image, and the result is represented, within the anchor shown in FIG. 6, as a two-dimensional matrix called a feature map. After the feature map is computed, a pooling layer is applied, which reduces the number of features by discarding low-valued pixels in the feature map, as shown in FIG. 7.
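The sliding-filter computation and the pooling step can be illustrated with a small pure-Python sketch. It assumes stride 1 with no padding for the convolution and non-overlapping windows for max pooling; those settings and the function names are illustrative choices, not values taken from the specification. As in most CNN frameworks, the filter is applied without flipping.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding); return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the image patch whose top-left corner is (i, j).
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep only the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
feature_map = conv2d_valid(image, kernel)  # [[6, 8], [12, 14]]
pooled = max_pool2d(feature_map)           # [[14]]
```

The 3x3 input shrinks to a 2x2 feature map, and pooling keeps only the strongest response, which is the sense in which low-valued entries are discarded.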

Next, the RPN is a small neural network that generates proposals for the regions where objects are located. It places small anchor boxes on the convolutional feature map, predicts whether an object is present, and predicts the bounding box of the object, as shown in FIG. 8.

Regions likely to contain an object are selected either with a computer vision technique such as Selective Search or with a deep-learning-based Region Proposal Network (RPN). Once a set of candidate windows has been gathered, object detection can be formulated as a number of regression and classification models applied to those windows. The Faster R-CNN algorithm belongs to this family.

The point to consider here is how the RPN generates windows. Like YOLO, the RPN uses anchor boxes; unlike in the YOLO algorithm, however, the anchor boxes are not derived from data but have fixed sizes and shapes. These anchor boxes can cover the image more densely. Instead of classifying among multiple object categories, the RPN performs only binary classification of whether a window contains an object.
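Fixed-size anchor generation can be sketched as follows. The stride of 16 pixels, the three scales, and the three aspect ratios are values commonly cited for Faster R-CNN and are assumed here purely for illustration; the specification does not give them. The function name is hypothetical.

```python
from typing import List, Tuple

Anchor = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def generate_anchors(
    feat_h: int,
    feat_w: int,
    stride: int = 16,
    scales: Tuple[int, ...] = (128, 256, 512),
    ratios: Tuple[float, ...] = (0.5, 1.0, 2.0),
) -> List[Anchor]:
    """Place the same k = len(scales) * len(ratios) anchors at every cell."""
    anchors: List[Anchor] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    width = scale / ratio ** 0.5
                    height = scale * ratio ** 0.5
                    anchors.append((cx, cy, width, height))
    return anchors
```

Because the sizes depend only on `scales` and `ratios`, no statistics from the training data are needed, and the same set of shapes is repeated densely across the image.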

The RPN has a classifier that determines the probability that a proposal contains a target object, and a regressor that regresses the coordinates of the proposal. In FIG. 8, the k anchor boxes applied at each position are translation-invariant, meaning that the same set is used at every position. After the classifier provides the probability, the regressor provides, for each anchor, the offsets from the anchor box to the object.
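The two RPN outputs for a single anchor can be illustrated with the sketch below. The offset parameterization (center shifts scaled by the anchor size, log-scale width and height) follows the formulation commonly published for Faster R-CNN and is an assumption here, since the specification gives no formulas. The function names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def decode_offsets(anchor: Box, offsets: Tuple[float, float, float, float]) -> Box:
    """Apply regression offsets (tx, ty, tw, th) to an anchor box."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = offsets
    return (
        xa + tx * wa,       # shift the center by a fraction of the anchor width
        ya + ty * ha,       # shift the center by a fraction of the anchor height
        wa * math.exp(tw),  # rescale the width
        ha * math.exp(th),  # rescale the height
    )


def objectness(logit_object: float, logit_background: float) -> float:
    """Two-class softmax: probability that the anchor contains an object."""
    peak = max(logit_object, logit_background)  # subtract max for stability
    e_obj = math.exp(logit_object - peak)
    e_bg = math.exp(logit_background - peak)
    return e_obj / (e_obj + e_bg)
```

Zero offsets return the anchor unchanged, and equal logits give a probability of 0.5, which makes the two functions easy to sanity-check.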

Finally, class and bounding box prediction is the last stage of object detection and uses fully connected neural network layers. It takes the regions proposed by the RPN as input and predicts the object class (classification) and the bounding box (regression).

As described above, the present invention detects and tracks an object using an OpenCV-based CSRT tracker and uses an object detection model trained with the Faster R-CNN algorithm, so that the object can be tracked more quickly and accurately in real time.

Preferred embodiments of the present invention have been disclosed in this specification and the drawings. Although specific terms are used, they are used only in a general sense to explain the technical content of the present invention and to aid understanding of the invention, and are not intended to limit its scope. It will be apparent to those of ordinary skill in the art to which the present invention belongs that, in addition to the embodiments disclosed herein, other modifications based on the technical idea of the present invention can be implemented.

Claims

1. A real-time object detection apparatus comprising:
a collection unit for collecting a data set through one or more Internet resources;
a learning unit for acquiring an object detection model by performing learning based on a Faster R-CNN algorithm on the collected data set;
a camera unit for acquiring a real-time video sequence;
a detection unit for detecting an object by inputting the real-time video sequence into the object detection model; and
a tracking unit for tracking the object based on the output from the detection unit.

2. The real-time object detection apparatus of claim 1,
wherein the tracking unit integrates a CNN-based object detector with an OpenCV tracker and supports a tracking algorithm with a stable object identifier in a frame.

3. The real-time object detection apparatus of claim 1,
wherein the collection unit collects images captured by a drone as the data set, and the camera unit is provided in the drone to acquire the real-time video sequence.

4. A real-time object detection method comprising:
collecting a data set through one or more Internet resources;
acquiring an object detection model by performing learning based on a Faster R-CNN algorithm on the collected data set;
acquiring a real-time video sequence;
detecting an object by inputting the real-time video sequence into the object detection model; and
tracking the object based on the output from the detection unit.

5. The method of claim 4,
wherein the tracking step supports a tracking algorithm with a stable object identifier in a frame.

6. The method of claim 4,
wherein the data set comprises images captured by a drone, and the real-time video sequence is acquired through a camera provided in the drone.