KR20220063544A - Apparatus for predicting traffic line of box-level multiple object using only position information of box-level multiple object

Info

Publication number: KR20220063544A
Application number: KR1020200149533A
Authority: KR
Inventors: 김은태; 이영조; 성홍제; 현준혁
Original assignee: 연세대학교 산학협력단
Priority date: 2020-11-10
Filing date: 2020-11-10
Publication date: 2022-05-17
Also published as: KR102454281B1

Abstract

An apparatus according to a preferred embodiment of the present invention predicts the movement paths (traffic lines) of multiple objects at the box level using only the position information of those objects. By receiving only the box-level position information of the multiple objects as input to a long short-term memory (LSTM) network and predicting their future movement paths at the box level, the apparatus can predict the movements of the multiple objects quickly with lightweight computation. In addition, because it predicts positions using only box-level position information, the apparatus can predict the movements of multiple objects even when an object to be predicted is hidden by its surroundings or when objects overlap one another.

Description

Apparatus for predicting traffic line of box-level multiple object using only position information of box-level multiple object

The present invention relates to an apparatus for predicting box-level movement paths of multiple objects using only box-level position information of the objects, and more particularly to an apparatus that predicts the future positions of multiple objects based on their past positions.

Conventional techniques for multiple object tracking (MOT) or trajectory prediction, which analyze the motion of multiple objects, require long execution times. This is because they analyze motion at the pixel level, considering not only the relationships among the objects to be analyzed but also the relationships between those objects and the background. Such multi-object motion analysis is commonly used in accident-prevention devices for the autonomous movement of machines such as autonomous vehicles and robots, and the closer the analysis runs to real time, the more natural and faster the machine's motion can become.

Consider how a recent machine adjusts its autonomous movement. First, it surveys its surroundings through a camera. Then it predicts changes in the surroundings using algorithms within the device. If the predicted changes could lead to a problem such as a collision, an operation command is issued to the machine so as to avoid that situation as far as possible, and the machine moves according to the command. Most cameras used to survey the surroundings are fast enough to capture 60 frames per second. Operation commands are also mostly carried by electrical signals, so the machine's movement can be changed quickly when an accident is anticipated. The part that makes the machine's movement unnatural and slows it down is the computation of the algorithm that predicts changes in the surroundings. Therefore, if the algorithm's computation becomes lighter, the machine can respond to situations more quickly.

In addition, because conventional techniques analyze the motion of multiple objects based on images, they have difficulty continuing to follow an object when objects occlude one another.

An object of the present invention is to provide a box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects, which receives only the box-level position information of the multiple objects as input through a long short-term memory (LSTM) structure, widely used for future prediction of time-series data, and predicts the future movement paths of the multiple objects at the box level.

Other objects of the present invention that are not explicitly stated may additionally be considered within the scope that can be readily inferred from the following detailed description and its effects.

To achieve the above object, a box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention includes: a learning unit that trains a multi-object position prediction model composed of a long short-term memory (LSTM) network using only box-level position information for multiple objects in training data; and a prediction unit that inputs box-level position information of multiple objects for a preset number of consecutive past frames into the multi-object position prediction model and predicts the future movement paths of the multiple objects based on the box-level position information of the multiple objects for a preset number of consecutive future frames output from the multi-object position prediction model.

Here, the box-level object position information may consist of the coordinate values of the four corners of a bounding box surrounding the object.

Here, for a training video included in the training data, the learning unit may fill the positions of the multiple objects into a matrix of size (preset maximum number of objects) x 8 using the box-level position information of the multiple objects for each frame, pad the remainder of the matrix with zeros to obtain a matrix for the frame, and train the multi-object position prediction model using the obtained matrices.

Here, for each of the preset number of consecutive past frames, the prediction unit may fill the positions of the multiple objects into a matrix of size (preset maximum number of objects) x 8 using the box-level position information of the multiple objects for that past frame, and pad the remainder of the matrix with zeros to obtain a matrix for the past frame. The prediction unit may then input the plurality of matrices corresponding to the preset number of consecutive past frames into the multi-object position prediction model and obtain the box-level position information of the multiple objects for the preset number of consecutive future frames from the plurality of matrices output by the model.

According to the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention, only the box-level position information of the multiple objects is received as input through a long short-term memory (LSTM) structure and the future movement paths of the multiple objects are predicted at the box level, so that the movements of the multiple objects can be predicted quickly with lightweight computation.

In addition, because the positions of the multiple objects are predicted using only their box-level position information, the movements of the multiple objects can be predicted even when an object to be predicted is hidden by its surroundings or when objects overlap one another.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram illustrating a box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention.
FIG. 2 is a diagram illustrating box-level object positions according to a preferred embodiment of the present invention, in which FIG. 2(a) shows a conventional method of representing the position of a bounding box and FIG. 2(b) shows the method of representing the position of a bounding box according to the present invention.
FIG. 3 is a diagram illustrating an example of a multi-object position prediction model according to a preferred embodiment of the present invention.
FIG. 4 is a diagram illustrating the performance of the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are expressly and specifically defined.

In this specification, terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

In this specification, identification symbols for steps (for example, a, b, c) are used for convenience of description and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the one written. That is, the steps may occur in the written order, may be performed substantially simultaneously, or may be performed in the reverse order.

In this specification, expressions such as "have", "may have", "include", or "may include" indicate the presence of the corresponding feature (for example, a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.

In addition, the term "unit" as used herein means software or a hardware component such as a field-programmable gate array (FPGA) or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or may be configured to run on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data structures, and variables. The functions provided within the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units".

Hereinafter, a preferred embodiment of the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to the present invention will be described in detail with reference to the accompanying drawings.

First, the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention will be described with reference to FIGS. 1 to 3.

FIG. 1 is a block diagram illustrating a box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention. FIG. 2 is a diagram illustrating box-level object positions according to a preferred embodiment of the present invention, in which FIG. 2(a) shows a conventional method of representing the position of a bounding box and FIG. 2(b) shows the method of representing the position of a bounding box according to the present invention. FIG. 3 is a diagram illustrating an example of a multi-object position prediction model according to a preferred embodiment of the present invention.

Referring to FIG. 1, the box-level multi-object movement-path prediction apparatus 100 using only box-level position information of multiple objects according to a preferred embodiment of the present invention receives only the box-level position information of multiple objects as input through a long short-term memory (LSTM) structure, widely used for future prediction of time-series data, and predicts the future movement paths of the multiple objects at the box level.

Here, the box-level object position information may consist of the coordinate values of the four corners of the bounding box surrounding an object. That is, the present invention does not use "(x, y, w, h)", which consists of the center coordinates (x, y), width (w), and height (h) of the bounding box surrounding the object as shown in FIG. 2(a), as the box-level object position. Instead, as shown in FIG. 2(b), it uses "(x₁, y₁, x₂, y₂, x₃, y₃, x₄, y₄)", in which the (x, y) coordinate values of the four corners of the bounding box are listed in order. Accordingly, the representation can be used for detecting not only axis-aligned bounding boxes (AABB) but also oriented bounding boxes (OBB).
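The relationship between the two representations can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name center_to_corners, the clockwise corner order starting at the top-left, the image-style coordinate system (y increasing downward), and the angle_rad parameter are all assumptions introduced here, since the text only states that the four corners are listed "in order".

```python
import math


def center_to_corners(cx, cy, w, h, angle_rad=0.0):
    """Convert a center-form box to an 8-value corner vector.

    Returns (x1, y1, x2, y2, x3, y3, x4, y4). The corner order
    (top-left, top-right, bottom-right, bottom-left) is an assumption;
    the source does not fix a particular order.
    """
    # Corner offsets from the box center before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    for dx, dy in offsets:
        # Rotate each offset about the center, then translate.
        corners.append(cx + dx * cos_a - dy * sin_a)
        corners.append(cy + dx * sin_a + dy * cos_a)
    return tuple(corners)


# Axis-aligned box: the four corners carry the same information as (x, y, w, h).
aabb = center_to_corners(10, 20, 4, 2)

# Oriented box: (x, y, w, h) alone cannot express the rotation, but the
# 8-value corner vector can.
obb = center_to_corners(10, 20, 4, 2, angle_rad=math.pi / 2)
```

With a zero angle the result is an axis-aligned box; with a nonzero angle the same eight values describe an oriented box, which is why the corner form covers both cases.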

To this end, the box-level multi-object movement-path prediction apparatus 100 may include a learning unit 110 and a prediction unit 130.

The learning unit 110 trains a multi-object position prediction model composed of a long short-term memory (LSTM) network using only the box-level position information for multiple objects in the training data.

Here, for a training video included in the training data, the learning unit 110 may fill the positions of the multiple objects into a matrix of size (preset maximum number of objects) x 8 using the box-level position information of the multiple objects for each frame, pad the remainder of the matrix with zeros to obtain a matrix for the frame, and train the multi-object position prediction model using the obtained matrices. The preset maximum number of objects is larger than the maximum number of objects that can be detected in any single frame of the video and may be, for example, 1,000.

For example, because the number of objects per frame in a training video is not constant, a maximum number of objects per frame (for example, 1,000) is set in advance. Using the box-level position information of the multiple objects for a frame, the values for the detected objects are written to their corresponding positions in a matrix of size (preset maximum number of objects) x 8, and the remainder of the matrix, that is, a block of size (preset maximum number of objects - number of detected objects) x 8, is padded with zeros. As a result, a matrix of size (preset maximum number of objects) x 8 is fed to the multi-object position prediction model as the input vector, and the output vector is likewise a matrix of size (preset maximum number of objects) x 8.
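The padding step described above might look like the following sketch. It is written under stated assumptions rather than taken from the patent: the names frame_to_matrix, MAX_OBJECTS, and COORDS_PER_BOX are hypothetical, and detected boxes are assumed to occupy the top rows in detection order, whereas the text only says that values are written to the "corresponding positions" of the detected objects.

```python
MAX_OBJECTS = 1000   # example value given in the text
COORDS_PER_BOX = 8   # (x1, y1, x2, y2, x3, y3, x4, y4)


def frame_to_matrix(boxes, max_objects=MAX_OBJECTS):
    """Build a fixed-size (max_objects x 8) matrix for one frame.

    boxes: list of 8-value corner vectors for the objects detected in
    the frame. Rows beyond the detected objects are zero-padded.
    """
    if len(boxes) > max_objects:
        raise ValueError("more objects than the preset maximum")
    matrix = []
    for box in boxes:
        if len(box) != COORDS_PER_BOX:
            raise ValueError("each box needs exactly 8 coordinate values")
        matrix.append([float(v) for v in box])
    # Zero-pad the remaining (max_objects - len(boxes)) rows.
    for _ in range(max_objects - len(boxes)):
        matrix.append([0.0] * COORDS_PER_BOX)
    return matrix


# One detected object in a frame, with a small maximum for readability.
m = frame_to_matrix([(8, 19, 12, 19, 12, 21, 8, 21)], max_objects=3)
```

Because every frame maps to a matrix of the same size regardless of how many objects it contains, a sequence of frames can be fed to a fixed-input-size model.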

The prediction unit 130 predicts the future positions of multiple objects based on an input video, using the multi-object position prediction model trained by the learning unit 110.

That is, the prediction unit 130 may input box-level position information of multiple objects for a preset number (for example, three) of consecutive past frames into the multi-object position prediction model. Conventional multi-object position prediction methods use, as the input vector of the long short-term memory (LSTM) network, some of the frames of the video together with the box information for each frame (or patches cropped from the original image using the box information). In contrast, the present invention uses box-level position information for multiple objects as the input vector instead of images.

The prediction unit 130 may then predict the future movement paths of the multiple objects based on the box-level position information of the multiple objects for a preset number (for example, three) of consecutive future frames output from the multi-object position prediction model.

Here, for each of the preset number of consecutive past frames, the prediction unit 130 may fill the positions of the multiple objects into a matrix of size (preset maximum number of objects) x 8 using the box-level position information of the multiple objects for that past frame, and pad the remainder of the matrix with zeros to obtain a matrix for the past frame. The prediction unit 130 may then input the plurality of matrices corresponding to the preset number of consecutive past frames into the multi-object position prediction model and obtain the box-level position information of the multiple objects for the preset number of consecutive future frames from the plurality of matrices output by the model. The preset maximum number of objects is larger than the maximum number of objects that can be detected in any single frame of the video and may be, for example, 1,000.

For example, as shown in FIG. 3, the multi-object position prediction model according to the present invention uses box-level position information for multiple objects, instead of images, as its input and output vectors. The model receives the box-level position information of multiple objects for three consecutive past frames and predicts the box-level position information of the multiple objects for the three consecutive frames that follow. Because the number of objects per frame in a video is not constant, a maximum number of objects per frame (for example, 1,000) is set in advance. Using the box-level position information of the multiple objects for a frame, the values for the detected objects are written to their corresponding positions in a matrix of size (preset maximum number of objects) x 8, and the remainder of the matrix, that is, a block of size (preset maximum number of objects - number of detected objects) x 8, is padded with zeros. As a result, a matrix of size (preset maximum number of objects) x 8 is fed to the multi-object position prediction model as the input vector, and the output vector is likewise a matrix of size (preset maximum number of objects) x 8.
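One plausible way to prepare data for such a three-frames-in, three-frames-out model is a sliding window over the per-frame matrices. The sketch below is an assumption for illustration only: the text does not describe how training pairs are formed, and the name make_windows and its parameters are hypothetical.

```python
def make_windows(frame_matrices, n_past=3, n_future=3):
    """Pair each run of n_past frames with the n_future frames after it.

    frame_matrices: per-frame items in temporal order (for example, the
    zero-padded max_objects x 8 matrices). Returns a list of
    (past_frames, future_frames) tuples; empty if the sequence is too short.
    """
    span = n_past + n_future
    pairs = []
    for start in range(len(frame_matrices) - span + 1):
        past = frame_matrices[start:start + n_past]
        future = frame_matrices[start + n_past:start + span]
        pairs.append((past, future))
    return pairs


# Frame indices stand in for the per-frame matrices to keep the example short.
pairs = make_windows(list(range(8)))
```

Each pair supplies the model input (past frames) and the target it would be trained to reproduce (future frames); at prediction time only the most recent n_past frames are needed.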

Next, the performance of the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention will be described with reference to FIG. 4.

FIG. 4 is a diagram illustrating the performance of the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to a preferred embodiment of the present invention.

To test the performance of the box-level multi-object movement-path prediction apparatus using only box-level position information of multiple objects according to the present invention, experiments were conducted using the MOT16 data set, which contains box information on the positions of multiple objects (see Milan, A., et al., "MOT16: A benchmark for multi-object tracking," arXiv preprint arXiv:1603.00831, 2016).

The three upper panels in FIG. 4 are images showing all of the box information of the multiple objects for the frames input to the multi-object position prediction model, and the three lower panels in FIG. 4 are images showing all of the per-frame box information of the multiple objects output from the model. Both the upper and lower panels are arranged in chronological order from left to right.

The ground-truth boxes are not shown in FIG. 4 because they overlap heavily with the predicted box positions, but it was confirmed that the box positions predicted according to the present invention match the ground-truth positions well. The data shown is from one of the more visually clear sequences among the data sets. For other videos, there were too many objects for the boxes to be shown in a drawing, but even in those cases the box positions showed reasonable accuracy, and the accuracy remained consistent even when objects were occluded by other objects.

In addition, [Table 1] below shows the time taken to predict the motion of multiple objects for the following three frames, measured while varying the number of frames used as input.

[Table 1]
Method (number of input frames)    Time required (ms)
2                                  0.037
3                                  0.040
4                                  0.054
5                                  0.076

In general, 30 fps is regarded as real-time operation. That is, for real-time prediction, the computation for three frames must be completed within 33 ms at most. The values in [Table 1] are roughly 1/1000 of this value. Therefore, the present invention can keep up even if camera input arrives at a higher rate. Although the present invention may be somewhat less accurate than conventional methods that use images directly, it can respond more immediately and can therefore play a larger role in practical applications of prediction technology.
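The arithmetic behind this comparison can be checked with a short sketch. The measured times are the values from [Table 1]; the names budget_fraction and measured_ms are introduced here for illustration only.

```python
FPS = 30  # frame rate the text treats as real time

# Prediction times from [Table 1], keyed by number of input frames (ms).
measured_ms = {2: 0.037, 3: 0.040, 4: 0.054, 5: 0.076}


def budget_fraction(time_ms, fps=FPS):
    """Fraction of one frame period (1000 / fps ms) used by a prediction."""
    frame_period_ms = 1000.0 / fps  # about 33.3 ms at 30 fps
    return time_ms / frame_period_ms


fractions = {n: budget_fraction(t) for n, t in measured_ms.items()}
```

Under these numbers the measured times use between about 1/900 and 1/440 of the 33 ms budget, which is the same order of magnitude as the "roughly 1/1000" stated in the text.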

Although all the components constituting the embodiments of the present invention described above are described as being combined into one or operating in combination, the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined to operate. In addition, although each of the components may be implemented as a single independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Such a computer program may be stored in computer-readable media such as a USB memory, a CD, or a flash memory and read and executed by a computer, thereby implementing the embodiments of the present invention. The recording media for the computer program may include magnetic recording media, optical recording media, and the like.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments or the accompanying drawings. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

100: box-level multi-object movement line prediction apparatus
110: learning unit
130: prediction unit

Claims

A box-level multi-object movement line prediction apparatus using only box-level position information of multiple objects, the apparatus comprising:
a learning unit configured to train a multi-object position prediction model, formed of a long short-term memory (LSTM) network, using only box-level position information of multiple objects in training data; and
a prediction unit configured to input box-level position information of multiple objects for a preset number of consecutive past frames into the multi-object position prediction model, and to predict future movement lines of the multiple objects based on box-level position information of the multiple objects for a preset number of consecutive future frames output from the multi-object position prediction model.

The apparatus of claim 1,
wherein the box-level position information of an object consists of the coordinate values of the four corners of a bounding box surrounding the object.

The apparatus of claim 2,
wherein the learning unit is configured to obtain, for a frame of a training video included in the training data, a matrix for the frame by filling the positions of the multiple objects into a matrix of size (preset maximum number of objects) x 8 using the box-level position information of the multiple objects for the frame and padding the remainder of the matrix with 0, and to train the multi-object position prediction model using the obtained matrix.

The apparatus of claim 2,
wherein the prediction unit is configured to: obtain, for each of the preset number of consecutive past frames, a matrix for the past frame by filling the positions of the multiple objects into a matrix of size (preset maximum number of objects) x 8 using the box-level position information of the multiple objects for the past frame and padding the remainder of the matrix with 0; input the plurality of matrices corresponding to the respective past frames into the multi-object position prediction model; and obtain the box-level position information of the multiple objects for the preset number of consecutive future frames using a plurality of matrices output from the multi-object position prediction model.
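The zero-padded matrix layout recited in the claims can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `frame_to_matrix` and `frames_to_sequence` are hypothetical, the corner ordering within the 8 values is assumed, and placing objects in rows in input order and rejecting frames with too many objects are assumptions that the claims do not specify. The LSTM model itself is not shown.

```python
from typing import List, Sequence

COORDS_PER_BOX = 8  # four corners, each with an (x, y) coordinate

Box = Sequence[float]
Matrix = List[List[float]]


def frame_to_matrix(boxes: Sequence[Box], max_objects: int) -> Matrix:
    """Build a (max_objects x 8) matrix for one frame, padding unused rows with 0."""
    if len(boxes) > max_objects:
        # Assumption: frames with more objects than the preset maximum are rejected.
        raise ValueError("frame has more objects than max_objects")
    matrix: Matrix = []
    for box in boxes:
        if len(box) != COORDS_PER_BOX:
            raise ValueError("each box needs 8 values (four corners x 2 coordinates)")
        matrix.append([float(v) for v in box])
    # Pad the remaining rows with zeros so every frame has the same shape.
    while len(matrix) < max_objects:
        matrix.append([0.0] * COORDS_PER_BOX)
    return matrix


def frames_to_sequence(frames: Sequence[Sequence[Box]], max_objects: int) -> List[Matrix]:
    """Convert consecutive frames into a list of equally sized matrices."""
    return [frame_to_matrix(boxes, max_objects) for boxes in frames]


# Two past frames: one object in the first frame, two objects in the second.
past_frames = [
    [[10, 20, 50, 20, 50, 80, 10, 80]],
    [[12, 20, 52, 20, 52, 80, 12, 80], [100, 40, 130, 40, 130, 90, 100, 90]],
]
sequence = frames_to_sequence(past_frames, max_objects=3)
print(len(sequence), len(sequence[0]), len(sequence[0][0]))  # 2 3 8
```

Because every frame yields a matrix of the same shape, a fixed-size sequence of such matrices can be passed to a recurrent model regardless of how many objects appear in each frame.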