KR20210065520A

KR20210065520A - System for Object Tracking Using Multiple Cameras in Edge Computing

Info

Publication number: KR20210065520A
Application number: KR1020190154361A
Authority: KR
Inventors: 이동만; 신병헌; 장시영
Original assignee: 한국과학기술원
Priority date: 2019-11-27
Filing date: 2019-11-27
Publication date: 2021-06-04

Abstract

An objective of the present invention is to efficiently support an object tracking application using multiple cameras in an edge computing environment. According to one embodiment of the present invention, an object tracking system using multiple cameras in an edge computing environment comprises: multiple IoT cameras, each including a hypervisor that performs the camera function and frame preprocessing and one or more virtual containers (VCs); and an edge server including a processing device for video processing, multiple camera queues in which frames received from the multiple IoT cameras wait, and one or more virtual container masters (VCMs), the edge server performing a tracking job for each tracking application through the virtual container masters. The virtual containers use a target template provided by the virtual container masters to deliver to the edge server only those frames, among the frames supplied by the hypervisor, that include the tracking target. They also monitor the movement of the target and perform a handover operation that hands the tracking job over to an adjacent IoT camera.

Description

{System for Object Tracking Using Multiple Cameras in Edge Computing}

This application relates to an object tracking system using a plurality of cameras in an edge computing environment.

Edge computing is a technology that uses computing resources as close to the client as possible instead of relying entirely on the cloud. At the edge, it provides a variety of real-time video analytics services, such as object detection, identification, and tracking, based on data from network cameras. Among the various types of video analytics applications, object tracking applications require real-time analysis of live video streams. To this end, the edge server must track the target object in real time in the video frames supplied by the edge IoT cameras.

Conventional approaches to this problem can be broadly classified into two types: server-driven and camera-driven.

In the server-driven approach, the edge server receives video streams from all relevant cameras simultaneously and processes the video frames to suit the needs of the application. However, as the number of cameras involved increases, the server is fed a large number of useless video frames, i.e., frames that do not contain the target object, which creates a bottleneck when the edge server must handle and process raw video frames in time. More powerful GPU hardware can mitigate this to some extent, but the problem continues to limit the scalability of the edge server and can ultimately lower the quality of service (QoS).

In the camera-driven approach, on the other hand, each camera analyzes its video stream and discards useless frames, so that the edge server has to process far fewer frames. However, conventional camera-driven approaches do not consider tracking across a plurality of cameras. Supporting such tracking would require all cameras potentially relevant to the tracking to run at the same time, which consumes a large amount of energy. This can be critical to performance when the cameras run on battery power.

Accordingly, there is a need in the art for a way to efficiently support an object tracking application using a plurality of cameras in an edge computing environment.

To solve the above problem, an embodiment of the present invention provides an object tracking system using a plurality of cameras in an edge computing environment.

The object tracking system using a plurality of cameras in the edge computing environment includes: a plurality of IoT cameras, each including a hypervisor in charge of camera functions and frame preprocessing and one or more virtual containers (VCs); and an edge server including a processing device for video processing, a plurality of IoT camera queues in which frames received from each of the IoT cameras wait, and one or more virtual container masters (VCMs), the edge server performing a tracking job for each tracking application through the virtual container master. The virtual container uses a target template provided by the virtual container master to deliver to the edge server only those frames, among the frames supplied by the hypervisor, that include the tracking target. It also monitors the movement of the target and performs a handover operation that passes the tracking job to an adjacent IoT camera.

The means for solving the problem described above do not enumerate all the features of the present invention. The various features of the present invention, together with their advantages and effects, can be understood in more detail with reference to the specific embodiments below.

According to an embodiment of the present invention, an object tracking application using a plurality of cameras can be supported efficiently in an edge computing environment.

FIG. 1 is a diagram illustrating the overall structure of an object tracking system using a plurality of cameras in an edge computing environment according to an embodiment of the present invention.
FIG. 2 is a graph showing the relationship between the speed of a target and the time at which it reappears in the field of view of an adjacent camera.
FIG. 3 is a diagram illustrating a handover procedure between cameras according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the trajectory of a target observed through a plurality of cameras.
FIG. 5 is a diagram illustrating the average latency according to the number of applications running on the edge server.
FIG. 6 is a diagram illustrating the application-level goodput according to the number of applications running on the edge server.
FIG. 7 is a diagram comparing useful and useless frames according to the number of running applications.
FIG. 8 is a diagram illustrating the application-level goodput and the additional bandwidth usage according to the average moving speed of the target.
FIG. 9 is a diagram comparing the latency at the cameras.
FIG. 10 is a diagram comparing the energy consumption of all IoT cameras.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily practice the present invention. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the present invention. The same reference numerals are used throughout the drawings for parts having similar functions and operations.

Throughout the specification, when a part is said to be 'connected' to another part, this includes not only the case where the parts are 'directly connected' but also the case where they are 'indirectly connected' with another element interposed between them. In addition, 'including' a component means that other components may be further included, rather than excluded, unless stated otherwise.

Before describing the embodiments of the present invention, the considerations for efficient object tracking using a plurality of cameras are described first.

In an object tracking system using a plurality of cameras, the cameras are generally arranged according to the physical structure of the surveillance area. That is, depending on its speed and direction of movement, an object seen by one camera reappears in the field of view of another camera that is physically related to the previous one. It is therefore more efficient to exploit the spatial and temporal relationships between the installed cameras, so that only the relevant cameras are activated instead of all of them.

First, for a fixed camera, the direction in which a target object is heading within the camera's field of view can be analyzed in advance. By analyzing the movement of each target across the video cluster, objects that moved from one camera to an adjacent camera can be identified. Because surveillance cameras are usually installed at key points, such as exits and the only passages leading to other points, the next point at which a target will appear can be estimated from its direction of travel at a given point. This means that not every camera has to keep monitoring and transmitting video frames. Given the physical constraints imposed by the paths, each camera can prioritize a subset of all the cameras in the building. Considering a spatially correlated set of cameras when predicting where the target will appear thus reduces the camera search space.

Once the camera at which the target will be detected after leaving the previous camera's field of view has been estimated from the spatial layout as described above, the time at which the target will appear at that camera can be predicted next. This depends on the target's moving speed. Comparing the target's previous and current positions over a given period shows how far the target has moved in the camera's view during that period. Assuming that the object keeps moving at the same speed after it leaves the current camera's field of view, the time at which it will appear in the next camera's field of view can be predicted within a margin of error.

In other words, spatial locality and temporal locality are the key clues for supporting object tracking applications. To implement an efficient tracking application with a plurality of cameras, an efficient technique is needed for estimating the direction and speed of the target object at the camera currently hosting it and for handing control over to the next camera.

Meanwhile, each tracking application has its own set of parameters, including the target template, the resolution of the video frames, and the frame rate, so each tracking job runs in an independent environment created in a virtual container. In this case, the performance of a camera-to-camera handover depends on moving the corresponding container from one camera to another.

To guarantee the quality requirements of the tracking application (i.e., application-level goodput), the container must be light enough that moving it between cameras does not affect the real-time tracking process. This requires coordinating the handover between cameras and booting the container at the destination camera in time.

FIG. 1 is a diagram illustrating the overall structure of an object tracking system using a plurality of cameras in an edge computing environment according to an embodiment of the present invention.

Referring to FIG. 1, an object tracking system using a plurality of cameras in an edge computing environment according to an embodiment of the present invention may include an edge server 110 and a plurality of fixedly installed IoT cameras 120.

The edge server 110 may include a processing device for video processing (i.e., GPU units), a plurality of IoT camera queues, and a virtual container master (VCM), and may host a plurality of video analytics applications (for example, object tracking applications). In this case, the edge server 110 may perform the job for each application through the virtual container master.

As described above, spatial locality is important for finding and tracking a target in an object tracking application, so the frames received from each IoT camera may be held in a separate queue.

The virtual container master may poll the IoT camera queues for frames, then validate and post-process the frames for use by the application. In addition, when the virtual container master receives a target template containing the tracking target from a user, it selects an arbitrary IoT camera and launches a virtual container with the target template so that the camera searches for the target.
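The per-camera queues and the polling by the virtual container master can be illustrated with a minimal sketch. The class name `CameraQueues`, its methods, and the one-frame-per-queue polling order are assumptions made for illustration; the source does not specify how the queues are implemented or scheduled.

```python
from collections import deque


class CameraQueues:
    """Hypothetical edge-server structure: one FIFO queue per IoT camera."""

    def __init__(self, camera_ids):
        # A separate queue per camera keeps frames grouped by their origin.
        self.queues = {cid: deque() for cid in camera_ids}

    def push(self, camera_id, frame):
        """Enqueue a frame received from the given camera."""
        self.queues[camera_id].append(frame)

    def poll(self):
        """Take at most one waiting frame from each camera queue.

        Returns a list of (camera_id, frame) pairs in camera order.
        The polling policy is an assumption, not taken from the source.
        """
        polled = []
        for cid, queue in self.queues.items():
            if queue:
                polled.append((cid, queue.popleft()))
        return polled
```

A virtual container master would call `poll()` repeatedly and pass each returned frame to its own validation and post-processing steps.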

Meanwhile, each of the plurality of IoT cameras 120 may include a hypervisor and one or more virtual containers (VCs).

Here, the hypervisor may be in charge of the general functions of the physical camera and of frame preprocessing (for example, changing the size or resolution).

A virtual container may manage the requirements specific to each tracking application and provide the appropriate video frames to the edge server 110 accordingly. Each virtual container may analyze the frames supplied by the hypervisor, re-identify the target in them against the target template, and provide only the useful frames to the edge server 110. In addition, the virtual container may monitor the movement of the target and perform a handover operation that passes the tracking job to an adjacent IoT camera.

The plurality of IoT cameras 120 are connected to the edge server 110 and to the other cameras through a network, and may be equipped with a computing engine, such as a Raspberry Pi, to support a certain level of video processing.

In addition, each IoT camera 120 may accommodate a plurality of tracking jobs, where each job may be dedicated to one tracking application.

More specifically, to allocate a plurality of tracking jobs to an IoT camera 120, the physical camera may be virtualized using Docker containers, which are designed to be lightweight compared with conventional virtual machines.

Because tracking applications need similar video analytics functions, including object detection, re-identification, and tracking, a software library such as OpenCV may be used.

Once a Docker container for tracking has been created, later containers may be configured to reuse the libraries that were downloaded previously. This speeds up the creation of subsequent containers, while the executable portion of each common library may be replicated in the memory space of each container for isolation. In addition, the duplicated and unchanged portions of each tracking container need to be reduced while isolation is maintained, using the lightweight containerization technique described below.

To maintain the software libraries commonly used for video analytics, the hypervisor may include a Common Library Execution Manager. The Common Library Execution Manager may have a shared storage called Common Library Storage (CLS).

A virtual container created for any tracking application may mount the CLS location so that it can be accessed through a command-line interface (CLI) and may import packages from a programming language. When a command is to be executed, the Virtual Camera Driver of the virtual container sends the command to the Common Library Execution Manager instead of executing it directly.

Likewise, when a plurality of virtual containers request the same library for the same video stream, the Common Library Execution Manager may execute the command only once and copy the result to each virtual container.
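The execute-once, copy-to-each behavior can be sketched as a small cache. This is an illustrative sketch only: the cache key `(stream_id, frame_id, command)`, the method name `execute`, and the use of a deep copy to keep containers isolated are assumptions, not details given in the source.

```python
import copy


class CommonLibraryExecutionManager:
    """Hypothetical sketch: run a shared library command once per frame."""

    def __init__(self):
        self._results = {}    # (stream_id, frame_id, command) -> result
        self.executions = 0   # number of real executions, for inspection

    def execute(self, stream_id, frame_id, command, func, frame):
        """Return the result of `func(frame)` for the requesting container.

        The first request for a given key runs the command; later requests
        for the same stream, frame, and command reuse the stored result.
        """
        key = (stream_id, frame_id, command)
        if key not in self._results:
            self._results[key] = func(frame)
            self.executions += 1
        # Each container receives its own copy so that one container
        # cannot modify the data seen by another.
        return copy.deepcopy(self._results[key])
```

In a real deployment the cache would also need eviction once a frame is no longer in use; that is omitted here for brevity.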

During startup, an IoT camera 120 may join the edge network by sending a join message to the edge server 110. The edge server 110 may then respond to the IoT camera 120 with a list of the common libraries recently used by the other, already connected IoT cameras 120. If no IoT camera 120 has been connected previously, the edge server 110 may provide the IoT camera 120 with a default set of video analytics libraries.

Meanwhile, when a new virtual container is created in an IoT camera 120 at the request of the edge server 110, the virtual container detects an object matching the provided target template and starts tracking the detected object.

First, the spatiotemporal relationship between the IoT cameras 120 may be analyzed to determine when and in which direction a target moves. To this end, the IoT camera 120 may record, for each target, trajectory data that includes the timestamp at which the target disappears from the camera's field of view, the next timestamp at which the target reappears at an adjacent camera, the target's average moving speed until it disappears, and the target's direction of movement.

Accordingly, the IoT camera 120 may estimate from the recorded trajectory data at which adjacent camera the target will reappear. The target's direction of movement may be expressed as one of the following values: north (N), northeast (NE), east (E), southeast (SE), south (S), southwest (SW), west (W), and northwest (NW).
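The trajectory data described above could be held in a record such as the following sketch. The class name `TrajectoryRecord`, the field names, and the `transit_time` helper are hypothetical; only the four recorded quantities and the eight direction values come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# The eight direction values listed in the description.
DIRECTIONS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


@dataclass
class TrajectoryRecord:
    """Hypothetical per-target trajectory record kept by an IoT camera."""

    target_id: str
    disappeared_at: float           # timestamp when the target left this view
    reappeared_at: Optional[float]  # timestamp when it showed up next door
    avg_speed: float                # average speed until it disappeared
    direction: str                  # one of DIRECTIONS

    def __post_init__(self):
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction}")

    def transit_time(self) -> Optional[float]:
        """Time between disappearing here and reappearing at the neighbor."""
        if self.reappeared_at is None:
            return None
        return self.reappeared_at - self.disappeared_at
```

Pairs of `avg_speed` and `transit_time()` from many such records are the kind of data a reappearance-time model can be fitted to.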

Equation 1, which estimates the time at which a target reappears at an adjacent camera after it disappears from one camera's field of view, may be derived by applying a linear regression model. Here, s_xy is the speed of the target, i.e., the distance moved per unit time, and α, β, and ε are preset variables. However, without prior knowledge of the target's coordinates and direction of movement in each camera's field of view, the camera to which the target will move cannot be determined with certainty. Each IoT camera 120 therefore records the outbound area of each target's movement. That is, each IoT camera 120 can predict the target's reappearance at an adjacent camera on the basis of these two pieces of information.

[Equation 1]

exp = α + β·s_xy + ε

The model obtained in this way is stored in the camera map of each IoT camera 120 and can be used as the basis for spatiotemporal object tracking and handover decisions. FIG. 2 is a graph showing the relationship between the speed of a target and the time at which it reappears in the field of view of an adjacent camera.
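As an illustration of how the coefficients of Equation 1 might be obtained, the sketch below fits α and β by ordinary least squares to recorded pairs of speed and reappearance time. The function names and the sample data are hypothetical, and the error term ε is treated as the regression residual and left out of the prediction; the source does not state which fitting procedure is used.

```python
def fit_reappearance_model(speeds, times):
    """Least-squares fit of times ~ alpha + beta * speeds.

    `speeds` are s_xy values and `times` are observed reappearance
    times at the adjacent camera. Returns (alpha, beta).
    """
    n = len(speeds)
    mean_s = sum(speeds) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in speeds)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(speeds, times))
    beta = sxy / sxx
    alpha = mean_t - beta * mean_s
    return alpha, beta


def predict_exp(alpha, beta, speed):
    """Expected reappearance time for a target moving at `speed`."""
    return alpha + beta * speed


# Made-up observations: faster targets reappear sooner.
alpha, beta = fit_reappearance_model([1.0, 2.0, 3.0, 4.0],
                                     [10.0, 8.0, 6.0, 4.0])
print(alpha, beta, predict_exp(alpha, beta, 2.5))
```

One such `(alpha, beta)` pair per adjacent camera would be a natural entry for the camera map.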

Algorithm 1 below shows the tracking job in a virtual container, which includes identifying the target and estimating its moving speed and direction.

First, the IoT camera may detect objects in the current frame by running, for example, neural-network-based object detection to extract feature objects. Among all the feature classes, the IoT camera may discard the irrelevant ones and keep only the class that matches the class of the tracking target. The Detection & Re-identification process of the virtual container may then compare every potential candidate of the detected object class with the target template. At time t, the coordinates k of each candidate at camera i and the target template may be converted into vectors so that a similarity measure, such as the Euclidean distance, can be computed. The process may then output the list of candidates ordered by similarity to the target template. This process is expressed in lines 1 to 11 of Algorithm 1.
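The ranking step can be illustrated as follows. This sketch assumes that the template and each candidate have already been converted into fixed-length feature vectors by some embedding step, which is not shown and is not specified in the source; `rank_candidates` is a hypothetical name, and this is not the patent's Algorithm 1.

```python
import math


def rank_candidates(template_vec, candidates):
    """Order candidates by Euclidean distance to the target template.

    `candidates` is a list of (coords, feature_vec) pairs, where `coords`
    is the candidate's position in the frame. Returns a list of
    (distance, coords) pairs, most similar (smallest distance) first.
    """
    scored = [(math.dist(template_vec, vec), coords)
              for coords, vec in candidates]
    scored.sort(key=lambda item: item[0])
    return scored


ranked = rank_candidates(
    (0.0, 0.0),
    [((10, 10), (3.0, 4.0)), ((5, 5), (1.0, 0.0))],
)
print(ranked)
```

A practical implementation would usually also apply a distance threshold so that a frame with no sufficiently similar candidate is treated as not containing the target.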

If the target is present in the frame, its coordinates may be passed to the Tracker of the virtual container for further tracking. The Tracker may implement a correlation-based tracking algorithm that follows the coordinates from one frame to the next. To do so, the Tracker may first initialize the coordinate position of the newly found target and then check whether the target is still present in the next frame. If the coordinates of a previously registered target no longer correlate with a predefined number of subsequent frames over a predetermined time, the target may be considered missing. Operating in this way, the Tracker can also identify objects that newly enter the field of view of the underlying camera. This process is expressed in lines 12 to 33 of Algorithm 1. Performing re-identification only on a predefined number of frames in this way also shortens the processing latency.
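The rule for declaring a target missing can be sketched as a counter of consecutive frames in which the tracker finds no correlation. The class name `MissingDetector` and the choice to count frames only (rather than also measuring elapsed time) are simplifying assumptions for illustration.

```python
class MissingDetector:
    """Hypothetical helper: flag a target as missing after N bad frames."""

    def __init__(self, max_missed_frames):
        self.max_missed_frames = max_missed_frames
        self.missed = 0

    def update(self, correlated):
        """Record one frame and report whether the target is now missing.

        `correlated` is True when the tracker matched the registered
        target coordinates in this frame.
        """
        if correlated:
            self.missed = 0       # any match resets the count
        else:
            self.missed += 1
        return self.missed >= self.max_missed_frames
```

With `max_missed_frames=3`, a single matched frame between misses resets the count, so brief occlusions do not trigger a handover.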

After the target has been tracked in this way, the following three quantities must be monitored in real time to decide on an appropriate handover to an adjacent camera.

First, the time at which the target will reappear at an adjacent camera is estimated. As expressed in lines 23 and 24 of Algorithm 1, the target's moving speed can be estimated from its recent coordinate positions and the elapsed time. On this basis, the target's expected arrival time (exp) at the adjacent camera can be estimated according to Equation 1 above.

Next, predicting when the target will leave the current camera's field of view is important for stopping the currently running job. Similar to known position prediction methods such as dead reckoning, the Tracker of the virtual container may predict the target's future position by multiplying the number of frames (f) by the current moving speed, as expressed in line 25 of Algorithm 1. Here, f must be chosen appropriately in consideration of the target's moving speed.
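The speed estimate and the dead-reckoning prediction can be sketched together. The sketch assumes pixel coordinates, a per-frame velocity, and a rectangular field of view of `width` by `height` pixels; the function names are hypothetical and this is not the patent's Algorithm 1.

```python
def velocity_per_frame(prev, cur, frames_elapsed):
    """Displacement per frame between two (x, y) positions."""
    return ((cur[0] - prev[0]) / frames_elapsed,
            (cur[1] - prev[1]) / frames_elapsed)


def predict_position(cur, velocity, f):
    """Dead reckoning: current position plus f frames of current velocity."""
    return (cur[0] + velocity[0] * f, cur[1] + velocity[1] * f)


def leaves_view(position, width, height):
    """True when the position falls outside the camera frame."""
    x, y = position
    return not (0 <= x < width and 0 <= y < height)


v = velocity_per_frame((100, 100), (110, 100), 2)   # 5 px per frame in x
future = predict_position((110, 100), v, 10)        # look 10 frames ahead
print(v, future, leaves_view(future, 150, 200))
```

The look-ahead `f` trades off early warning against false alarms: a larger `f` gives the neighboring camera more time to prepare but extrapolates the current velocity further.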

Finally, the direction in which the target is heading must be estimated to determine which camera should be prepared. As expressed in line 26 of Algorithm 1, the target's direction of movement can be estimated by setting north to 0 degrees and computing the angle between the current and previous coordinates.
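A possible way to compute this heading and map it to the eight direction values is sketched below. It assumes image coordinates in which y grows downward and "north" corresponds to the top of the frame; that mapping, the 45-degree sectors, and the function names are assumptions for illustration.

```python
import math

DIRECTIONS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def heading_degrees(prev, cur):
    """Angle of travel in degrees, with north = 0 and clockwise positive.

    Assumes image coordinates: x grows to the right (east) and
    y grows downward, so moving up in the frame is moving north.
    """
    dx = cur[0] - prev[0]
    dy = cur[1] - prev[1]
    return math.degrees(math.atan2(dx, -dy)) % 360


def compass_direction(angle):
    """Quantize an angle into one of eight 45-degree sectors."""
    index = int((angle + 22.5) // 45) % 8
    return DIRECTIONS[index]


angle = heading_degrees((0, 0), (1, 0))   # moving right in the frame
print(angle, compass_direction(angle))
```

The resulting label can then be looked up in the camera map to choose which adjacent camera to prepare.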

FIG. 3 is a diagram illustrating a handover procedure between cameras according to an embodiment of the present invention, for example, a handover from camera i to camera j.

In addition, Algorithm 2 below shows how an IoT camera handles a handover request.

Referring to FIG. 3, when the Tracker of the virtual container determines that the target is leaving the field of view, it may ask the Handover Manager of the hypervisor to send a handover request message to the adjacent IoT camera.

(1) At time t_trigger, the virtual container of camera i may detect that the target is disappearing from the frame (line 27 of Algorithm 1).

(2) The virtual container of camera i may ask the Handover Manager of the hypervisor to send camera j a handover message containing a) the message type, b) the target's expected arrival time (exp), and c) the template of the target captured by camera i (line 30 of Algorithm 1).
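The three fields of the handover message could be packaged as in the sketch below. The JSON encoding, the base64 encoding of the template image, the field names, and the `"HANDOVER_REQUEST"` type string are all assumptions for illustration; the source lists only the message contents, not a wire format.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class HandoverMessage:
    """Hypothetical handover message sent from camera i to camera j."""

    msg_type: str     # a) message type
    exp: float        # b) expected arrival time of the target
    template: bytes   # c) target template captured by camera i

    def to_json(self) -> str:
        """Serialize the message; the image bytes are base64-encoded."""
        return json.dumps({
            "type": self.msg_type,
            "exp": self.exp,
            "template": base64.b64encode(self.template).decode("ascii"),
        })

    @classmethod
    def from_json(cls, text: str) -> "HandoverMessage":
        """Rebuild a message from its JSON form."""
        data = json.loads(text)
        return cls(
            msg_type=data["type"],
            exp=data["exp"],
            template=base64.b64decode(data["template"]),
        )
```

Sending the freshly captured template rather than the user's original one lets camera j match the target as it currently appears.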

When camera j receives the handover message, it may carry out the rest of the handover procedure as follows.

(1) At time t_{trigger+transmission}, camera j may process the message. If t_exp is shorter than t_cur, a new VC(app1, task2) must be loaded to track the target (lines 3 to 4 of Algorithm 2).

(2) exp cannot always predict the exact arrival time of the target. When the target leaves the field of view of camera i, the exp value is updated at camera j. Camera j may adjust t_exp by averaging the previous t_exp values in order to schedule the container (lines 6 to 8 of Algorithm 2).

(3) At time t_adj, camera j may launch a virtual container of the corresponding type and parameters to support the application. Once the virtual container is running on camera j, an Ack message may be returned instructing camera i to unload its virtual container.

(4) However, if the target does not reach camera j within the given offset time, camera j considers the target to have disappeared from view and sends a disappearance message to camera i and to the virtual container master on the edge server. The application user can then decide whether to restart the application.

(5) Following the above steps, camera j starts tracking, and the handover from camera i to camera j is complete.
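The receiving side of the handover can be sketched roughly as follows. Algorithm 2 itself is not reproduced in this text, so the sketch is only one possible reading of steps (1) to (4): the class names, field names, the use of a plain arithmetic mean for the t_exp adjustment, and the timeout check are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class HandoverMessage:          # hypothetical message layout
    msg_type: str               # a) message type
    t_exp: float                # b) expected arrival time of the target
    template: bytes             # c) target template captured by camera i


@dataclass
class ReceivingCamera:          # hypothetical stand-in for camera j
    exp_history: list = field(default_factory=list)
    container_loaded: bool = False

    def on_handover(self, msg: HandoverMessage, t_cur: float) -> float:
        """Record the new estimate, average it with earlier ones, load the VC if due."""
        self.exp_history.append(msg.t_exp)
        t_adj = mean(self.exp_history)   # assumed reading of "averaging the previous t_exp values"
        if t_adj < t_cur:                # expected arrival already passed: load the container now
            self.container_loaded = True
        return t_adj

    def target_lost(self, t_adj: float, t_now: float, offset: float, arrived: bool) -> bool:
        """True when the target has not shown up within the offset after t_adj."""
        return (not arrived) and t_now > t_adj + offset
```

In this reading, a True result from target_lost would correspond to sending the disappearance message of step (4).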

To evaluate the performance of the embodiment described above, the following experiments were conducted.

Three IoT cameras were installed in a building corridor, and a 60-second video was recorded at a resolution of 480x640 pixels while the target moved past these cameras. FIG. 4 shows the trajectory of the target as observed through the cameras, specifically as it moves past cameras 1, 2, and 3. In addition, a second video was recorded using two cameras (cameras 1 and 3) to evaluate the system in terms of temporal locality. The target being tracked is a specific person.

The edge server is assumed to have a rather limited amount of computing resources compared with a remote data center or the cloud. In this experiment, the edge server is equipped with two Intel® Xeon E5-2630v4 CPUs (20 cores in total, 2.2 GHz each), an Nvidia GeForce 1080Ti GPU, and 128 GB of RAM. Recently developed smart cameras such as the Amazon DeepLens or Nvidia Jetson TX are equipped with two cores, 8 GB of memory, and a built-in accelerator. To match such an IoT camera configuration, a barebones machine with an Intel(R) Pentium(R) 2020M CPU (2.4 GHz clock speed), 4 GB of memory, and an Intel Movidius Neural Compute Stick was used, with the stick accelerating the detection stage.

For synchronization, all IoT cameras are initialized at the start of every experiment and their clocks are calibrated against the edge server. The sample tracking application and its tasks, including the video processing operations, are implemented in Python using OpenCV. For the object detection model, an SSD (Single Shot MultiBox Detector) with MobileNet is used. The MOSSE tracker is a strong choice because it is robust to changes in frame size, lighting, and scale, and because it processes frames at a high rate even on devices with low computing capability (about 80 frames per second on the IoT camera).

The following evaluation metrics are used to verify performance.

1) Application level goodput (%)

For a tracking application to deliver correct tracking results to the user on time, the tracking operations in the edge cloud must be processed within a specified time after the event occurs. Application level goodput is therefore defined as the number of frames processed within a given time after the actual event occurs. In this experiment, the expiration time is set to 0.5 seconds and 1 second, respectively. We measure how long each frame stays in the queue of the edge server. A frame is not removed from the queue even if it carries a stale timestamp.
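As a minimal sketch of this metric, the function below reports the share of frames whose processing finished within the expiration time after the event. Expressing the count as a percentage of all frames is an assumption that matches the (%) unit in the metric heading; the function name and argument layout are hypothetical.

```python
def goodput_percent(event_times, processed_times, deadline_s):
    """Percentage of frames processed within deadline_s seconds of their event time.

    event_times[k] is when frame k was captured and processed_times[k] is when
    the application received its result (both in seconds). Illustrative only.
    """
    if not event_times:
        return 0.0
    on_time = sum(
        1 for event, done in zip(event_times, processed_times)
        if done - event <= deadline_s
    )
    return 100.0 * on_time / len(event_times)
```

Calling the function once with deadline_s=0.5 and once with deadline_s=1.0 gives the two values referred to as t0.5 and t1.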

2) Total bandwidth usage (%)

One of the key factors in supporting application requirements is how far the number of useless video frames sent from the IoT cameras can be reduced. To quantify the frames sent from the cameras to the edge server, we measure the number of bytes sent to the server in each experiment. No frame is compressed, and each frame is about 900 kB in size.
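The stated frame size can be checked with a short calculation. The sketch assumes 8-bit, three-channel pixels (the default colour layout of an OpenCV image), which the text does not state explicitly; under that assumption a 480x640 frame occupies 921,600 bytes, i.e. 900 KiB, in line with the figure of about 900 kB.

```python
def raw_frame_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of one uncompressed frame in bytes (assumes packed 8-bit channels by default)."""
    return width * height * channels * bytes_per_channel


def total_bytes(num_frames, width=480, height=640):
    """Bytes sent to the edge server when num_frames uncompressed frames are transmitted."""
    return num_frames * raw_frame_bytes(width, height)


size = raw_frame_bytes(480, 640)
print(size, size / 1024)  # 921600 bytes, 900.0 KiB
```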

3) Processing delay (sec)

In the embodiment of the present invention, the application tasks on the IoT camera are containerized. To quantify the processing delay on the IoT camera, each stage of the process is measured in seconds.

4) Energy consumption (%)

From the energy consumption perspective, we measure the workload required to run the application tasks on each IoT camera. Assuming that all IoT cameras have identical hardware, the energy consumption is measured in joules (J) as a function of the number of applications.

The embodiment of the present invention described above is also compared with the following prior-art baselines.

1) Prior art 1 (bl1: streaming all frames)

Each IoT camera blindly sends every video frame to the edge server. At the edge server, all frames arriving in each camera queue go through the re-identification step, and the queues are visited in round-robin fashion to find the target object in each video stream. Once the object is found, frames from that camera queue continue to be processed, while video frames in the other queues contain no target and are excluded from further processing. Prior art 1 corresponds to a cloud-based video processing scheme in which the entire video analysis is performed at the edge server.
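The round-robin search over the camera queues can be pictured with the following sketch. It is an illustration of the baseline as described, not code from the experiment: the function name, the dictionary of deques, and the has_target callback standing in for the re-identification step are all hypothetical.

```python
from collections import deque


def find_active_queue(queues, has_target):
    """Visit camera queues round-robin until a frame containing the target is found.

    queues maps a camera id to a deque of frames; has_target(frame) stands in
    for the re-identification step. Returns (camera_id, frames_examined), or
    (None, frames_examined) when every queue runs empty without a match.
    """
    examined = 0
    while any(queues.values()):
        for cam_id, queue in queues.items():
            if not queue:
                continue  # nothing waiting from this camera
            frame = queue.popleft()
            examined += 1
            if has_target(frame):
                return cam_id, examined
    return None, examined
```

The frames_examined count makes the cost of the scheme visible: every useless frame ahead of the first match must be re-identified before the useful one is reached.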

2) Prior art 2 (bl2: object detection in camera, tracking in edge server)

Each IoT camera independently inspects its video stream and discards useless frames based on information about the object to be detected. Before an application runs, the edge server sends a pre-trained inference model to each IoT camera. The IoT camera uses this model to try to detect the specified object in a given frame. If the object is present, the frame is sent to the server; otherwise the frame is dropped. As multiple applications run concurrently and request detection of different types of objects, the edge server updates the inference model to cover all of those object types. For each application, the edge IoT camera runs a container that processes video frames and decides whether to discard or transmit each frame. However, handover between cameras is not supported when the target moves.
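The camera-side filtering described for this baseline amounts to a simple predicate over the frame stream. The sketch below is illustrative only; detect is a hypothetical callback standing in for the inference model that the edge server distributes.

```python
def frames_to_send(frames, detect):
    """Yield only the frames in which detect(frame) reports the object.

    All other frames are dropped on the camera and never reach the edge server.
    """
    for frame in frames:
        if detect(frame):
            yield frame
```

Because the decision is made independently on each camera, nothing in this loop tells a neighboring camera when the target is about to arrive, which is the gap that the handover mechanism addresses.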

For the performance evaluation, this experiment compares the application level goodput of the edge server while varying the number of concurrently running applications. First, we analyze the waiting time, that is, how long each useful frame (a frame actually used for tracking) remains in the queue before it is delivered to the application.

FIG. 5 shows the average waiting time of useful frames as a function of the number of applications running on the edge server, for the embodiment of the present invention and for prior art 1 and 2 described above. Each application on the edge server is assumed to receive frames from two cameras at the same transmission rate.

Under prior art 1, every IoT camera feeds frames to the edge server without filtering, which sharply increases the number of frames waiting at the edge server for image processing. With one application, the average waiting time for frame delivery is about 690 ms, and it rises steeply to about 5700 ms as the number of applications grows.

Under prior art 2, each IoT camera autonomously starts and stops frame transmission to reduce the total number of frames received by the edge server. This shortens the queues at the edge server, giving a waiting time of about 350 ms with one application. Moreover, the edge server does not become congested as the number of applications increases, and the waiting time does not change.

By contrast, with the embodiment of the present invention, the average waiting time with one application is about 313 ms, slightly lower than with prior art 2. This is because the embodiment supports handover between cameras: each IoT camera predicts in advance when to hand its task over to the next IoT camera, so that the specific object can be tracked without interruption. In addition, even when the number of applications served by the edge server increases, switching among the multiple queues incurs no significant overhead, so frames can be delivered to the applications on time.

FIG. 6 shows the application level goodput as a function of the number of applications running on the edge server.

As FIG. 6 shows, the embodiment of the present invention processes all frames before t1, and about 89.3% of the frames are processed within t0.5.

By contrast, when one application is running on the edge server, the application level goodput is about 86.2% (t1) and 50.8% (t0.5) for prior art 1, and about 81.1% (t1) and 42.5% (t0.5) for prior art 2. Furthermore, as the number of concurrently running applications increases, the application level goodput drops sharply because of the large number of frames that must be handled before the useful frames can be processed.

FIG. 8 shows the application level goodput and the additional bandwidth usage as a function of the average moving speed of the target.

To evaluate whether the embodiment of the present invention handles temporal locality well, different sets of videos in which the target moves at different speeds are played back, and the additional bandwidth usage and the application level goodput are compared. In Equation 1 above, α and β are set to -0.15 and 10, respectively. The x-axis represents the average moving speed of the target, the left y-axis shows the application level goodput, and the right y-axis shows the additional bandwidth usage.

FIG. 8 shows that all frames are processed within the t1 (1 second) threshold, while 83% to 84.1% of the frames are processed within the t0.5 (0.5 second) threshold across the various target moving speeds. The experiment also shows that, as the target speed increases, the virtual container scheduled for t_exp is loaded slightly earlier. As a result, the number of additional frames sent to the edge server application increases by 25.1%.

FIG. 7 compares useful and useless frames as a function of the number of running applications.

To see how many useless video frames are filtered out, the total number of frames sent to the edge server (i.e., useless frames plus useful frames) is compared for different numbers of concurrently running applications.

As FIG. 7 shows, under prior art 1 the share of useless frames rises from 80% to 82% of all frames, because every frame is blindly sent to the edge server.

By contrast, with the embodiment of the present invention and with prior art 2, the cameras send only frames that contain the target object to the edge server, so the bandwidth usage is much lower.

FIG. 9 compares latency at the camera. Here, p(d&r) denotes the delay incurred when detecting and re-identifying the target, p(d) denotes the delay incurred in target correlation, and sc denotes the delay when streaming raw video frames without any other processing. The figure shows that the overhead of creating virtual containers with Docker does not significantly affect overall performance.

We measured the time an IoT camera takes to create and initialize a virtual container after receiving a handover message from a neighboring IoT camera. The creation and initialization times of the virtual container are given in Table 1.

Table 1. Virtual container creation and initialization time

Step              Time (ms)
Creation          2814.574
Initialization    1.253

That is, creating a tracking task instance from scratch takes about 2.814 seconds. The initialization time is the time the virtual container needs before it starts processing or transmitting frames; once the task has been created, initialization takes about 0.00125 seconds. In other words, once the virtual container exists, initialization takes very little time.

FIG. 10 compares the energy consumption of all IoT cameras.

The energy consumption of the cameras was measured while part of the application tasks ran on the IoT cameras, for different numbers of running applications. The energy consumption of the three IoT cameras is summed; each IoT camera consumes about 30 J when idle.

The embodiment of the present invention consumes the least energy in all cases and, in particular, saves up to 62.13% of the energy consumed by prior art 2. Prior art 1 also consumes more energy than the embodiment of the present invention.

Prior art 1 consumes 570 J regardless of the number of running applications, because every IoT camera performs exactly the same job of streaming all video frames to the edge server. Transmitting both useful and useless video frames over the wireless network in this way consumes a great deal of energy and is therefore inefficient in terms of both network overhead and energy consumption.

Prior art 2 consumes 954 J with one running application and 1323 J with three. Because prior art 2 has no logic for an IoT camera to coordinate tracking tasks with other cameras, each IoT camera must run the per-application tasks at all times. Each per-application task typically runs in an isolated environment using a container, and the number of containers depends on the number of applications. The total energy consumption of the IoT cameras therefore grows with the number of running applications.

By contrast, the embodiment of the present invention consumes 393 J with one running application and 501 J with three. In the present invention, the IoT cameras can place each per-application task on an appropriate camera through spatio-temporal coordination, so even as the number of applications increases, each IoT camera does not have to run all tasks at the same time. In other words, only the cameras estimated to be suitable for supplying useful frames to a particular application need to keep the corresponding virtual container.
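The reported savings can be reproduced from the energy figures quoted above. The short calculation below uses only those numbers (570 J for prior art 1, 954 J and 1323 J for prior art 2, 393 J and 501 J for the embodiment); the dictionary layout and function name are chosen for illustration.

```python
# Summed energy of the three IoT cameras in joules, keyed by number of applications
ENERGY_J = {
    "bl1": {1: 570, 3: 570},
    "bl2": {1: 954, 3: 1323},
    "proposed": {1: 393, 3: 501},
}


def saving_percent(baseline_j, proposed_j):
    """Energy saved relative to the baseline, as a percentage."""
    return 100.0 * (baseline_j - proposed_j) / baseline_j


for apps in (1, 3):
    for baseline in ("bl1", "bl2"):
        saved = saving_percent(ENERGY_J[baseline][apps], ENERGY_J["proposed"][apps])
        print(f"{apps} app(s), vs {baseline}: {saved:.2f}% saved")
```

The largest value, 62.13%, is obtained against prior art 2 with three applications, (1323 - 501) / 1323, which agrees with the maximum saving stated in the text.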

The present invention is not limited by the embodiments described above or by the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the present invention pertains that the components according to the present invention may be substituted, modified, and changed without departing from the technical spirit of the present invention.

Claims

1. An object tracking system using a plurality of cameras, comprising:
a plurality of IoT cameras, each including a hypervisor responsible for camera functions and frame pre-processing, and one or more virtual containers (VCs); and
an edge server including a processing device for video processing, a plurality of camera queues in which frames received from the respective IoT cameras wait, and one or more virtual container masters (VCMs), the edge server performing a tracking task for each tracking application through the virtual container masters,
wherein the virtual container uses a target template provided by the virtual container master to supply to the edge server only those frames, among the frames supplied from the hypervisor, that include the tracking target, monitors the movement of the target, and performs a handover operation that hands the tracking task over to a neighboring IoT camera.

2. The system of claim 1,
wherein the hypervisor comprises a Common Library Execution Manager that maintains software libraries for video analysis, and
the Common Library Execution Manager is provided with a Common Library Storage (CLS) as shared storage.

3. The system of claim 2,
wherein a virtual camera driver included in the virtual container sends a command to the Common Library Execution Manager when the command is executed.

4. The system of claim 3,
wherein, when a plurality of virtual containers requesting the same library run on the same video stream, the Common Library Execution Manager executes the command only once and copies the result to each virtual container.

5. The system of claim 1,
wherein the virtual container further comprises a tracker that, when the target is present in a frame, performs re-identification on a predefined number of subsequent frames according to a correlation-based tracking algorithm.

6. The system of claim 1,
wherein the virtual container decides on handover to a neighboring IoT camera in consideration of the expected time at which the target will appear in the field of view of the neighboring IoT camera, the expected time at which the target will leave the field of view of the current IoT camera, and the moving direction of the target.