KR20210040709A

KR20210040709A - Device and Method for adding target class thereon

Info

Publication number: KR20210040709A
Application number: KR1020190123361A
Authority: KR
Inventors: 정승원; 고용수; 박종찬; 박미주; 김영호
Original assignee: 한화테크윈 주식회사
Priority date: 2019-10-04
Filing date: 2019-10-04
Publication date: 2021-04-14

Abstract

In a preferred embodiment of the present invention, a terminal comprises: an MSE (Media Source Extensions) buffer for storing a partial section of a received image; a first interface for selecting a target class from an image displayed on a display; and a second interface for selecting at least one specific time point in the image displayed on the display, wherein, after the target class is selected, the display can estimate and display the target class on captured images received in real time from at least one image capturing device.

Description

Device and Method for adding target class thereon

The present invention relates to a method of analyzing images by applying deep learning to a network surveillance camera, and more particularly to a method of performing transfer learning so that a user can add a new class in a network surveillance camera.

Conventional network surveillance cameras are implemented as embedded systems. A model is trained on a given data set, the learned model parameters and labels are then fixed, and the embedded system performs only object inference.

However, deep learning by this conventional approach, in which each object in an image acquired by a network surveillance camera is classified and analyzed only in terms of the fixed classes established through prior training, yields limited results.

KR 10-2019-0062030 A

In a preferred embodiment of the present invention, a user who is monitoring images received from a network surveillance camera selects a new target class through the image, and transfer learning is performed in the network surveillance camera so that deep learning inference reflects the update of the target class added by the user.

In a preferred embodiment of the present invention, a terminal includes: a receiver that receives, in real time, images captured by at least one image capturing device; an MSE (Media Source Extensions) buffer that stores a partial section of the received images; a display that displays the received images; a first interface for selecting a target class from the image displayed on the display; and a second interface for selecting at least one specific time point in the image displayed on the display, wherein, after the target class is selected, the display can estimate and display the target class on the captured images received in real time from the at least one image capturing device.

In a preferred embodiment of the present invention, a deep learning unit learns the target class based on frames within a predetermined time of the selected specific time point, and a transmission unit transmits the learned target class and the related deep learning parameters to the at least one image capturing device. In this case, the MSE buffer decodes and renders a partial section of the images received during live monitoring, displays it on the display, and holds it in the MSE buffer for a certain time. The deep learning unit performs transfer learning on the target class by data augmentation, using all frames within the predetermined time of the selected specific time point.

In another embodiment of the present invention, a terminal includes: a display; a memory that stores one or more instructions; a buffer that stores a partial section of a received live image; a processor that executes the one or more instructions stored in the memory; a first interface for selecting a target class in the live image; and a second interface for selecting at least one specific time point in the live image. By executing the one or more instructions, the processor controls the display to output the live image received in real time and, after the target class is selected, to estimate and display the target class on the live image received in real time. The memory stores a learning result obtained by learning the target class, through deep learning, from frames before and after the at least one specific time point.

In yet another preferred embodiment of the present invention, a method of adding and displaying a target class in a terminal includes: receiving a live image captured by at least one image capturing device; storing a partial section of the live image in an MSE (Media Source Extensions) buffer; displaying the received live image on a display; designating a target class in the live image displayed on the display; selecting at least one specific time point in the live image displayed on the display using an instant playback interface; and, after the target class is selected, estimating and displaying the target class on the live image received from the at least one image capturing device, wherein transfer learning is performed on the target class using a plurality of frames arranged in time-series order within a predetermined time of the selected at least one specific time point.

In a preferred embodiment of the present invention, image analysis is not limited to fixed classes. Through real-time transfer learning, image analysis can also be performed on a class additionally selected by the user, that is, on the target the user wants.

FIG. 1 shows a video surveillance system that performs real-time transfer learning in a CCTV, as a preferred embodiment of the present invention.
FIG. 2 shows the internal configuration of a terminal, as a preferred embodiment of the present invention.
FIG. 3 shows an example of storing a partial section of an image in an MSE buffer using an instant playback interface, as a preferred embodiment of the present invention.
FIG. 4 shows an example of performing real-time transfer learning, as a preferred embodiment of the present invention.
FIG. 5 shows a flowchart of adding and displaying a target class in a terminal within the video surveillance system, as a preferred embodiment of the present invention.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. In addition, parts unrelated to the description are omitted from the drawings in order to describe the present disclosure clearly.

In this specification, and in the claims in particular, “the” and similar referring terms may denote both the singular and the plural. Further, unless the order of the steps of a method according to the present disclosure is explicitly specified, the described steps may be performed in any suitable order. The present disclosure is not limited by the order in which the steps are described.

Some embodiments of the present disclosure may be represented by functional block configurations and various processing steps. Some or all of these functional blocks may be implemented by any number of hardware and/or software components that perform specific functions. For example, the functional blocks of the present disclosure may be implemented by one or more microprocessors, or by circuit configurations for a given function. The functional blocks may also be implemented in various programming or scripting languages, and may be implemented as algorithms executed on one or more processors. In addition, the present disclosure may employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as “mechanism”, “element”, “means”, and “component” are used broadly and are not limited to mechanical and physical components.

Hereinafter, the present disclosure is described in detail with reference to the accompanying drawings.

FIG. 1 shows a video surveillance system that performs real-time transfer learning in a CCTV, as a preferred embodiment of the present invention.

The video surveillance system 100 includes a user terminal 120 and an image capturing device 110. The image capturing device 110 can communicate with the terminal 120 by wire or wirelessly. Examples of the image capturing device 110 include a CCTV, a camera, and a terminal capable of capturing images.

The terminal 120 may be implemented as any of various electronic devices, such as a mobile phone, tablet PC, digital camera, camcorder, notebook, computer, desktop, e-book reader, digital broadcasting terminal, PDA (Personal Digital Assistant), PMP (Portable Multimedia Player), navigation device, or wearable device. The terminal 120 also encompasses any physical device that includes the components of FIG. 2.

In a preferred embodiment of the present invention, the image capturing device 110 includes a second deep learning unit 112, and it is assumed that the second deep learning unit 112 has already been trained by deep learning on a pre-training data set. The second deep learning unit 112 can analyze the images captured by the image capturing device 110 with respect to the previously learned classes and provide the results in the form of metadata.

In a preferred embodiment of the present invention, the second deep learning unit 112 is also capable of transfer learning for a new target class. The transfer learning process is as follows.

While monitoring the images transmitted from the CCTV 110, the user uses the first interface (S121) to select an image at time t that contains a new target class. An example of a target class is a man wearing a hat.

The user then uses the second interface (S122) so that all images containing the selected target class, from time t-y to time t+x seconds, are collected. In a preferred embodiment of the present invention, when a new target class is added from a live image, the fact that the same target class is present just before and after the specific time t is exploited, so that the same target class can be learned in a data augmentation manner.
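The collection of frames around the selected time t can be sketched as follows. This is a minimal illustration only: the `Frame` type, the `frames_in_window` helper, and the parameter names `before` (y) and `after` (x) are assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    timestamp: float  # seconds on the stream clock (assumed representation)
    data: bytes       # decoded frame payload, left empty in this sketch


def frames_in_window(frames: List[Frame], t: float,
                     before: float, after: float) -> List[Frame]:
    """Return frames with t - before <= timestamp <= t + after, in time order."""
    if before < 0 or after < 0:
        raise ValueError("before and after must be non-negative")
    selected = [f for f in frames if t - before <= f.timestamp <= t + after]
    return sorted(selected, key=lambda f: f.timestamp)


# Ten held frames, one per second; the user picks t = 5 s with y = 1 and x = 2.
held = [Frame(timestamp=float(s), data=b"") for s in range(10)]
window = frames_in_window(held, t=5.0, before=1.0, after=2.0)
print([f.timestamp for f in window])  # [4.0, 5.0, 6.0, 7.0]
```

With y = 1 and x = 2 the window covers t-1, t, t+1, and t+2, matching the four time-series frames used as training data in the example described with reference to FIGS. 3 and 4.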

Using the first deep learning unit 122, the terminal 120 performs data augmentation on the entire series of time-series images collected through the second interface (S122) (S123), and then performs transfer learning and an update for the new target class (S124). The terminal 120 transmits the updated deep learning parameters and the new target class information to the CCTV 110.
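The specification does not state which network or training procedure the first deep learning unit uses. As one hedged illustration of transfer learning, the sketch below keeps a feature extractor fixed and trains only a new output head for the added class. The function names, the toy features, and the `update` dictionary layout are all assumptions made for this example.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def frozen_features(img: Image) -> List[float]:
    """Hypothetical stand-in for a pre-trained backbone; it is never updated."""
    h, w = len(img), len(img[0])
    half = h // 2
    top = sum(sum(row) for row in img[:half]) / (half * w)
    bottom = sum(sum(row) for row in img[half:]) / ((h - half) * w)
    return [(top + bottom) / 2.0, top - bottom]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_new_head(samples: List[Tuple[Image, int]], epochs: int = 200,
                   lr: float = 0.5) -> Tuple[List[float], float]:
    """Train a logistic-regression head on frozen features (label 1 = new class)."""
    feats = [(frozen_features(img), label) for img, label in samples]
    weights = [0.0] * len(feats[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in feats:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - label  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(img: Image, weights: List[float], bias: float) -> float:
    x = frozen_features(img)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy data: frames of the new class are bright in the top half, others are not.
target = [[1.0] * 4, [1.0] * 4, [0.0] * 4, [0.0] * 4]
other = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]
weights, bias = train_new_head([(target, 1), (other, 0)])

# Assumed shape of what the terminal would send to the camera.
update = {"class_name": "man_with_hat", "weights": weights, "bias": bias}
print(predict(target, weights, bias) > 0.5, predict(other, weights, bias) < 0.5)
```

Only the small head is trained and transmitted in this sketch, which is one common reason to prefer transfer learning on a resource-limited device; whether the patented system freezes the same layers is not stated in the source.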

The CCTV 110 updates itself with the received deep learning parameters and new target class information, and then performs deep learning inference on the images.
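The camera-side update can be pictured as adding one more class head to an otherwise unchanged model. The `CameraClassifier` class, the per-class linear heads, and the update dictionary keys in this sketch are hypothetical; the specification does not define the camera's model structure.

```python
import math
from typing import Dict, List, Tuple


class CameraClassifier:
    """Hypothetical camera-side model: one linear head per class over shared features."""

    def __init__(self, heads: Dict[str, Tuple[List[float], float]]) -> None:
        self.heads = {name: (list(w), float(b)) for name, (w, b) in heads.items()}

    def apply_update(self, update: Dict) -> None:
        """Register (or overwrite) the head for the class named in the update."""
        self.heads[update["class_name"]] = (
            list(update["weights"]), float(update["bias"]))

    def infer(self, features: List[float]) -> Dict:
        """Score every known class and return the best one as metadata."""
        scores = {}
        for name, (w, b) in self.heads.items():
            z = sum(wi * xi for wi, xi in zip(w, features)) + b
            scores[name] = 1.0 / (1.0 + math.exp(-z))
        best = max(scores, key=scores.get)
        return {"class": best, "score": scores[best]}


camera = CameraClassifier({
    "person": ([1.0, 0.0, 0.0], 0.0),
    "vehicle": ([0.0, 1.0, 0.0], 0.0),
})
features = [0.2, 0.1, 0.9]
before = camera.infer(features)["class"]
camera.apply_update(
    {"class_name": "man_with_hat", "weights": [0.0, 0.0, 1.0], "bias": 0.0})
after = camera.infer(features)["class"]
print(before, after)  # person man_with_hat
```

Before the update the camera can only answer with one of its fixed classes; after it, the same features are attributed to the class the user added.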

Referring to FIG. 2, the terminal 200 includes a processor 210, a memory 220, a buffer 230, and a display 240, and the display 240 supports a first interface 250 and a second interface 260.

The memory 220 may be of a flash memory type, hard disk type, multimedia card micro type, or card type, or may take the form of RAM, ROM, SRAM, EEPROM, PROM, magnetic memory, a magnetic disk, an optical disk, or the like, and may store programs for the processing and control performed by the processor 210. The processor 210 is

The memory 220 stores one or more instructions, and the processor 210, by executing the one or more instructions, can control the display 240 to output the live image received in real time. The memory 220 can store a learning result obtained by learning the target class, through deep learning, from frames before and after at least one specific time point.

As in the embodiment of FIGS. 3 and 4, after the user selects a target class, the processor 210 can perform control so that the selected target class is estimated and displayed on the live image received in real time.

The buffer 230 can store a partial section of the received live image. Referring to FIG. 3, the buffer 230 may be an MSE buffer 330 and can store a partial section 320 of the live image. The buffer 230 can also store, in time-series order, a plurality of frames 331 to 334 within a certain time interval around the specific time point that the user selects through the second interface 260.

The first interface 250 is implemented to select a target class in the live image displayed on the display 240. The second interface 260 is implemented to select at least one specific time point in the live image displayed on the display 240. In a preferred embodiment of the present invention, the second interface 260 may be an instant playback interface.

FIGS. 3 and 4 show, as a preferred embodiment of the present invention, an example of storing images using an MSE buffer while a live image is being monitored on the terminals 310a and 310b.

In a preferred embodiment of the present invention, the live image is decoded by a decoder 332, rendered by a rendering unit 334, and displayed on the display 310b of the user terminal. The MSE buffer 330 used by the user terminal 310b does not discard images immediately after they have been shown to the user, but holds some frames 331 to 334 for a certain time.
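The hold behavior can be sketched with a small time-based buffer. This is a Python analogue written for illustration, not the browser Media Source Extensions API; the `HoldBuffer` class and its `hold_seconds` parameter are assumptions, and the actual retention time is not given in the specification.

```python
from collections import deque
from typing import Deque, List, Tuple


class HoldBuffer:
    """Keeps already-displayed frames for a fixed time instead of dropping them."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self._frames: Deque[Tuple[float, bytes]] = deque()

    def push(self, timestamp: float, frame: bytes) -> None:
        """Store a frame right after it is shown, then drop expired frames."""
        self._frames.append((timestamp, frame))
        self._evict(now=timestamp)

    def _evict(self, now: float) -> None:
        # Frames arrive in time order, so the oldest is always on the left.
        while self._frames and now - self._frames[0][0] > self.hold_seconds:
            self._frames.popleft()

    def timestamps(self) -> List[float]:
        return [ts for ts, _ in self._frames]


buf = HoldBuffer(hold_seconds=3.0)
for second in range(10):
    buf.push(float(second), b"")
print(buf.timestamps())  # [6.0, 7.0, 8.0, 9.0]
```

Because the retained frames stay in time order, a later request for the frames around a chosen time point can be answered directly from the buffer without asking the camera to resend anything.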

In a preferred embodiment of the present invention, when the user selects a specific object on the terminal 310a as a new target class, the user uses an interface such as instant playback to play back the time-series images at t-1, t, t+1, and t+2 seconds, and these images are used as training data for the deep learning unit 350. The data augmentation unit 340 enlarges the training data through techniques such as scaling, translation, rotation, flipping, and adding noise, and then provides the resulting data to the deep learning unit 350.
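The five augmentation techniques named above can be illustrated on a tiny grayscale image stored as a list of rows. This is a sketch under simplifying assumptions: rotation is limited to 90 degrees, scaling uses nearest-neighbor sampling, pixel values are assumed to lie in [0, 1], and all function names are introduced here for illustration.

```python
import random
from typing import List

Image = List[List[float]]


def flip_horizontal(img: Image) -> Image:
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def translate(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Shift right by dx and down by dy, filling uncovered pixels."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def scale_nearest(img: Image, factor: float) -> Image:
    """Resize by nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)] for y in range(nh)]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add Gaussian noise and clamp to the assumed [0, 1] pixel range."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def augment(img: Image, rng: random.Random) -> List[Image]:
    """Return the original frame plus one variant per technique."""
    return [img, scale_nearest(img, 2.0), translate(img, 1, 0),
            rotate90(img), flip_horizontal(img), add_noise(img, 0.05, rng)]


frame = [[0.1, 0.2], [0.3, 0.4]]
variants = augment(frame, random.Random(0))
print(len(variants))  # 6
```

Applied to each of the four time-series frames, this would turn four training images into twenty-four, which is the kind of enlargement the data augmentation unit is described as providing.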

FIG. 5 shows, as a preferred embodiment of the present invention, a method of adding and displaying a target class in a terminal.

In a video surveillance system including a CCTV and a user terminal, the user terminal receives live images captured by at least one CCTV (S510). The terminal decodes and renders the received live images and displays them on the display, and holds the displayed images in the MSE buffer rather than discarding them immediately (S520, S530). Note that steps S520 and S530 may be performed in that order, step S520 may be performed after step S530, or steps S520 and S530 may be performed simultaneously.

The user monitors at least one live image displayed on the display of the terminal and, when there is a new target class to add, can select the target class using a finger, a stylus, or any of various other input interfaces (S540). The user then uses the instant playback interface to select at least one specific time point t in the live image displayed on the display (S550).

The first deep learning unit in the terminal collects the series of time-series images held in the MSE buffer, updates the deep learning parameters and the class for the new target class by transfer learning with data augmentation, and transmits this information to at least one CCTV.
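The specification does not define how the updated parameters and class information are encoded for transmission. Purely as an illustration, the sketch below serializes them as JSON; the field names `class_name`, `weights`, and `bias` and the two helper functions are assumptions, not part of the disclosed system.

```python
import json
from typing import Dict, List


def encode_update(class_name: str, weights: List[float], bias: float) -> bytes:
    """Pack the new class name and its learned parameters into a message."""
    message = {"class_name": class_name, "weights": weights, "bias": bias}
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode_update(raw: bytes) -> Dict:
    """Parse a message on the camera side and check the required fields."""
    message = json.loads(raw.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("update must be a JSON object")
    for key in ("class_name", "weights", "bias"):
        if key not in message:
            raise ValueError(f"missing field: {key}")
    return message


raw = encode_update("man_with_hat", [0.25, -1.5], 0.1)
received = decode_update(raw)
print(received["class_name"], received["weights"])
```

A real deployment would likely also need a model version and some form of authentication on this message, but those concerns are outside what the source describes.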

The at least one CCTV performs deep learning inference based on the received deep learning parameters and class for the new target class. On the terminal, after the new target class is selected, the selected target class is estimated and displayed on the live image received from the at least one image capturing device (S560).

A device according to the present invention may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with external devices, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Computer-readable recording media include magnetic storage media (for example, hard disks) and optically readable media (for example, CD-ROMs and DVDs (Digital Versatile Discs)). The computer-readable recording medium may be distributed over networked computer systems so that the computer-readable code is stored and executed in a distributed manner. The medium is readable by a computer, can be stored in the memory, and can be executed by the processor.

In the specification of the present invention (and in the claims in particular), the term “the” and similar referring terms may correspond to both the singular and the plural. In addition, where a range is described in the present invention, the invention includes embodiments to which the individual values within that range are applied (unless stated otherwise), which is equivalent to reciting each individual value of the range in the detailed description. Finally, unless the order of the steps of a method according to the present invention is explicitly stated or stated otherwise, the steps may be performed in any suitable order. The present invention is not necessarily limited by the order in which the steps are described. All examples and exemplary terms (for example, “etc.”) are used merely to describe the present invention in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. Those skilled in the art will also appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive rather than a limiting sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

1. A terminal comprising:
a receiver configured to receive, in real time, an image captured by at least one image capturing device;
a Media Source Extensions (MSE) buffer for storing a partial section of the received image;
a display for displaying the received image;
a first interface for selecting a target class from the image displayed on the display; and
a second interface for selecting at least one specific time point in the image displayed on the display,
wherein, after the target class is selected, the display is configured to estimate and display the target class on the captured image received in real time from the at least one image capturing device.

2. The terminal of claim 1, wherein the target class is learned based on frames within a predetermined time of the selected specific time point, and the learned target class and deep learning parameters related thereto are transmitted to the at least one image capturing device.

3. The terminal of claim 1, wherein the display displays the captured image received in real time to support live monitoring, and the target class is selected during the live monitoring using the first interface.

4. The terminal of claim 3, wherein the MSE buffer decodes and renders a partial section of the image received during the live monitoring, displays it on the display, and holds it in the MSE buffer for a predetermined time.

5. The terminal of claim 1, wherein, when the at least one specific time point is selected through the second interface, frames within a predetermined time of the selected specific time point are selected from among the frames stored in the MSE buffer.

6. The terminal of claim 5, wherein transfer learning is performed on the target class by data augmentation using all frames within the predetermined time of the selected specific time point.

7. The terminal of claim 1, wherein transfer learning is performed on the target class using a plurality of frames arranged in chronological order within a predetermined time of the selected specific time point.

8. The terminal of claim 1, wherein the second interface is an instant playback interface.

9. A terminal comprising:
a display;
a memory for storing one or more instructions;
a buffer for storing a partial section of a received live image;
a processor that executes the one or more instructions stored in the memory;
a first interface for selecting a target class in the live image; and
a second interface for selecting at least one specific time point in the live image,
wherein the processor, by executing the one or more instructions, controls the display to output the live image received in real time and, after the target class is selected, to estimate and display the target class on the live image received in real time, and
wherein the memory stores a learning result obtained by learning the target class, through deep learning, from frames before and after the at least one specific time point.

10. The terminal of claim 9, wherein the target class is learned by performing learning on one object using all frames before and after the at least one specific time point.

11. A method of adding and displaying a target class in a terminal, the method comprising:
receiving a live image captured by at least one image capturing device;
storing a partial section of the live image in a Media Source Extensions (MSE) buffer;
displaying the received live image on a display;
designating a target class in the live image displayed on the display;
selecting at least one specific time point in the live image displayed on the display using an instant playback interface; and
estimating and displaying the target class on the live image received from the at least one image capturing device after the target class is selected,
wherein transfer learning is performed on the target class using a plurality of frames arranged in time-series order within a predetermined time of the selected at least one specific time point.