KR102321205B1 - Apparatus for Providing AI Service and Driving Method Thereof

Info

Publication number: KR102321205B1
Application number: KR1020210075343A
Authority: KR
Inventors: 윤진; 나종근
Original assignee: 주식회사 스누아이랩
Priority date: 2021-06-10
Filing date: 2021-06-10
Publication date: 2021-11-03

Abstract

The present invention relates to an AI service apparatus and a driving method thereof. According to an embodiment of the present invention, the AI service apparatus may comprise: a communication interface unit that sequentially receives video frames of a captured image; and a control unit that sets an event section of an object detection error based on a confidence score calculated for a designated object in each of the received video frames, and causes the erroneously detected object of a video frame whose confidence score deviates from a threshold within the set event section to be reused during learning.

Description

Artificial intelligence service device and driving method thereof

The present invention relates to an artificial intelligence service apparatus and a driving method thereof, and more particularly, to an artificial intelligence service apparatus that uses, for example, a detector operating in the field to extract, in a post-processing step, meaningful data close to false positives or false negatives (missed detections), and to a driving method of the apparatus.

When artificial intelligence (AI) is categorized by the data it uses, learning from data that has correct answers is called supervised learning, and learning from general data that has no correct answers is called unsupervised learning. Unsupervised learning requires a very large amount of data, and its learning effect is not very great. For this reason, most modern AI relies on supervised learning. Accordingly, a deep learning model based on supervised learning requires labeled training data.

In industry, a detector trained on data acquired at the site in question is normally used, but false positives and false negatives still occur. Retraining on the false positive or false negative data improves the detector's performance, but such meaningful data is difficult to obtain. Various related methodologies exist in the prior art, but they suffer from high development costs.

For example, conventionally, data that is meaningful for learning, that is, data with a high probability of being a false positive or false negative, is obtained either by a person who observes the moment and backs up the video, or by applying an extraction methodology, both of which are costly.

Korean Patent Application Laid-Open No. 10-2021-0006502 (2021.01.18)
Korean Registered Patent No. 10-2168558 (2020.10.15)

Website https://en.wikipedia.org/wiki/Active_learning
Website https://inspaceai.github.io/2019/06/05/ActiveLearning_Introduction/

An object of an embodiment of the present invention is to provide an artificial intelligence service apparatus that uses, for example, a detector operating in the field to extract, in a post-processing step, meaningful data close to false positives or false negatives, and a driving method of the apparatus.

An artificial intelligence service apparatus according to an embodiment of the present invention includes: a communication interface unit that sequentially receives video frames of a captured image; and a control unit that sets an event section of an object detection error (false) based on a confidence score calculated for a designated object in each received video frame, and causes the erroneously detected object of a video frame whose confidence score deviates from a threshold within the set event section to be reused during learning.

The control unit may determine the number of video frames used for image analysis in order to set the event section.

The control unit may determine whether there is an object detection error by comparison with surrounding video frames in order to set the event section.

The control unit may store an image of the erroneously detected object as supervised learning data and then reuse it during learning.

The control unit may further store a coordinate value of the erroneously detected object within the video frame, matched to the image, and the coordinate value may be calculated based on the coordinate value of the designated object that corresponds to the erroneously detected object in surrounding video frames.

The control unit may extract the coordinate values of the designated object from the video frames at an earlier time and a later time relative to the video frame of the erroneously detected object, and may calculate the coordinate value of the erroneously detected object as the intermediate value of the extracted coordinate values.

As the erroneously detected object is reused during learning, the control unit may correct object information for an object identical to the erroneously detected object, or change the state of the boundary box around the object.

Further, a driving method of an artificial intelligence service apparatus according to an embodiment of the present invention includes: sequentially receiving, by a communication interface unit, video frames of a captured image; and setting, by a control unit, an event section of an object detection error based on a confidence score calculated for a designated object in each received video frame, and causing the erroneously detected object of a video frame whose confidence score deviates from a threshold within the set event section to be reused during learning.

According to an embodiment of the present invention, meaningful data in which a false positive or false negative occurs, or nearly occurs, in the post-processing of, for example, a video detector is reused during learning, which increases learning accuracy and can also reduce the development cost associated with improving the technology.

FIG. 1 is a diagram illustrating an artificial intelligence service system according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining the operation of the artificial intelligence service apparatus of FIG. 1;
FIG. 3 is a block diagram illustrating the detailed structure of the artificial intelligence service apparatus of FIG. 1; and
FIG. 4 is a flowchart illustrating a driving process of the artificial intelligence service apparatus of FIG. 1.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating an artificial intelligence service system according to an embodiment of the present invention, and FIG. 2 is a diagram for explaining the operation of the artificial intelligence service apparatus of FIG. 1.

As shown in FIG. 1, the artificial intelligence service system 90 according to an embodiment of the present invention includes some or all of a photographing apparatus 100, a communication network 110, and an artificial intelligence service apparatus 120.

Here, "includes some or all" means, for example, that some components such as the communication network 110 may be omitted so that the photographing apparatus 100 and the artificial intelligence service apparatus 120 communicate directly (e.g., P2P communication), or that some or all of the components of the artificial intelligence service apparatus 120 may be integrated into a network device (e.g., a wireless switching device) of the communication network 110. To aid a full understanding of the invention, the description below assumes that all components are included.

The photographing apparatus 100 may be installed at various locations on an industrial site to perform photographing operations. For example, it may be installed and operated in various places for autonomous vehicles, smart factories, intelligent CCTV, intelligent crime analysis, AI-based disease prediction, and the like. The photographing apparatus 100 may also be installed in any space where children are active, such as a daycare center or kindergarten. It may likewise be installed in a home to observe an infant and a nanny. As a surveillance camera, the photographing apparatus 100 includes a general closed-circuit television (CCTV) camera, an Internet Protocol (IP) camera, or the like. The photographing apparatus 100 may include not only a fixed camera but also a pan-tilt-zoom (PTZ) camera capable of pan, tilt, and zoom operations. Since the photographing apparatus 100 may also include a camera mounted on a user terminal such as a smartphone or tablet PC, embodiments of the present invention are not limited to any one form.

The photographing apparatus 100 installed on an industrial site may also operate in conjunction with an edge device. With the arrival of the Fourth Industrial Revolution, the amount of data that network devices of the communication network 110 and the artificial intelligence service apparatus 120 must process has surged, which can place a heavy computational burden on them. To address this, an edge device may be installed near the photographing apparatus 100 to perform a first stage of image analysis, which can sharply reduce the amount of data to be processed. The edge device may perform various operations in cooperation with the artificial intelligence service apparatus 120; for example, the setting data for analysis (e.g., the object to be observed) may be changed so that the edge device provides the corresponding analysis results to the artificial intelligence service apparatus 120.

The communication network 110 includes both wired and wireless communication networks. For example, a wired or wireless Internet network may be used as, or linked with, the communication network 110. Here, the wired network includes an Internet network such as a cable network or the public switched telephone network (PSTN), and the wireless communication network includes CDMA, WCDMA, GSM, Evolved Packet Core (EPC), Long Term Evolution (LTE), WiBro networks, and the like. The communication network 110 according to an embodiment of the present invention is not limited to these, and may also be used in, for example, a cloud computing network under a cloud computing environment or a 5G network. For example, when the communication network 110 is a wired network, an access point in the communication network 110 may connect to a telephone company's switching office or the like; when it is a wireless network, the access point may process data by connecting to an SGSN or a Gateway GPRS Support Node (GGSN) operated by a carrier, or by connecting to various relay equipment such as a Base Transceiver Station (BTS), NodeB, or e-NodeB.

The communication network 110 may include an access point (AP). Here, the access point includes a small base station, such as a femto or pico base station, of the kind often installed inside buildings. Femto and pico base stations are distinguished, within the classification of small base stations, by the maximum number of devices such as the photographing apparatus 100 that can connect to them. The access point may include a short-range communication module for short-range communication, such as Zigbee or Wi-Fi, with the photographing apparatus 100 and the like. The access point may use TCP/IP or the Real-Time Streaming Protocol (RTSP) for wireless communication. Short-range communication may be performed according to various standards other than Wi-Fi, including Bluetooth, Zigbee, infrared, radio frequency (RF) such as ultra high frequency (UHF) and very high frequency (VHF), and ultra-wideband (UWB) communication. Accordingly, the access point may extract the location of a data packet, designate the best communication path for the extracted location, and forward the data packet along the designated path to the next device, such as the artificial intelligence service apparatus 120. The access point may share several lines in a general network environment, and includes, for example, a router, a repeater, and a relay.

The artificial intelligence service apparatus 120 may include various types of devices that execute an artificial intelligence (AI) program. For example, it may be a smartphone carried by a user, or it may be a tablet PC, a desktop or laptop computer, a smart TV, or a server that performs image analysis. Above all, the artificial intelligence service apparatus 120 according to an embodiment of the present invention preferably includes a video detector and uses it to acquire meaningful data such as false positives or false negatives of an object. More precisely, the artificial intelligence service apparatus 120 runs a deep learning model based on supervised learning to provide various types of services. For example, assuming deep learning is applied to detect incidents and accidents, when an object in a video frame is falsely detected or missed, learning accuracy can be improved by having the model learn from that case. Alternatively, the amount of training data can be increased by adding such cases as supervised learning data.

The artificial intelligence service apparatus 120 according to an embodiment of the present invention uses the detectors of various devices operating in the field to extract meaningful data, for example data close to a false positive or false negative, in a post-processing step. Here, a false positive or false negative means that an object or the like in a video frame is wrongly detected (that is, wrongly recognized) or is not detected at all. In embodiments of the present invention, however, the terms can be understood to mean that the detection result for the same object within a designated time section differs from the result in the remaining video frames.

Above all, embodiments of the present invention may use the terms positive event and negative event. A positive event is an event judged to be true, assuming binary classification is performed on an object in a video frame. If a frame within an event section judged to be true is judged to be false, that frame may be assumed to be a false negative (missed detection). A negative event, by contrast, is an event judged to be false in the binary classification. If a frame within an event section judged to be false is judged to be true, that frame is assumed to be a false positive. In FIG. 2, (a) illustrates a positive event and (b) illustrates a negative event. In this way, the artificial intelligence service apparatus 120 according to an embodiment of the present invention performs image analysis on the video captured by the photographing apparatus 100; in this process it may, for example, extract objects in the video frames, track the extracted objects, and determine events such as incidents and accidents based on the tracking results. Typical examples are determining that a traffic accident has occurred or detecting illegal dumping of garbage. In embodiments of the present invention, however, an event may be assumed to occur when, during object detection of a designated object over a plurality of video frames, one frame yields a result different from the other video frames.
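The positive and negative event definitions can be illustrated with a minimal Python sketch. It assumes that each frame has already been reduced to a binary decision and that the polarity of the event section is known; the function name `suspect_frames` and the label strings are hypothetical and do not come from the patent.

```python
from typing import List, Tuple


def suspect_frames(decisions: List[bool], section_is_positive: bool) -> List[Tuple[int, str]]:
    """Return (frame_index, kind) for frames that disagree with their event section."""
    # In a positive section, a frame judged False is a presumed false negative (miss);
    # in a negative section, a frame judged True is a presumed false positive.
    kind = "false_negative" if section_is_positive else "false_positive"
    return [(i, kind) for i, judged in enumerate(decisions) if judged != section_is_positive]
```

For instance, `suspect_frames([True, True, False, True], section_is_positive=True)` returns `[(2, "false_negative")]`, since only the third frame disagrees with the positive section.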

When applying a deep learning model, the artificial intelligence service apparatus 120 performs learning based on supervised learning data rather than unsupervised learning data. Consequently, when supervised learning data is used for object analysis of video frames and the like, a false positive may occur if the relevant data is absent from the data pool or DB. Accuracy can therefore be raised by expanding the data pool with objects from video frames in which a false positive was detected (e.g., erroneously detected objects) and using them for retraining.

More specifically, as shown in FIG. 2, the artificial intelligence service apparatus 120 according to an embodiment of the present invention calculates a confidence score for each of the video frames received sequentially over time, in order to determine false positives and false negatives in a positive or negative manner. Object detection in general machine learning takes a video frame (or image) as input and calculates a confidence score for each predefined type or class. For example, if classes such as dog and cat are defined and an image of a dog is input, confidence scores such as dog: 0.8 and cat: 0.2 are calculated, meaning that the probability of a dog is 0.8. Confidence scores are calculated in this way for all input video frames or, more precisely, for the video frames of a section containing a positive or negative event. In a positive event section, every frame should be judged true in object detection (for example, judged to be a dog); if a frame in that section is judged false (for example, judged to be a cat), it is assumed to be a false negative. A negative event section is the opposite concept: every frame should be judged false, and if a frame in that section is judged true, it is assumed to be a false positive.

In post-processing, the artificial intelligence service apparatus 120 sets a window size (for example, a number of video frames) and a threshold, and extracts the events of that window. A moving average or the like may be used in this process. With the threshold set to 0.75 as in FIG. 2, the apparatus determines whether the confidence score calculated for a particular video frame is less than 0.75 or, in the negative case, whether it is greater than 0.75. For example, if a frame's confidence score is below the threshold in an event section presumed to be positive, as in (a) of FIG. 2, the frame is assumed to have a high probability of being a false negative. Conversely, if a frame's confidence score is above the threshold in an event section presumed to be negative, as in (b) of FIG. 2, the frame is assumed to have a high probability of being a false positive.
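The windowed threshold check might look roughly like the following Python sketch. The patent does not specify how the moving average is applied, so two assumptions are made here: the frames are split into non-overlapping windows, and the mean score of a window decides whether that window is treated as a positive or a negative event section. The function name `flag_frames` and the label strings are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def flag_frames(scores: List[float], window: int = 10, threshold: float = 0.75) -> List[Tuple[int, str]]:
    """Flag frames whose confidence score disagrees with the polarity of their window."""
    flagged = []
    for start in range(0, len(scores), window):
        section = scores[start:start + window]
        # Assumption: the window mean decides whether the section is presumed positive.
        section_is_positive = mean(section) >= threshold
        for offset, score in enumerate(section):
            if section_is_positive and score < threshold:
                flagged.append((start + offset, "likely_false_negative"))
            elif not section_is_positive and score > threshold:
                flagged.append((start + offset, "likely_false_positive"))
    return flagged
```

With ten scores that are all around 0.9 except a single 0.4 at index 5, the window mean stays above 0.75, so only frame 5 is flagged as a likely false negative.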

The artificial intelligence service apparatus 120 then performs operations for retraining on the video frames in which a false positive or false negative was detected. In other words, extracting and storing the data corresponding to false positives or false negatives and retraining on it improves the detector's performance. To this end, FIG. 2 assumes, for example, that a false negative or false positive occurs in the sixth video frame (f6) of a class 1 (e.g., dog) situation, and the operations for retraining use the video frames at earlier and later times relative to the image of that frame. The earlier and later frames are preferably at time intervals symmetric about the reference frame. For example, assuming a boundary box is set around the object, the intermediate value of the boundary boxes of the earlier video frame (f4) and the later video frame (f8), for example the intermediate value of their coordinate values, may be stored and used. In other words, the coordinate value is matched to the image of f6 and stored so that retraining can be based on it. Through retraining, when an object identical to the erroneously detected object is later misdetected, operations such as correcting its object information or changing the state of its boundary box may be performed.

Object detection in machine learning consists of classification and localization. For example, the region containing an object is found in the input image (e.g., as a boundary box (BBox)), and the object is then judged to be a dog or a cat. Therefore, if a frame is judged to be a cat within an image event section judged to be a dog, the corresponding boundary box cannot be trusted either, so the intermediate value of the boundary boxes before and after it, that is, the coordinate value of the boundary box, may be stored and used. Using the intermediate value of the boundary boxes can be understood as forming a boundary box located midway between the boundary boxes in the two neighboring video frames. In this way, even video frames in which a false positive or false negative occurred are stored in the DB for active learning, that is, the pool, and the detector's performance can be improved by using that data.
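As a rough illustration of the intermediate boundary box, the sketch below takes the per-coordinate arithmetic midpoint of the boxes from the two neighboring frames. The `(x_min, y_min, x_max, y_max)` box layout and the function name `midpoint_box` are assumptions made for this example; the patent only states that an intermediate value of the coordinate values is used.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def midpoint_box(box_before: Box, box_after: Box) -> Box:
    """Return the box lying midway between the boxes of the earlier and later frames."""
    x0, y0, x1, y1 = ((a + b) / 2 for a, b in zip(box_before, box_after))
    return (x0, y0, x1, y1)
```

For example, boxes `(100, 50, 200, 150)` in f4 and `(120, 60, 220, 170)` in f8 give `(110.0, 55.0, 210.0, 160.0)` as the box stored for f6.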

Object detection in machine learning generates a model by learning images (or video frames, objects within those frames, and so on), classes, and boundary boxes. Therefore, by using the image of f6, the class determined in f4 and f8, and the intermediate value of the boundary boxes of f4 and f8, meaningful data (e.g., data close to a false positive or false negative) can be extracted in post-processing using a detector operating in the field, and the supervised-learning-based deep learning model can be improved without the high development costs incurred previously.
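Combining the three ingredients named above (the image of the suspect frame, the class from its neighbors, and the intermediate boundary box) could be sketched as follows. The data structures, the symmetric offset of 2 frames (f4 and f8 around f6), and the rule of skipping the sample when the two neighbors disagree on the class are all assumptions for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


@dataclass
class FrameResult:
    image_id: str  # hypothetical handle to the stored frame image
    label: str     # class reported by the detector for this frame
    box: Box       # boundary box reported by the detector


@dataclass
class TrainingSample:
    image_id: str
    label: str
    box: Box


def build_sample(results: Sequence[FrameResult], suspect: int, offset: int = 2) -> Optional[TrainingSample]:
    """Relabel the suspect frame using the frames `offset` steps before and after it."""
    before, after = suspect - offset, suspect + offset
    if before < 0 or after >= len(results):
        return None  # no symmetric neighbors are available
    prev, nxt = results[before], results[after]
    if prev.label != nxt.label:
        return None  # assumption: skip when the neighbors disagree on the class
    x0, y0, x1, y1 = ((a + b) / 2 for a, b in zip(prev.box, nxt.box))
    # Keep the suspect frame's image, but take class and box from the neighbors.
    return TrainingSample(results[suspect].image_id, prev.label, (x0, y0, x1, y1))
```

The returned sample pairs the f6 image with the neighbors' class and the midway box, which is the form in which it could be appended to the supervised learning pool.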

Although the operation of the artificial intelligence service apparatus 120 has so far been described using the intermediate value of boundary boxes as an example, object detection and the like may be carried out in various ways, such as through pixel analysis, so embodiments are not limited to the above. Above all, an embodiment of the present invention can be regarded as reusing, for learning, the frames that fall below a (preset) threshold within an object detection event section.

FIG. 3 is a block diagram illustrating the detailed structure of the artificial intelligence service apparatus of FIG. 1.

As shown in FIG. 3, the artificial intelligence service apparatus 120 of FIG. 1 according to an embodiment of the present invention includes some or all of a communication interface unit 300, a control unit 310, an active learning post-processing unit 320, and a storage unit 330, and may further include a display unit.

The communication interface unit 300 receives captured video from the photographing apparatus 100 of FIG. 1 and provides it to the control unit 310. In this process, the communication interface unit 300 may perform operations such as modulation/demodulation, encoding/decoding, and multiplexing/demultiplexing; these are obvious to those skilled in the art, so further description is omitted.

The control unit 310 is responsible for overall control of the communication interface unit 300, the active learning post-processing unit 320, and the storage unit 330 of FIG. 3, which constitute the artificial intelligence service apparatus 120 of FIG. 1. Typically, the control unit 310 temporarily stores the captured video provided by the communication interface unit 300 in the storage unit 330, then retrieves it and provides it to the active learning post-processing unit 320.

In addition, at the request of the active learning post-processing unit 320, the control unit 310 may receive the post-processed supervised learning data and store it, systematically classified, in the DB 120a of FIG. 1 or the like. The control unit 310 also receives the image analysis results provided by the active learning post-processing unit 320 and stores them in the DB 120a of FIG. 1, and may store the supervised learning data in the course of doing so.

The active learning post-processing unit 320 may include a video detector and may execute a supervised-learning-based deep learning model. Using the detector, it extracts, in post-processing, meaningful data that is close to a missed detection (false negative) or a false positive, so that the detector is retrained on this post-processed data. This improves the performance of the built-in detector.

More specifically, when video frames of the captured image are received, the active learning post-processing unit 320 calculates a confidence score for each frame. For example, with classes such as dog and cat defined in advance, a confidence score of 0.8 is calculated when an image of a dog is input and 0.2 when an image of a cat is input; the former means that the probability of the object being a dog is 0.8. The details have been sufficiently described above with reference to FIG. 2. In post-processing, the window size (that is, the number of frames) and the threshold are set, and the events of each window are extracted. Here, the number of frames may refer to the number of video frames in a positive or negative section as seen in FIG. 2, which shows 10 video frames.
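The window-based step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `label_event_sections`, the majority-vote rule for deciding whether a window is positive or negative, and the sample scores are all assumptions introduced here, since the text only states that a window size and a threshold are set.

```python
from typing import List, Tuple

Section = Tuple[int, int, str]  # (first frame index, end index (exclusive), label)


def label_event_sections(scores: List[float], window: int, threshold: float) -> List[Section]:
    """Split per-frame confidence scores into windows and label each window."""
    if window <= 0:
        raise ValueError("window must be a positive number of frames")
    sections: List[Section] = []
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        hits = sum(1 for s in chunk if s >= threshold)
        # Assumption: a simple majority of frames at or above the threshold
        # marks the window as a positive event section.
        label = "positive" if 2 * hits > len(chunk) else "negative"
        sections.append((start, start + len(chunk), label))
    return sections


# Hypothetical confidence scores for 10 frames, split into two 5-frame windows.
scores = [0.9, 0.8, 0.85, 0.3, 0.9, 0.1, 0.2, 0.7, 0.1, 0.15]
sections = label_event_sections(scores, window=5, threshold=0.5)
print(sections)
```

With these sample values the first window is labeled positive and the second negative; other labeling rules (for example, a mean score per window) would fit the description equally well.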

If a confidence score lower than the threshold is detected in an event section estimated to be positive, the active learning post-processing unit 320 treats the frame as a video frame with a high probability of a missed detection. Conversely, if a confidence score higher than the threshold is detected in an event section estimated to be negative, it treats the frame as a video frame with a high probability of a false positive. Accordingly, video frames that fall below the threshold in an object detection event section are reused for learning. For this reuse, the active learning post-processing unit 320 may match the image of the object of the designated class that was missed or falsely detected with its position or coordinate values within the video frame, store the pair in, for example, a supervised learning data pool, and use it later.
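The selection rule in the paragraph above can be illustrated with a short sketch. The names `Candidate` and `select_candidates`, the section boundaries, and the sample scores are hypothetical; only the two comparisons (below the threshold in a positive section, above it in a negative section) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    frame_index: int
    score: float
    kind: str  # "missed_detection" or "false_positive"


def select_candidates(scores: List[float], start: int, end: int,
                      label: str, threshold: float) -> List[Candidate]:
    """Pick frames whose score contradicts the label of their event section."""
    picked: List[Candidate] = []
    for i in range(start, end):
        score = scores[i]
        if label == "positive" and score < threshold:
            # Low score inside a positive section: likely missed detection.
            picked.append(Candidate(i, score, "missed_detection"))
        elif label == "negative" and score > threshold:
            # High score inside a negative section: likely false positive.
            picked.append(Candidate(i, score, "false_positive"))
    return picked


# Hypothetical scores: frames 0-4 form a positive section, frames 5-9 a negative one.
scores = [0.9, 0.8, 0.85, 0.3, 0.9, 0.1, 0.2, 0.7, 0.1, 0.15]
missed = select_candidates(scores, 0, 5, "positive", threshold=0.5)
false_alarms = select_candidates(scores, 5, 10, "negative", threshold=0.5)
print(missed, false_alarms)
```

In a fuller pipeline each selected candidate would be stored together with its cropped image and coordinates in the supervised learning data pool; that storage step is left out here because the text does not specify its format.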

In general, object detection in machine learning builds a model by learning the image, the type or class of the object, and the bounding box. Here, the bounding box may mean the position or coordinate values of the object within the video frame. Therefore, to determine the exact position of an object for which a false positive or missed detection is sensed, the video frames surrounding the affected video frame may be used, and the coordinate values of the intermediate position between them may be set as the position of the object. By extracting the false positive or missed detection data in this way, storing it in the data pool, and retraining on it, the performance of the detector can be improved.
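As a hedged illustration of the coordinate estimate described above, the sketch below takes the per-coordinate median of the boxes found in neighboring frames. The box layout `(x_min, y_min, x_max, y_max)`, the function name `estimate_box`, and the `radius` parameter are assumptions made for this example.

```python
from statistics import median
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def estimate_box(boxes: Dict[int, Box], frame_index: int, radius: int = 1) -> Optional[Box]:
    """Estimate the box for frame_index from boxes detected in neighboring frames."""
    neighbors = [
        boxes[i]
        for i in range(frame_index - radius, frame_index + radius + 1)
        if i != frame_index and i in boxes
    ]
    if not neighbors:
        return None  # no neighboring detection to interpolate from
    # Median of each coordinate across the neighboring frames.
    x_min, y_min, x_max, y_max = (median(values) for values in zip(*neighbors))
    return (x_min, y_min, x_max, y_max)


# Hypothetical detections: the object was found in frames 2 and 4 but missed in frame 3.
boxes = {2: (10, 20, 50, 60), 4: (14, 24, 54, 64)}
estimated = estimate_box(boxes, frame_index=3)
print(estimated)
```

With one frame on each side, the median reduces to the midpoint of the two boxes; a larger `radius` makes the estimate less sensitive to a single outlier detection.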

The storage unit 330 may store various data processed under the control of the control unit 310; for example, it may temporarily store a captured image and then output it. The storage unit 330 may also temporarily store the supervised learning data analyzed and provided by the active learning post-processing unit 320 and then output it so that it is provided to the DB 120a of FIG. 1.

Beyond the above, the details of the communication interface unit 300, the control unit 310, the active learning post-processing unit 320, and the storage unit 330 of FIG. 3 have been sufficiently described earlier, and that description applies here.

Meanwhile, as another embodiment of the present invention, the control unit 310 may include a CPU and a memory and may be formed as a single chip. The CPU includes a control circuit, an arithmetic logic unit (ALU), an instruction interpreter, registers, and the like, and the memory may include RAM. The control circuit performs control operations; the ALU performs operations on binary bit information; the instruction interpreter, which may include an interpreter or a compiler, converts a high-level language into machine language and machine language into a high-level language; and the registers take part in data storage. With this configuration, for example, the program stored in the active learning post-processing unit 320 can be copied and loaded into the memory, that is, the RAM, when the artificial intelligence service device 120 starts operating, and then executed from there, which speeds up data processing.

FIG. 4 is a flowchart illustrating the driving process of the artificial intelligence service device of FIG. 1.

Referring to FIG. 4 together with FIG. 1 for convenience of description, the artificial intelligence service device 120 according to an embodiment of the present invention sequentially receives video frames of a captured image (S400). Here, sequential reception means that, assuming for example that 60 video frames are received per second, one video frame is received every 1/60 second; in other words, the video frames are received continuously as time passes.

For example, in FIG. 2, if f1 is the first video frame received, f10 can be understood as the video frame received 10/60 second later. Of course, the rate at which video frames are received is determined by the operating frequency of the artificial intelligence service device 120, so it is not particularly limited to any one value.

In addition, the artificial intelligence service device 120 sets an event section of an object detection error based on the confidence score calculated for the designated object in each received video frame, and causes the erroneously detected object of any video frame whose confidence score deviates from the threshold within the set event section to be reused for learning (S410).

When a detection error occurs for the same object (for example, a cat or a dog) extracted from each received video frame, the object in the video frame with the detection error is reused for learning based on the video frames surrounding that frame. Here, a detection error may refer to a frame whose confidence score is lower than the threshold in a positive event section, or to a video frame whose confidence score is higher than the threshold in a negative event section.

Accordingly, the performance of the detector can be improved by also training on the video frames in which a false positive or a missed detection was sensed.

Beyond the above, the artificial intelligence service device 120 of FIG. 1 may perform various other operations. These have been sufficiently described earlier, and that description applies here.

Meanwhile, although all the components constituting an embodiment of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to such an embodiment. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined to operate. In addition, although each of the components may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be easily inferred by those skilled in the art. Such a computer program is stored in non-transitory computer-readable media and is read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium means a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only for a short moment, such as a register, a cache, or a memory. Specifically, the programs described above may be stored and provided on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB device, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or prospect of the present invention.

100: photographing device 110: communication network
120: artificial intelligence service device 300: communication interface unit
310: control unit 320: active learning post-processing unit
330: storage unit

Claims

1. An artificial intelligence service device comprising:
a communication interface unit that sequentially receives video frames of a captured image; and
a control unit that, based on a per-frame confidence score calculated for a designated object in each of the video frames sequentially received over time, sets an event section indicating a temporal section of an object detection error in which an object is falsely detected (false positive) or missed, and reuses for learning the falsely detected or missed object of a video frame whose confidence score deviates from a threshold within the set event section,
wherein the control unit determines whether there is an object detection error by comparing each sequentially received video frame with its neighboring video frames in order to set the event section.

2. The artificial intelligence service device of claim 1,
wherein the control unit determines the number of video frames to be analyzed in order to set the event section.

3. (Deleted)

4. The artificial intelligence service device of claim 1,
wherein the control unit stores an image of the erroneously detected object as supervised learning data and then reuses it for learning.

5. The artificial intelligence service device of claim 4,
wherein the control unit further stores a coordinate value of the erroneously detected object within the video frame, matched to the image, and the coordinate value is calculated based on coordinate values of the designated object corresponding to the erroneously detected object in surrounding video frames.

6. The artificial intelligence service device of claim 5,
wherein the control unit extracts coordinate values of the designated object from the video frames preceding and following the video frame of the erroneously detected object, and calculates the coordinate value as a median of the extracted coordinate values.

7. The artificial intelligence service device of claim 5,
wherein, when the erroneously detected object is reused for learning, the control unit corrects object information for the same object as the erroneously detected object or changes the state of a bounding box around the object.

8. A method of driving an artificial intelligence service device, the method comprising:
sequentially receiving, by a communication interface unit, video frames of a captured image; and
setting, by a control unit, based on a per-frame confidence score calculated for a designated object in each of the video frames sequentially received over time, an event section indicating a temporal section of an object detection error in which an object is falsely detected or missed, and reusing for learning the falsely detected or missed object of a video frame whose confidence score deviates from a threshold within the set event section,
the method further comprising determining, by the control unit, whether there is an object detection error by comparing each sequentially received video frame with its neighboring video frames in order to set the event section.