KR102274913B1

KR102274913B1 - Apparatus for Bounding Box Redundancy Removal and Driving Method Thereof

Info

Publication number: KR102274913B1
Application number: KR1020210040882A
Authority: KR
Inventors: 곽노준; 유재영; 나종근
Original assignee: 서울대학교 산학협력단; 주식회사 스누아이랩
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2021-07-08

Abstract

The present invention relates to a bounding box redundancy removal apparatus and a driving method thereof. A bounding box redundancy removal apparatus according to an embodiment of the present invention may include a data receiving part for receiving object data for objects detected from the video frame of a photographed image; and a bounding box redundancy removal part for setting a bounding box around the detected objects using the received object data, and performing redundancy removal of the bounding box set in the same object based on a redundancy range automatically set based on an artificial neural network. The bounding box redundancy removal apparatus automatically sets an appropriate redundancy range for each bounding box based on an artificial neural network without the need to set a redundancy threshold directly in advance, and is more suitable for parallel processing.

Description

Apparatus for Bounding Box Redundancy Removal and Driving Method Thereof

The present invention relates to a bounding box deduplication apparatus and a method of driving the apparatus. More particularly, it relates to a bounding box deduplication apparatus, and a driving method thereof, in which an appropriate deduplication range is set automatically for each bounding box by an artificial neural network, without a redundancy threshold having to be set by hand in advance, and which provide a deduplication method better suited to parallel processing.

In general, an algorithm called Greedy Non-Maximum Suppression (Greedy NMS) is used to remove duplicate bounding boxes during object detection. Greedy NMS takes the bounding box with the highest objectness score as a reference and removes every bounding box that overlaps it by at least a preset overlap threshold (an Intersection over Union, or IoU, threshold).
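To make the conventional procedure concrete, the following is a minimal Python sketch of Greedy NMS as described above. The function names `iou` and `greedy_nms`, the corner-coordinate box format (x1, y1, x2, y2), and the default threshold of 0.5 are illustrative assumptions and do not come from this document.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Remove every remaining box that overlaps the reference box
        # by at least the single, preset threshold.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

In this sketch each pass of the loop depends on which boxes survived the previous pass, and the same `iou_threshold` is applied to every box.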

However, Greedy NMS requires an appropriate overlap threshold to be found and set in advance in order to obtain accurate results, which is cumbersome. In addition, because deduplication is performed sequentially, one bounding box after another, it is poorly suited to parallel processing.

Furthermore, conventional Greedy NMS applies a single IoU threshold uniformly to all boxes, regardless of how much deduplication each box actually requires, which lowers the accuracy of bounding box deduplication. As a result, a bounding box that should not be removed may be removed.

Korean Registered Patent Publication No. 10-1469099 (2014.11.28)
Korean Registered Patent Publication No. 10-1982942 (2019.05.21)
Korean Registered Patent Publication No. 10-2217003 (2021.02.10)

Website https://cool24151.tistory.com/36
Website https://blog.naver.com/dr_moms/221649538493

An object of embodiments of the present invention is to provide a bounding box deduplication apparatus, and a method of driving the apparatus, in which an appropriate deduplication range is set automatically for each bounding box by an artificial neural network, without a redundancy threshold having to be set by hand in advance, and which provide a deduplication method better suited to parallel processing.

A bounding box deduplication apparatus according to an embodiment of the present invention includes a data receiving unit that receives object data for objects detected from a video frame of a captured image, and a bounding box deduplication unit that sets bounding boxes around the detected objects using the received object data and deduplicates the bounding boxes set on the same object based on a deduplication range set automatically by an artificial neural network.

The bounding box deduplication unit may automatically set a separate deduplication range for each of the different objects.

The bounding box deduplication unit may deduplicate the bounding boxes of the different objects in parallel.

The bounding box deduplication unit may calculate a maximum score using the confidence scores of the bounding boxes included in the object data, learn from the calculated maximum score, and automatically set the artificial-neural-network-based deduplication range based on the learning result.

The bounding box deduplication unit may generate deduplication mask data in binary-bit form for each IoU, based on the bounding boxes, the confidence scores of the bounding boxes, and a plurality of deduplication degrees (IoUs), and may calculate the maximum score as a weighted summation of the generated mask data, using weights that reflect the features of the object.

The bounding box deduplication unit may obtain a matrix sized according to the number of bounding boxes, binarize the bounding boxes into adjacent and non-adjacent boxes according to an IoU threshold, and generate the deduplication mask data by outputting 1 for the box with the highest score among the adjacent boxes and 0 otherwise.
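One possible reading of this mask generation step can be sketched in Python as follows. The sketch assumes that the matrix is an N×N pairwise IoU matrix over the N boxes, that every box counts as adjacent to itself, and that a tie in score still yields 1. The names `iou` and `dedup_mask` and the corner-coordinate box format are hypothetical; the document does not specify these details.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def dedup_mask(boxes: List[Box], scores: List[float],
               iou_threshold: float) -> List[int]:
    """Binary mask: 1 if a box has the top score among its adjacent boxes."""
    n = len(boxes)
    # N x N matrix, binarized into adjacent (True) / non-adjacent (False).
    adjacent = [[i == j or iou(boxes[i], boxes[j]) >= iou_threshold
                 for j in range(n)] for i in range(n)]
    mask: List[int] = []
    for i in range(n):
        # Row i only reads shared inputs, so rows could be evaluated
        # independently of one another.
        best = max(scores[j] for j in range(n) if adjacent[i][j])
        mask.append(1 if scores[i] >= best else 0)
    return mask
```

For three boxes where the first two overlap with an IoU of about 0.68 and scores of 0.9, 0.8, and 0.7, a threshold of 0.5 gives the mask `[1, 0, 1]`, while a threshold of 0.7 gives `[1, 1, 1]`. Unlike the sequential loop of Greedy NMS, no row of this computation depends on the result of another row.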

In addition, a method of driving a bounding box deduplication apparatus according to an embodiment of the present invention includes: receiving, by a data receiving unit, object data for objects detected from a video frame of a captured image; and setting, by a bounding box deduplication unit, bounding boxes around the detected objects using the received object data, and deduplicating the bounding boxes set on the same object based on a deduplication range set automatically by an artificial neural network.

The performing of the deduplication may include automatically setting a separate deduplication range for each of the different types of objects.

The performing of the deduplication may include deduplicating the bounding boxes of the different types of objects in parallel.

The performing of the deduplication may include calculating a maximum score using the confidence scores of the bounding boxes included in the object data, and learning from the calculated maximum score and automatically setting the artificial-neural-network-based deduplication range based on the learning result.

The performing of the deduplication may include generating deduplication mask data in binary-bit form for each IoU, based on the bounding boxes, the confidence scores of the bounding boxes, and a plurality of deduplication degrees (IoUs), and calculating the maximum score as a weighted summation of the generated mask data, using weights that reflect the features of the object.
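The weighted summation over per-IoU masks might look like the following Python sketch. It assumes one binary mask per IoU threshold and one weight per box and threshold; in the described apparatus the weights would reflect object features and come from the artificial neural network, whereas here they are plain inputs. The names `weighted_mask_sum` and `select_boxes` and the keep threshold of 0.5 are hypothetical.

```python
from typing import List


def weighted_mask_sum(masks: List[List[int]],
                      weights: List[List[float]]) -> List[float]:
    """Combine K binary masks into one score per box.

    masks[k][i]   -- bit for box i under the k-th IoU threshold
    weights[i][k] -- weight of the k-th threshold for box i
                     (assumed here to be supplied by a trained network)
    """
    k = len(masks)
    n = len(weights)
    if any(len(m) != n for m in masks) or any(len(w) != k for w in weights):
        raise ValueError("masks must be K x N and weights must be N x K")
    # Each box's score depends only on its own bits and weights.
    return [sum(weights[i][t] * masks[t][i] for t in range(k))
            for i in range(n)]


def select_boxes(scores: List[float], keep_threshold: float = 0.5) -> List[int]:
    """Indices of boxes whose combined score reaches the assumed threshold."""
    return [i for i, s in enumerate(scores) if s >= keep_threshold]
```

Because each box carries its own weights, a box can effectively favor a strict or a lenient IoU threshold, which is one way to realize a per-box deduplication range instead of a single global threshold.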

The generating of the deduplication mask for each IoU may include obtaining a matrix sized according to the number of bounding boxes and binarizing the bounding boxes into adjacent and non-adjacent boxes according to an IoU threshold, and generating the deduplication mask data by outputting 1 for the box with the highest score among the adjacent boxes and 0 otherwise.

According to an embodiment of the present invention, an appropriate deduplication range can be set automatically for each bounding box, for example through an artificial neural network module for deduplication.

In addition, because the deduplication range is set automatically, an embodiment of the present invention can provide higher detection performance under common methods of evaluating detection results.

Furthermore, an embodiment of the present invention can process the deduplication of all bounding boxes in parallel.

FIG. 1 is a diagram illustrating an image processing system according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating a detailed structure of the image processing apparatus of FIG. 1;
FIG. 3 is a block diagram illustrating another detailed structure of the image processing apparatus of FIG. 1;
FIG. 4 is a diagram for explaining the detailed structure and operation of the LMM of FIG. 3;
FIG. 5 is a diagram for explaining the operating principle of the mask generator of FIG. 4;
FIG. 6 is a block diagram illustrating yet another detailed structure of the image processing apparatus of FIG. 1; and
FIG. 7 is a flowchart illustrating a driving process of a bounding box deduplication apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating an image processing system according to an embodiment of the present invention.

As shown in FIG. 1, an image processing system 90 according to an embodiment of the present invention includes some or all of a photographing apparatus 100, a communication network 110, an image processing apparatus 120, and a third-party device 130.

Here, "includes some or all of" means, for example, that the image processing system 90 may be configured with some components, such as the third-party device 130, omitted, or that some or all of the components of the image processing apparatus 120 may be integrated into a network device (e.g., a wireless switching device) of the communication network 110. To aid a full understanding of the invention, the description below assumes that all components are included.

The photographing apparatus 100 is, for example, a surveillance camera such as a general CCTV (Closed Circuit Television) camera or an IP (Internet Protocol) camera. It may be a fixed camera or a PTZ (Pan-Tilt-Zoom) camera capable of pan, tilt, and zoom operations. The photographing apparatus 100 may be installed in various places to provide captured images for social safety, crime prevention, social issues (suicide issue), and public surveillance. For example, it may be installed in public places such as subways or bus stops to monitor incidents, accidents, and other situations, and it may also monitor acts of violence and the like that occur at bridge railings or in remote places. Monitoring may further be carried out through CCTV installed in daycare centers and similar facilities.

The communication network 110 includes both wired and wireless communication networks. For example, a wired or wireless Internet network may be used as, or linked with, the communication network 110. Here, the wired network includes Internet networks such as a cable network or the public switched telephone network (PSTN), and the wireless communication network includes CDMA, WCDMA, GSM, Evolved Packet Core (EPC), Long Term Evolution (LTE), and WiBro networks. The communication network 110 according to an embodiment of the present invention is of course not limited to these and may also be, for example, a cloud computing network in a cloud computing environment or a 5G network. When the communication network 110 is a wired network, an access point in the communication network 110 may connect to a telephone company's switching office or the like. When it is a wireless network, the access point may process data by connecting to a Serving GPRS Support Node (SGSN) or Gateway GPRS Support Node (GGSN) operated by the carrier, or by connecting to various base stations and relays such as a Base Transceiver Station (BTS), NodeB, or e-NodeB.

The communication network 110 may include an access point (AP). The access point here includes small base stations, such as femto or pico base stations, that are commonly installed inside buildings. Within the classification of small base stations, femto and pico base stations are distinguished by the maximum number of devices, such as the photographing apparatus 100, that can connect to them. The access point may of course include a short-range communication module for short-range communication, such as Zigbee or Wi-Fi, with the photographing apparatus 100 and similar devices. The access point may use TCP/IP or the Real-Time Streaming Protocol (RTSP) for wireless communication. Besides Wi-Fi, short-range communication may be performed under various standards, including Bluetooth, Zigbee, infrared, radio frequency (RF) such as ultra high frequency (UHF) and very high frequency (VHF), and ultra-wideband (UWB) communication. The access point may accordingly extract the location of a data packet, designate the best communication path for the extracted location, and forward the data packet along the designated path to the next device, for example the image processing apparatus 120. The access point may share several lines in a general network environment and includes, for example, routers, repeaters, and relays.

According to an embodiment of the present invention, the image processing apparatus 120 may be called an (artificial-neural-network-based) bounding box deduplication apparatus, or may include such an apparatus. The image processing apparatus 120 may serve as an image analysis device that analyzes the images captured by the photographing apparatus 100, as a control device that enables monitoring using the analysis results, and so on. The image processing apparatus 120 may operate, for example, as a server, and may perform operations such as detecting objects with an artificial neural network, for example a DCNN (Deep Convolutional Neural Network), matching bounding boxes to the detected objects, and removing duplicate bounding boxes on objects of the same type. For example, the image processing apparatus 120 uses an object detection DCNN to detect objects of pre-trained classes in a video frame of the captured image. The object detection result (or object data) includes the bounding box coordinates and the object type (or object class) of each detected object. For real-time processing, the computation of the object detection DCNN is typically performed not on a CPU but on a GPU, DSP, VPU, NPU, or other processor capable of large-scale parallel computation. Many object detection DCNN models have been proposed to date; representative examples include Fast/Faster R-CNN, SSD, and the YOLO series.

In addition, the image processing apparatus 120 takes as input the acquired (or generated) object detection result (e.g., bounding box information of the detected objects) and the object tracking result of the previous frame (e.g., bounding box information of the objects already being tracked), and obtains the object tracking result for the current frame based on bounding box matching. It may also take as input the object tracking result up to the previous frame and the current video frame data, and obtain the object tracking result for the current frame based on, for example, template matching. The image processing apparatus 120 may of course match bounding boxes to the objects extracted from the video frame and deduplicate the bounding boxes for object tracking.
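The document does not state how bounding box matching is carried out. As one illustrative possibility, the Python sketch below greedily pairs each previously tracked box with the current detection that overlaps it most. The IoU criterion, the greedy strategy, the minimum overlap of 0.3, and the names `iou` and `match_tracks` are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_tracks(prev_tracks: Dict[int, Box], detections: List[Box],
                 min_iou: float = 0.3) -> Dict[int, int]:
    """Map track id -> index of the matched detection in the current frame."""
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        ((iou(tbox, dbox), tid, di)
         for tid, tbox in prev_tracks.items()
         for di, dbox in enumerate(detections)),
        reverse=True,
    )
    matched: Dict[int, int] = {}
    used = set()
    for overlap, tid, di in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if tid in matched or di in used:
            continue  # each track and each detection is used at most once
        matched[tid] = di
        used.add(di)
    return matched
```

Detections left unmatched would start new tracks, and tracks left unmatched could be handed to a fallback such as the template matching mentioned above; neither step is shown here.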

More specifically, the image processing apparatus 120 according to an embodiment of the present invention may be equipped with an artificial neural network module for bounding box deduplication. This removes the burden of setting a redundancy threshold by hand before deduplication and allows an appropriate deduplication range to be set automatically for each bounding box, thereby providing a deduplication method better suited to parallel processing. In other words, because the deduplication range can be set automatically through the operation of artificial intelligence, deduplication of the bounding boxes for an object A and deduplication of the bounding boxes for an object B can, for example, be carried out at the same time, so that bounding box deduplication can be processed in parallel. In addition, because the redundancy threshold is set automatically through learning, such as deep learning, the accuracy of the degree of deduplication can be improved. That is, a conventional IoU threshold is applied uniformly to all boxes regardless of how much deduplication each box requires, so boxes that should not be removed may be removed; this problem can be mitigated. Bounding box deduplication is discussed in more detail later.

The third-party device 130 may include a server operated by a public office such as a police station, or a server of a company that provides other content video. For example, a control device operated by a local government may be the third-party device 130. Such a control device is of course preferably the image processing apparatus 120 according to an embodiment of the present invention, but embodiments of the present invention are not specifically limited to either form. Because the third-party device 130 can be used for various purposes, it may be understood as a server or computing device of a company that provides content, that is, video.

FIG. 2 is a block diagram illustrating a detailed structure of the image processing apparatus of FIG. 1.

As shown in FIG. 2, the image processing apparatus 120 of FIG. 1 according to an embodiment of the present invention includes some or all of a communication interface unit 200, a control unit 210, a bounding box processing unit (or bounding box deduplication apparatus) 220, and a storage unit 230.

Here, "includes some or all of" means, for example, that the image processing apparatus 120 may be configured with some components, such as the storage unit 230, omitted, or that some components, such as the bounding box processing unit 220, may be integrated into another component such as the control unit 210. To aid a full understanding of the invention, the description below assumes that all components are included.

The communication interface unit 200 communicates with the photographing apparatus 100 and the third-party device 130 via the communication network 110 of FIG. 1. In the course of communication, the communication interface unit 200 may perform operations such as modulation/demodulation, encoding/decoding, muxing/demuxing, and scaling to convert resolution. These operations are well known to those skilled in the art, so further description is omitted.

The communication interface unit 200 may, for example, forward a captured image received from the photographing apparatus 100 to the control unit 210, and, at the request of the control unit 210, may provide the analysis result of the captured image when the third-party device 130, acting as an administrator device, requests it.

The control unit 210 is responsible for the overall control of the communication interface unit 200, the bounding box processing unit 220, and the storage unit 230 of FIG. 2, which constitute the image processing apparatus 120 of FIG. 1. The control unit 210 may include a CPU, an MPU, a GPU, and the like. The control unit 210 temporarily stores the captured image provided by the communication interface unit 200 in the storage unit 230, then retrieves it and provides it to the bounding box processing unit 220. The control unit 210 may also systematically store the image analysis results, processed by the bounding box processing unit 220 into a specified format, in the DB 120a of FIG. 1.

Above all, according to an embodiment of the present invention, the control unit 210 sets (or matches) bounding boxes for the objects extracted from a video frame of the captured image. Before doing so, it may of course perform an operation for extracting the objects based on an artificial neural network, and it may set a bounding box for each extracted object that represents a different target. For example, because video is composed of 60 still images per second, bounding boxes for the same object may be duplicated, and such duplicate bounding boxes are preferably removed, for example for efficient monitoring. To this end, the control unit 210 may operate in conjunction with the bounding box processing unit 220.

The bounding box processing unit 220 may be called a bounding box deduplication apparatus according to an embodiment of the present invention, and the image processing apparatus 120 of FIG. 1 as a whole may also serve as the bounding box deduplication apparatus. The scope of the bounding box deduplication apparatus is therefore not specifically limited to any one form. For example, the bounding box processing unit 220 may be involved only in processing bounding boxes, while image analysis, object detection, and the like are handled by the control unit 210 of FIG. 2. Object detection, and DCNN-based detection in particular, is already well known, so the description of this embodiment focuses on how the bounding box processing unit 220 differs from conventional bounding box processing.

If the degree of deduplication is not set accurately when removing bounding boxes, useful bounding boxes may be removed. That is, a bounding box that should not be removed may be removed. Accordingly, to prevent such errors or to correct errors that have already occurred, the bounding box processing unit 220 according to an embodiment of the present invention allows the degree of deduplication to be set freely by an artificial neural network, that is, by artificial intelligence through learning such as deep learning. In this process, the degree of deduplication may be set automatically, or a preset degree of deduplication may be changed automatically.

In addition, the bounding box processing unit 220 can deduplicate bounding boxes for different targets or types of objects at the same time, so deduplication can readily be processed in parallel. That is, bounding box deduplication for an object A and bounding box deduplication for an object B can be performed simultaneously. As a result, image processing becomes correspondingly faster.

The storage unit 230 may store and output various types of data processed under the control of the control unit 210. For example, when the bounding box processing unit 220 uses a lookup table (LUT), the storage unit 230 may store the LUT data and output it as needed.

In addition to the above, the communication interface unit 200, the control unit 210, the bounding box processing unit 220, and the storage unit 230 of FIG. 2 according to an embodiment of the present invention may perform various other operations. Since the details have been sufficiently described above, they are not repeated here.

Meanwhile, as another embodiment of the present invention, the control unit 210 may include a CPU and a memory, and may be formed as a single chip. The CPU includes a control circuit, an arithmetic logic unit (ALU), an instruction interpreter, and a registry, and the memory may include RAM. The control circuit performs control operations, and the ALU performs operations on binary bit information. The instruction interpreter, which may include an interpreter or a compiler, converts high-level language into machine language and machine language into high-level language, and the registry may be involved in software-based data storage. With this configuration, for example, a program stored in the bounding box processing unit 220 is copied into the memory, that is, the RAM, at the start of operation of the image processing apparatus 120 and then executed, which can rapidly increase data processing speed.

FIG. 3 is a block diagram illustrating another detailed structure of the image processing apparatus of FIG. 1, FIG. 4 is a diagram for explaining the detailed structure and operation of the LMM of FIG. 3, FIG. 5 is a diagram for explaining the operating principle of the mask generator of FIG. 4, and FIG. 6 is a block diagram illustrating yet another detailed structure of the image processing apparatus of FIG. 1.

As shown in FIG. 3, the image processing apparatus 120 of FIG. 1 or the bounding box processing unit 220 of FIG. 2 according to an embodiment of the present invention may include an LLM (Local Maximum Module) unit 300. In FIG. 3, the network unit 290 may be understood as the components other than the LLM unit 300. For example, the network unit 290 may be the control unit 210 of FIG. 2, an object detector implemented as a software module of the control unit 210, or an object detector implemented as a software module within the bounding box processing unit 220. Accordingly, in the embodiment of the present invention, the network unit 290 is not limited to any specific element.

The LLM unit 300 receives the boxes, scores, and features output from the network unit 290 and outputs a local maximum score (hereinafter, LM score). Boxes to be removed as duplicates receive a low LM score.

The LLM unit 300 operates in two stages: generating masks, and computing the LM score from the generated masks. To this end, the LLM unit 300 may include a mask generator 400 as shown in FIG. 4. The mask generator 400 may of course be implemented as a software module, a hardware module, or a combination thereof. First, the mask generator 400 receives the IoU values, boxes, and scores and generates a deduplication mask, that is, mask data, for each IoU; in other words, it generates a mask corresponding to each IoU. Here, IoU denotes the degree of deduplication, or a threshold on that degree. Mask data generation may refer, for example, to a computation based on the principle of masking a matrix. In the next stage, the LLM unit 300 computes the mask weights from the input features and takes the weighted sum (weighted summation) of the masks to obtain the LM score. An artificial neural network typically performs operations consisting of weighted sums and non-linear functions, and the lower part of FIG. 4 shows the weight computation process.

For example, when there are four boxes (e.g., for Mask 0.5, which is based on IoU 0.5), the LLM unit 300 obtains the IoU matrix as shown in FIG. 5 and then binarizes it according to the IoU threshold, setting adjacent boxes to 1 and non-adjacent boxes to 0. If a given box has the highest score among its adjacent boxes, that is, if it is a local maximum (LM), 1 is output for that box; otherwise 0 is output. Thus, the mask information (or data) for IoU 0.5 is "1010". By the same principle, the mask information for IoU 0.9 is "1110".
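The mask generation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the box coordinates and confidence scores are hypothetical values (not those of FIG. 5), chosen so that the resulting masks match the "1010" and "1110" given in the text. The function names `box_iou` and `local_max_mask`, the (x1, y1, x2, y2) box format, and the use of `>=` for the threshold comparison are likewise assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def local_max_mask(boxes, scores, iou_threshold):
    """Return one bit per box: 1 if the box has the highest score among
    the boxes adjacent to it at this IoU threshold, otherwise 0."""
    n = len(boxes)
    iou = [[box_iou(boxes[i], boxes[j]) for j in range(n)] for i in range(n)]
    mask = []
    for i in range(n):
        # Binarize row i of the IoU matrix: a box is adjacent to box i when
        # their IoU reaches the threshold (a box is always adjacent to itself).
        adjacent = [j for j in range(n) if j == i or iou[i][j] >= iou_threshold]
        best = max(scores[j] for j in adjacent)
        mask.append(1 if scores[i] >= best else 0)
    return mask


# Hypothetical detections: boxes 0 and 1 overlap with IoU 0.6,
# boxes 2 and 3 overlap with IoU 0.95, and the two pairs are disjoint.
boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (0.0, 0.0, 10.0, 6.0),
    (20.0, 20.0, 30.0, 30.0),
    (20.0, 20.0, 30.0, 29.5),
]
scores = [0.9, 0.8, 0.7, 0.6]

mask_05 = local_max_mask(boxes, scores, 0.5)
mask_09 = local_max_mask(boxes, scores, 0.9)
```

With the stricter threshold of 0.9, box 1 is no longer adjacent to box 0 and becomes a local maximum of its own, which is why the second bit changes between the two masks. Each bit depends only on one row of the IoU matrix, so the per-box checks are independent of one another.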

FIG. 6 shows an LMM with a loss function and a mixture of Gaussians (MoG). Density estimation with the MoG is used to learn the LM score. The number of components of this MoG equals the number of boxes. Accordingly, the MoG parameters, namely the mixing coefficients and the standard deviations, are additionally required from the features and are used. The mixing coefficient is the weight of each Gaussian; the mixing coefficients sum to 1 and are obtained through softmax. Softmax is a function that, for example in a multi-class classification problem, outputs the probability that each class is the correct answer. Because the standard deviation must be greater than 0, it is obtained through the softplus function. The softplus function is a variant of ReLU that smooths the point at which ReLU becomes 0. The loss function can be expressed as <Equation 1>.
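The two constraints on the MoG parameters can be illustrated with a short Python sketch. The raw values below are hypothetical stand-ins for whatever the network outputs from the features, and the variable names are assumptions; only the two functions themselves (softmax for the mixing coefficients, softplus for the standard deviations) come from the text.

```python
import math


def softmax(logits):
    """Map raw values to positive weights that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x):
    """Smooth variant of ReLU, log(1 + exp(x)); strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical raw network outputs for a mixture with four components.
raw_mixing = [2.0, 0.5, -1.0, 0.0]
raw_sigma = [-3.0, 0.0, 1.5, 4.0]

mixing = softmax(raw_mixing)              # mixing coefficients, sum to 1
sigma = [softplus(x) for x in raw_sigma]  # standard deviations, all > 0
```

For large positive inputs softplus is close to the identity, and for large negative inputs it approaches 0 without reaching it, which is what keeps every standard deviation valid.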

(where F: the probability density function of a multivariate Gaussian distribution; πk: the mixing coefficient of the k-th component among K components; gk: the local maximum score; αk: α controls the balance between the two loss terms → 1/K × 0.2 to 0.5)

The first term of the loss function applies the LM score to the negative log-likelihood (NLL) of the MoG and is trained to represent the distribution of the boxes. The second term is an L1 penalty on the LM score. This encourages only the necessary Gaussians to be used, and the LM score is learned so that boxes that are not local maxima are removed first by the LLM (Local Maximum Module).

In conclusion, the LLM (local max module) of FIGS. 3 and 4 according to an embodiment of the present invention uses the mask generator 400 to generate a deduplication mask for each IoU based on the boxes, scores, and IoU values, computes the mask weights from the features, and takes the weighted sum to compute the LM score. Furthermore, by learning the computed LM score so that non-LM boxes are removed first, the accuracy of bounding box deduplication can be increased, an appropriate deduplication range can be set automatically for each bounding box, and the deduplication of all bounding boxes can be processed in parallel.
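The weighted summation step can be sketched as follows, reusing the two masks "1010" (IoU 0.5) and "1110" (IoU 0.9) from the four-box example. The weights here are hypothetical: in the described apparatus they are computed from the features by the neural network, and this section does not detail how. Treating the weights as one value per box and per IoU threshold, and the name `lm_scores`, are assumptions made for illustration.

```python
def lm_scores(masks, weights):
    """Weighted sum of the per-threshold masks.

    masks[k][i]   : mask bit (0 or 1) of box i at the k-th IoU threshold
    weights[i][k] : weight that box i assigns to the k-th threshold
    """
    num_thresholds = len(masks)
    return [
        sum(weights[i][k] * masks[k][i] for k in range(num_thresholds))
        for i in range(len(weights))
    ]


# Masks for IoU 0.5 and IoU 0.9 from the four-box example in the text.
masks = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]

# Hypothetical per-box weights over the two thresholds (not learned values).
weights = [
    [0.5, 0.5],
    [0.8, 0.2],
    [0.5, 0.5],
    [0.3, 0.7],
]

scores = lm_scores(masks, weights)
```

Box 1 is a local maximum only at IoU 0.9, so its LM score equals the weight it places on that mask, while box 3 is never a local maximum and scores 0. A box whose weights favor the lower threshold is in effect deduplicated over a wider range, which is one way to read the per-box deduplication range being set automatically.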

Meanwhile, the LLM units 300 and 300' shown in FIGS. 3 and 6 may be, for example, software modules that include a data receiving unit, which receives object data related to detected objects from the network unit 290, and a bounding box deduplication unit, which uses the object data provided by the data receiving unit to deduplicate the bounding boxes set on the same object based on a deduplication range that is automatically set based on an artificial neural network. For example, since the LLM unit 300 is a software module, the data receiving unit may be an interface unit serving as the data path, and the bounding box deduplication unit may include a manager or the like. However, the embodiment of the present invention is not particularly limited to this configuration. In other words, when the control unit 210 of FIG. 2 stores the program for bounding box deduplication according to an embodiment of the present invention in a memory integrated on a single chip with the CPU and then executes it, the entity performing the operation may be the control unit 210 or the CPU serving as the processor.

FIG. 7 is a flowchart illustrating the driving process of a bounding box deduplication apparatus according to an embodiment of the present invention.

For convenience of explanation, referring to FIG. 7 together with FIGS. 3 and 6, the LLM unit 300, 300', acting as the bounding box deduplication apparatus according to an embodiment of the present invention, receives object data for objects detected from a video frame of a captured image (S700). The object data includes information about the objects in the video frame and may include bounding boxes, confidence scores of the bounding boxes, and degrees of deduplication (IoU).

In addition, the LLM unit 300, 300' uses the received object data to set bounding boxes around the (previously) detected objects and deduplicates the bounding boxes set on the same object based on a deduplication range that is automatically set based on an artificial neural network, for example through artificial intelligence learning (S710).

For example, when video is displayed at 60 frames per second, the same object appears repeatedly across multiple frames unless it disappears from the screen, so in image processing that sets bounding boxes, bounding box deduplication may be indispensable. When objects are close together, deduplication may remove bounding boxes unnecessarily, and manually setting the degree of deduplication, as in conventional approaches, likewise causes unnecessary removals and lowers the accuracy of bounding box deduplication. Therefore, in the embodiment of the present invention, the deduplication range is set automatically and accurately based on an artificial neural network such as deep learning, or is changed periodically according to the image processing state, which can increase deduplication accuracy. Furthermore, because deduplication for different objects can be processed in parallel, processing speed can also be increased.

In addition to the above, the bounding box deduplication apparatus of FIG. 7 can perform various other operations. Since the details have been sufficiently described above, they are not repeated here.

Meanwhile, although all components constituting an embodiment of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to such an embodiment. That is, within the scope of the object of the present invention, all of the components may operate by being selectively combined into one or more. In addition, although each of the components may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be readily inferred by those skilled in the art. Such a computer program may be stored in non-transitory computer-readable media and read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium means a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short moment, such as a register, cache, or memory. Specifically, the above-described programs may be provided stored on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB device, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: photographing device 110: communication network
120: image processing device 130: third-party device
200: communication interface unit 210: control unit
220: bounding box processing unit 230: storage unit
290, 290': network unit 300, 300': LLM unit
400: mask generator

Claims

A bounding box deduplication apparatus comprising:
a data receiving unit configured to receive object data for objects detected from a video frame of a captured image; and
a bounding box deduplication unit configured to set bounding boxes around the detected objects using the received object data, and to deduplicate the bounding boxes set on the same object based on a deduplication range that is automatically set based on an artificial neural network,
wherein the bounding box deduplication unit calculates a maximum score using a confidence score for the bounding box included in the object data, learns the calculated maximum score based on the artificial neural network, and automatically sets the deduplication range based on the learning result, and
wherein the bounding box deduplication unit generates deduplication mask data in binary-bit form for each IoU based on the bounding box, the confidence score of the bounding box, and a plurality of degrees of deduplication (IoU), and calculates the maximum score by weighted summation of the generated mask data, applying weights according to the features of the objects.

The apparatus of claim 1,
wherein the bounding box deduplication unit automatically sets the deduplication range for each of different objects.

3. The apparatus of claim 2,
wherein the bounding box deduplication unit deduplicates the bounding boxes for the different objects in parallel.

delete

The apparatus of claim 1,
wherein the bounding box deduplication unit obtains a matrix according to the number of bounding boxes, binarizes the bounding boxes into adjacent boxes and non-adjacent boxes according to an IoU threshold, and generates the deduplication mask data by outputting 1 for a box having the highest score among its adjacent boxes and 0 otherwise.

A method of driving a bounding box deduplication apparatus, the method comprising:
receiving, by a data receiving unit, object data for objects detected from a video frame of a captured image; and
setting, by a bounding box deduplication unit, bounding boxes around the detected objects using the received object data, and deduplicating the bounding boxes set on the same object based on a deduplication range that is automatically set based on an artificial neural network,
wherein performing the deduplication comprises:
calculating a maximum score using a confidence score for the bounding box included in the object data;
learning the calculated maximum score based on the artificial neural network and automatically setting the deduplication range based on the learning result;
generating deduplication mask data in binary-bit form for each IoU based on the bounding box, the confidence score of the bounding box, and a plurality of degrees of deduplication (IoU); and
calculating the maximum score by weighted summation of the generated mask data, applying weights according to the features of the objects.

8. The method of claim 7,
wherein performing the deduplication comprises automatically setting the deduplication range for each of different objects.

9. The method of claim 8,
wherein performing the deduplication comprises deduplicating the bounding boxes for the different objects in parallel.

delete

The method of claim 7,
wherein generating the deduplication mask data for each IoU comprises:
obtaining a matrix according to the number of bounding boxes and then binarizing the bounding boxes into adjacent boxes and non-adjacent boxes according to an IoU threshold; and
generating the deduplication mask data by outputting 1 for a box having the highest score among its adjacent boxes and 0 otherwise.