KR102478335B1 - Image Analysis Method and Server Apparatus for Per-channel Optimization of Object Detection

Info

Publication number: KR102478335B1
Application number: KR1020170128209A
Authority: KR
Inventors: 안수남; 양승지; 조형준
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2017-09-29
Filing date: 2017-09-29
Publication date: 2022-12-15
Also published as: KR20190038137A

Abstract

An image analysis method and server apparatus for per-channel optimization of object detection are disclosed.
Provided is an image analysis server apparatus including: a plurality of per-channel object detection units, each based on a neural network trained to detect and recognize objects from the input image of its channel; a learning data collection unit that acquires the input image of each channel and the detection results output from the plurality of per-channel object detection units, and collects learning data for each channel; and a learning unit that individually trains the plurality of per-channel object detection units using the collected learning data.

Description

Image Analysis Method and Server Apparatus for Per-channel Optimization of Object Detection

The present invention relates to an image analysis method and server apparatus for per-channel optimization of object detection.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Surveillance systems using CCTV (Closed Circuit Television) and DVR (Digital Video Recorder) are moving beyond technology in which an administrator simply watches the video with the naked eye and stores it, and are expanding into intelligent video surveillance systems (Intelligent Surveillance System). An intelligent video surveillance system analyzes the video information input from a camera in real time and can detect, recognize, classify, and track objects. It determines and analyzes whether an object has caused a security-related event, and either provides that information to the administrator or stores the data and event details, which maximizes the efficiency of subsequent management, prevention, and search.

CCTV video surveillance systems adopt deep learning technology such as a convolutional neural network (CNN) for object detection, recognition, classification, and tracking. With CNN-based technology, objects can be detected in every video frame and information on the detected objects can be used. Because this object detection process requires a large amount of computation, object detection after video capture is performed mainly on a server. The number of video channels that can be served is therefore limited by the hardware performance of the server apparatus, and cost rises as the number of served video channels increases.

In addition, when objects are detected using a trained model, objects may frequently be falsely detected even when no object is actually present in the image, owing to factors such as the lighting environment, changes in weather, and the user's adjustment of the camera position. A false alarm caused by false detection hinders accurate recognition of dangerous situations and rapid response. In particular, because similar background images persist in CCTV video, once a false detection occurs, false alarms continue to be raised for the similar images that follow.

Embodiments of the present invention are intended to provide an image analysis method and server apparatus that can reduce the operating cost of the server apparatus used for object detection while minimizing false detection of objects in a fixed camera environment such as CCTV.

According to an embodiment of the present invention, there is provided an image analysis server apparatus including: a plurality of per-channel object detection units, each based on a neural network trained to detect and recognize objects from the input image of its channel; a learning data collection unit that acquires the input image of each channel and the detection results output from the plurality of per-channel object detection units, and collects learning data for each channel; and a learning unit that individually retrains the plurality of per-channel object detection units using the collected learning data.

According to an embodiment of the present invention, the learning unit may individually retrain the plurality of per-channel object detection units by training a separate neural network based on 32-bit operations using the learning data, and then performing calibration that optimizes the separate neural network for a neural network based on 8-bit operations.

According to an embodiment of the present invention, the learning data collection unit may collect the learning data for each channel at a time period preset for that channel, and the learning unit may individually retrain the plurality of per-channel object detection units when a preset number of learning data items has been collected for a channel.

According to an embodiment of the present invention, there is provided an image analysis method including: obtaining an input image; generating a detection result by an object detector trained to detect and recognize objects from the input image; collecting learning data for each channel by acquiring the input image of each channel and the detection results output from the plurality of per-channel object detection units; and individually retraining the plurality of per-channel object detection units using the collected learning data.

As described above, according to the embodiments of the present invention, input images are acquired periodically for each channel and optimization for object detection is performed, so that false detections can be prevented or minimized even when the environment in the image changes.

In addition, by reducing the amount of computation and memory required for object detection, a larger number of video channels can be served under the limited hardware performance of the server apparatus, which reduces the cost of operating the service.

FIG. 1 is a block diagram of a video surveillance system according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing the configuration of an image analysis server apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating a method of training an object detection unit according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the operation process of a learning unit according to an embodiment of the present invention.
FIG. 5 is a flowchart of an image analysis method according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an example of an image analysis method according to an embodiment of the present invention.

Hereinafter, some embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, terms such as '... unit' and 'module' refer to a unit that processes at least one function or operation, and such a unit may be implemented as hardware, software, or a combination of hardware and software.

To minimize false detection of objects, the present invention makes use of a characteristic of fixed camera environments such as CCTV, namely that similar background images are captured continuously with almost no change. More specifically, it proposes a method of minimizing the detection of objects in background images that contain no object, by retraining (updating) the neural network for object detection and recognition with similar background images.

FIG. 1 is a block diagram of a video surveillance system according to an embodiment of the present invention.

The video surveillance system 100 includes an image capture device 110, an image analysis server apparatus 120, and a monitoring device 130.

The image capture device 110 generates, in real time, the video to be input to the image analysis server apparatus 120. The image capture device 110 may be implemented as a fixed camera such as a CCTV camera, but is not limited thereto and may include any device capable of generating an input image for the image analysis server apparatus 120. As illustrated, the video surveillance system 100 includes one or more image capture devices 110, which are installed at different locations and generate video of several places.

The image analysis server apparatus 120 analyzes the input image received from the image capture device 110 in real time to detect and recognize objects, and detects one or more predefined events based on the analysis result. The image analysis server apparatus 120 transmits information on the detected event to the monitoring device 130. The image analysis server apparatus 120 transmits to each monitoring device 130 the video captured by the one or more image capture devices 110 matched with that monitoring device, and encrypts and manages the video so that video captured by any other image capture device 110 cannot be accessed.

The image analysis server apparatus 120 trains a neural network designed in advance to detect and recognize objects, and detects and recognizes objects in the input image based on the trained neural network. For example, a convolutional neural network (CNN) may be used as the neural network. A CNN includes one or more convolution layers that perform convolution operations on the input image and one or more pooling layers that sample the outputs of the convolution layers. The CNN is only an example, however, and various other neural networks may be used, such as a recurrent neural network (RNN) or a combination of a CNN and an RNN.

The monitoring device 130 may receive the video captured by the image capture device 110 and display it in real time. In addition, the monitoring device 130 may raise an appropriate event, such as an alarm, according to the information received from the image analysis server apparatus 120.

FIG. 2 is a block diagram schematically showing the configuration of an image analysis server apparatus according to an embodiment of the present invention.

The image analysis server apparatus 120 includes an input image acquisition unit 210, an object detection unit 220, a learning data collection unit 230, and a learning unit 240. Each component shown in FIG. 2 may be implemented as a hardware chip, or may be implemented as software such that a microprocessor executes the software function corresponding to each component.

The input image acquisition unit 210 receives an input image from the image capture device 110 and captures it. The input image acquisition unit 210 resizes the captured input image to the size of the input layer of the object detection and recognition neural network. The resized input image is passed to the object detection unit 220. The resized input image is also passed to the learning data collection unit 230 for retraining (updating) the object detection unit 220.
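The resizing step can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which interpolation is used, so nearest-neighbour sampling is chosen here for brevity, and the grayscale list-of-rows image layout and the name `resize_nearest` are hypothetical.

```python
from typing import List

# Assumed layout: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize an image to (out_h, out_w) by nearest-neighbour sampling.

    Stands in for resizing a captured frame to the input-layer size of the
    detection network; the interpolation method is an assumption.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to its source pixel by integer scaling.
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


if __name__ == "__main__":
    frame = [[1, 2], [3, 4]]
    for row in resize_nearest(frame, 4, 4):
        print(row)
```

The same resized frame would then be handed both to the detector and to the learning data collector, so that collected samples match the network's input size.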

The object detection unit 220 detects and recognizes objects from the input image based on the trained neural network. The object detection unit 220 derives detection candidates through computation in the layers of the neural network (e.g., convolution layers, max-pooling layers). The object detection unit 220 applies a threshold to the detection candidates to generate and output the final detection result. The final detection result is passed to the learning data collection unit 230 for retraining (updating) the object detection unit 220.

The output detection result includes at least one of the location information, the class information, and the confidence of an object. Here, the location information indicates the region (or box) in which the object was detected (e.g., position coordinates and the width and height of the region). The class information indicates the class into which the object is classified (e.g., person, vehicle). The confidence indicates the probability that the object belongs to the finally determined class. For example, when no object is present, the ideal confidence value is 0. When a plurality of detection regions exist in one input image, a detection result of this form may be output for each detection region.
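A minimal sketch of such a detection result and of the thresholding applied to detection candidates is shown below. The field names, the `Detection` type, the `apply_threshold` helper, and the default threshold of 0.5 are all assumptions made for illustration; the source specifies only that location, class, and confidence may be included and that a threshold is applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One detection region (hypothetical field names)."""
    x: float           # position coordinate of the detected box
    y: float
    width: float       # width of the detected region
    height: float      # height of the detected region
    label: str         # class information, e.g. "person" or "vehicle"
    confidence: float  # probability that the object belongs to `label`


def apply_threshold(candidates: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only the candidates whose confidence reaches the threshold."""
    return [d for d in candidates if d.confidence >= threshold]


if __name__ == "__main__":
    candidates = [
        Detection(10, 20, 50, 120, "person", 0.91),
        Detection(200, 40, 30, 30, "vehicle", 0.12),
    ]
    for d in apply_threshold(candidates):
        print(d.label, d.confidence)
```

An empty list after thresholding corresponds to the ideal outcome for a background-only frame.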

The object detection unit 220 detects and recognizes objects based on a neural network that uses 8-bit operations. The object detection unit 220 uses a neural network that has been trained on learning data consisting of given input-output pairs to output an object detection result for a new input. It produces this result through the inference process of the trained neural network, in which the parameters of the trained network are represented in 8 bits and the operations required for inference are also performed in 8-bit units.

Although not shown in FIG. 2, the image analysis server apparatus 120 may further include a component for determining a situation based on the detection result. This situation determination component (not shown) may use the detected objects to detect a situation occurring in the input image. For example, if the situation to be detected is the appearance of a person, the component determines whether a person is detected; if the situation to be detected is a person getting out of a vehicle, it determines whether a person and a vehicle are detected at the same time. The component may also make its determination according to the confidence value of each detected object.
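The two example situations can be sketched as simple checks over the detected classes. The `(label, confidence)` tuple representation, the function names, and the 0.5 confidence cut-off are assumptions for illustration only; the source does not define the decision rule beyond the examples above.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: each detection is a (class label, confidence) pair.
LabelledDetection = Tuple[str, float]


def detected_classes(detections: Iterable[LabelledDetection],
                     min_confidence: float = 0.5) -> Set[str]:
    """Collect the classes detected with at least the given confidence."""
    return {label for label, conf in detections if conf >= min_confidence}


def person_appears(detections: Iterable[LabelledDetection]) -> bool:
    """Situation: a person appears in the input image."""
    return "person" in detected_classes(detections)


def person_and_vehicle_present(detections: Iterable[LabelledDetection]) -> bool:
    """Situation candidate: a person and a vehicle are detected together."""
    return {"person", "vehicle"} <= detected_classes(detections)


if __name__ == "__main__":
    frame_detections = [("person", 0.88), ("vehicle", 0.73)]
    print(person_appears(frame_detections))
    print(person_and_vehicle_present(frame_detections))
```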

The learning data collection unit 230 obtains the input image and the object detection result from the input image acquisition unit 210 and the object detection unit 220, respectively, and collects learning data. The learning data collection unit 230 obtains the input images received from the plurality of image capture devices 110 separately for each channel, obtains the detection result that the object detection unit 220 outputs for the input image of each channel, and collects learning data for each channel. The collected learning data is used to retrain the object detection unit 220 and consists of pairs of an input image and its corresponding detection result.

The learning data collection unit 230 may collect learning data periodically. The time period at which the learning data collection unit 230 collects learning data may be set differently for each channel, and learning data may be collected for a set time or at regular time intervals. The learning data collection unit 230 may also detect changes in the input image and collect learning data each time a change is detected. Changes in the input image can take many forms, including the case where only the illuminance changes from day to night while the objects in the image stay the same, and the case where background objects move under the influence of rain or wind as the weather changes.
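The per-channel collection policy described above (collect at a channel-specific interval, or whenever the input image changes) might be sketched like this. The `ChannelCollector` class, the mean-absolute-difference change measure, and both thresholds are assumptions for illustration; the source does not specify how a change is detected.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ChannelCollector:
    """Collects (frame, detections) pairs for one channel (hypothetical)."""
    interval_s: float        # collection period for this channel, in seconds
    change_threshold: float  # mean absolute pixel difference treated as a change
    last_collected_at: Optional[float] = None
    last_frame: Optional[List[int]] = None
    samples: List[Tuple[List[int], list]] = field(default_factory=list)

    def _changed(self, frame: List[int]) -> bool:
        # Compare against the most recently collected frame (assumed metric).
        if self.last_frame is None:
            return False
        diff = sum(abs(a - b) for a, b in zip(frame, self.last_frame)) / len(frame)
        return diff >= self.change_threshold

    def offer(self, now: float, frame: List[int], detections: list) -> bool:
        """Store the sample if the period has elapsed or the scene changed."""
        due = (self.last_collected_at is None
               or now - self.last_collected_at >= self.interval_s)
        if due or self._changed(frame):
            self.samples.append((frame, detections))
            self.last_collected_at = now
            self.last_frame = frame
            return True
        return False


if __name__ == "__main__":
    collector = ChannelCollector(interval_s=60, change_threshold=10)
    print(collector.offer(0, [100, 100], []))   # first frame is always collected
    print(collector.offer(10, [101, 100], []))  # too soon and nearly unchanged
    print(collector.offer(20, [150, 150], []))  # illuminance changed
```

Keeping one collector instance per channel lets each camera use its own period, which matches the per-channel setting the text describes.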

To optimize object detection performance for the input image of each channel, the learning data collection unit 230 collects learning data separately for each channel, so that each object detection unit 220 can be trained individually. In addition, to prevent false detection of objects, the learning data collection unit 230 collects input images as they vary with changes in the environment in the image, so that the object detection unit 220 can be trained to output appropriate detection results for the changed environment as well.

The learning unit 240 trains the object detection unit 220 using the collected learning data. Using the learning data collected for each channel, the learning unit 240 may individually train the plurality of per-channel object detection units 220, which operate separately for the respective channels. Training the object detection unit 220 means updating the previously learned parameters (or weights) between the nodes of the neural network. The learning unit 240 may train the object detection unit 220 once it determines that a preset number of learning data items, judged suitable for training, has been collected, or it may train the object detection unit 220 with learning data collected over a set period.
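The count-based trigger can be sketched as a per-channel buffer that calls a retraining routine once enough samples have accumulated. The `PerChannelTrainer` class and the `retrain` callback are hypothetical; the actual parameter update is outside the scope of this sketch and is represented only by the callback.

```python
from typing import Callable, Dict, List, Tuple

# Assumed sample shape: an (input image, detection result) pair.
Sample = Tuple[object, object]


class PerChannelTrainer:
    """Buffers samples per channel and retrains only that channel's detector."""

    def __init__(self, min_samples: int,
                 retrain: Callable[[str, List[Sample]], None]) -> None:
        self.min_samples = min_samples  # preset number of samples per channel
        self.retrain = retrain          # hypothetical per-channel update routine
        self.buffers: Dict[str, List[Sample]] = {}

    def add(self, channel_id: str, sample: Sample) -> bool:
        """Add one sample; return True if this triggered retraining."""
        buf = self.buffers.setdefault(channel_id, [])
        buf.append(sample)
        if len(buf) < self.min_samples:
            return False
        # Only the detector of `channel_id` is updated; other channels
        # keep their own parameters and buffers untouched.
        self.retrain(channel_id, list(buf))
        buf.clear()
        return True


if __name__ == "__main__":
    trainer = PerChannelTrainer(
        min_samples=2,
        retrain=lambda ch, samples: print(f"retraining {ch} on {len(samples)} samples"),
    )
    trainer.add("cam1", ("frame-a", []))
    trainer.add("cam2", ("frame-b", []))
    trainer.add("cam1", ("frame-c", []))
```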

In the present invention, the input image acquisition unit 210 and the object detection unit 220 are the components that perform the image analysis for detecting objects within the image analysis server apparatus 120. Although FIG. 2 shows a single input image acquisition unit 210 and a single object detection unit 220, a system to which the present invention is applied has as many of each as there are image capture devices 110, and they operate individually so that a separate object detection result is provided for each channel from that channel's input image. To this end, each per-channel object detection unit 220 detects and recognizes objects based on a neural network with its own, different parameters. Because an image capture device 110 in such a system usually keeps filming the same place, each per-channel object detection unit 220 can learn a variety of environments by learning images of the same place captured at different times, and as a result can deliver object detection performance optimized for its channel.

Hereinafter, the method by which the learning unit trains the object detection unit according to an embodiment of the present invention is described in detail with reference to FIGS. 3 and 4. FIG. 3 is a block diagram illustrating a method of training an object detection unit according to an embodiment of the present invention. FIG. 4 is a diagram illustrating the operation process of a learning unit according to an embodiment of the present invention.

The learning data collection unit 230 periodically obtains the input image of each channel and the detection result for that input image, and collects learning data. The learning data collected by the learning data collection unit 230 differs from channel to channel. Each of the per-channel object detection units 220 assigned to the respective image capture devices 110 is trained individually on its channel's learning data and, using what it has learned, operates independently of the others.

The learning unit 240 trains the object detection unit 220 using the learning data, and includes a 32-bit operation unit 310 and an 8-bit operation optimization unit 320. Training in the learning unit 240 is first carried out in a neural network based on 32-bit operations, and the object detection unit 220 is then trained by performing calibration that optimizes the result for 8-bit operations. That is, in the present invention the collected learning data is learned using 32-bit (FP32) operations, while real-time object detection performs inference using 8-bit (INT8) operations.

The 32-bit operation unit 310 uses the collected learning data to learn the input images and their corresponding detection results. The 32-bit operation unit 310 learns the learning data with a 32-bit neural network of the kind used in conventional neural network computation.

The 8-bit operation optimization unit 320 performs an operation that optimizes the 32-bit neural network trained in the 32-bit operation unit 310 for an 8-bit neural network. If parameters previously represented in 32 bits were quantized directly to 8 bits, information loss would be unavoidable. The 8-bit operation optimization unit 320 therefore performs calibration, as shown in FIG. 4, that approximates the data range distribution of the 32-bit operations to the data range of 8-bit operations, so that the network also operates properly as an 8-bit neural network.
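The idea of mapping an FP32 value range onto the INT8 range can be illustrated with a small symmetric quantization sketch. This is not the calibration procedure of FIG. 4, whose details are not given in the text: the percentile rule for choosing the saturation threshold, the symmetric range of -127 to 127, and all function names are assumptions chosen for illustration.

```python
from typing import List, Sequence


def calibrate_threshold(values: Sequence[float], percentile: float = 99.9) -> float:
    """Pick a saturation threshold from the observed FP32 magnitudes.

    Assumption: a percentile of the magnitude distribution is used, so that
    rare outliers do not stretch the 8-bit range.
    """
    magnitudes = sorted(abs(v) for v in values)
    index = min(len(magnitudes) - 1, int(len(magnitudes) * percentile / 100.0))
    # Fall back to 1.0 if every observed value is zero, to avoid a zero scale.
    return magnitudes[index] or 1.0


def quantize_int8(values: Sequence[float], threshold: float) -> List[int]:
    """Map FP32 values to INT8, saturating anything beyond the threshold."""
    scale = threshold / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized: Sequence[int], threshold: float) -> List[float]:
    """Recover approximate FP32 values, showing the information loss."""
    scale = threshold / 127.0
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.0, 10.2, -50.7, 127.0, 500.0, -500.0]
    q = quantize_int8(weights, threshold=127.0)
    print(q)
    print(dequantize(q, threshold=127.0))
```

Values beyond the threshold saturate at the ends of the 8-bit range, while values inside it keep roughly 1/127 of the threshold as resolution; choosing the threshold is the trade-off that calibration has to settle.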

The object detection unit 220 outputs the detection result for an input image based on the neural network that the 8-bit operation optimization unit 320 has optimized for 8-bit operations. Detecting objects in video with a neural network amounts to detecting motion in specific regions of the image, and does not require high-precision computation. In the present invention, therefore, a neural network based on 8-bit operations is used so that the image analysis server apparatus 120 can process the input images generated by a large number of image capture devices 110 in real time. This reduces the amount of computation and memory that the server needs for real-time image analysis, and also reduces the time the computation takes.

Hereinafter, an image analysis method according to an embodiment of the present invention is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart of an image analysis method according to an embodiment of the present invention.

Referring to FIG. 5, an input image is first obtained in step S510. Specifically, the input image delivered from the image capture device is captured, and the captured input image is resized to the size of the input layer of the object detection and recognition neural network.

Once the input image is obtained, a detection result is generated by an object detector trained to detect and recognize objects from the input image (S520). The object detector derives detection candidates through computation in the layers of the neural network (e.g., convolution layers, max-pooling layers). The object detector applies a threshold to the detection candidates to generate and output the final detection result.

Here, the object detector detects and recognizes objects based on a neural network that uses 8-bit operations. That is, the object detector produces its result through the inference process of the trained neural network, in which the parameters of the trained network are represented in 8 bits and the operations required for inference are also performed in 8-bit units.

In step S530, learning data is collected by periodically obtaining the per-channel input image of step S510 and the object detection result of step S520. Specifically, the input images received from the plurality of image capture devices are obtained separately for each channel, the detection result for the input image of each channel is obtained, and learning data is collected for each channel. The learning data is used to train the neural network on which the object detector is based, and consists of pairs of an input image and its corresponding detection result.

The period for collecting training data may be set differently for each channel. Training data may be collected over a set span of time or at regular time intervals. Alternatively, changes in the input image may be detected, and training data may be collected whenever a change is detected.
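The two collection triggers described above (a per-channel time interval and change detection) can be sketched together. The class name `ChannelCollector`, the flat list-of-floats frame representation, the mean absolute difference as the change measure, and the choice to combine both triggers in one collector are assumptions made for illustration; the description presents the triggers as alternatives and does not specify a change metric.

```python
class ChannelCollector:
    """Hypothetical per-channel collector of (input image, detection result) pairs."""

    def __init__(self, interval_s, change_threshold=0.1):
        self.interval_s = interval_s              # per-channel collection period
        self.change_threshold = change_threshold  # assumed mean-abs-diff limit
        self.samples = []
        self._last_time = None
        self._last_frame = None

    def _changed(self, frame):
        # Frames are assumed to be non-empty, equally sized lists of floats.
        if self._last_frame is None:
            return True
        diff = sum(abs(a - b) for a, b in zip(frame, self._last_frame))
        return diff / len(frame) > self.change_threshold

    def offer(self, timestamp, frame, detections):
        """Store the pair if the interval has elapsed or the frame changed."""
        due = (self._last_time is None
               or timestamp - self._last_time >= self.interval_s)
        changed = self._changed(frame)
        self._last_frame = frame
        if due or changed:
            self.samples.append((frame, detections))
            self._last_time = timestamp
            return True
        return False


collector = ChannelCollector(interval_s=10.0, change_threshold=0.1)
collector.offer(0.0, [0.0, 0.0], [])           # first frame: collected
collector.offer(1.0, [0.0, 0.0], [])           # no change, not due: skipped
collector.offer(2.0, [1.0, 1.0], ["person"])   # change detected: collected
collector.offer(12.0, [1.0, 1.0], ["person"])  # interval elapsed: collected
```

One collector instance per channel lets each channel use its own interval, matching the statement that the period may differ between channels.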

Finally, the object detector is trained using the collected training data (S540). Using the training data collected for each channel, the learner can individually train the plurality of object detectors that operate separately for the plurality of channels. Training an object detector means updating the previously learned parameters (or weights) between the nodes of the neural network. The learner may train the object detector once a preset number of training samples, judged sufficient for training, has been collected, or it may train the object detector using the training data collected over a fixed period.
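The count-based trigger for per-channel training can be sketched as below. `ChannelTrainer`, the channel names, and the `train_fn` callback are hypothetical; the callback stands in for whatever routine actually updates the detector's weights, which the description does not detail.

```python
class ChannelTrainer:
    """Hypothetical buffer that triggers training once enough samples arrive."""

    def __init__(self, min_samples):
        self.min_samples = min_samples  # preset count judged sufficient
        self.buffer = []
        self.rounds = 0                 # number of training rounds triggered

    def add(self, sample, train_fn):
        """Buffer one sample; call train_fn on the batch when the count is reached."""
        self.buffer.append(sample)
        if len(self.buffer) >= self.min_samples:
            train_fn(self.buffer)
            self.buffer = []            # start collecting the next batch
            self.rounds += 1
            return True
        return False


# One trainer per channel, each with its own preset sample count.
trainers = {"cam-1": ChannelTrainer(3), "cam-2": ChannelTrainer(2)}
trained = []  # records (channel, batch size) for each triggered round


def make_train_fn(channel):
    # Placeholder for the real weight update of that channel's detector.
    return lambda batch: trained.append((channel, len(batch)))


for i in range(4):
    for channel, trainer in trainers.items():
        trainer.add(("frame-%d" % i, []), make_train_fn(channel))
```

Because each channel owns its buffer, a busy channel can be retrained often without forcing retraining on a quiet one.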

The learner trains on the training data using a separate neural network based on 32-bit operations, and then trains the object detector by performing a calibration that optimizes the result for 8-bit operations. That is, in this embodiment, training on the collected data is based on 32-bit operations, while real-time object detection performs inference based on 8-bit operations.

The separate neural network for 32-bit training learns the input images and their detection results from the collected training data. The trained 32-bit network is then optimized for the 8-bit network. Because quantizing parameters previously represented in 32 bits down to 8 bits inevitably loses information, the calibration approximates the distribution of the 32-bit data range to the 8-bit data range so that the network also operates properly as an 8-bit network, and the object detector is trained in this way.
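The idea of mapping a 32-bit value range onto an 8-bit range can be illustrated with symmetric max-abs quantization. This particular scheme is an assumption chosen for its simplicity; the description only says that the 32-bit data range distribution is approximated to the 8-bit range and does not name a calibration algorithm. The function names are likewise hypothetical.

```python
def calibrate_scale(values, num_bits=8):
    """Pick a scale so the largest magnitude maps to the largest integer."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for signed 8-bit
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer step and clamp to the range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.01, 1.0]  # stand-in for trained 32-bit parameters
scale = calibrate_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)

# Information loss: each restored value is off by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The residual `max_error` is the information loss the description refers to; a calibration step aims to choose the mapping so that this loss does not degrade detection.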

FIG. 6 is a flowchart illustrating an example of an image analysis method according to an embodiment of the present invention.

After an input image is acquired in step S610, objects in the input image are detected and recognized to generate a detection result (S620). The generated detection result can then be used to assess the situation (S630). That is, the detected objects can be used to detect a situation occurring in the input image. For example, if the situation to be detected is the appearance of a person, it is determined whether a person is detected; if the situation to be detected is a person getting out of a vehicle, it is determined whether a person and a vehicle are detected at the same time. The situation may also be assessed according to the confidence value of each detected object.
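The situation check of S630 can be sketched as a set-membership test over the detected labels. Representing each detection as a `(label, score)` tuple and gating on a minimum score are assumptions; the description states only that co-detection and confidence values may be used.

```python
def labels_above(detections, min_score=0.0):
    """Collect the labels of detections whose confidence meets min_score."""
    return {label for label, score in detections if score >= min_score}


def situation_detected(detections, required_labels, min_score=0.0):
    """True when every required label is detected with enough confidence."""
    return set(required_labels) <= labels_above(detections, min_score)


detections = [("person", 0.9), ("vehicle", 0.4)]

# A person appears: only "person" is required.
person_present = situation_detected(detections, {"person"})

# A person gets out of a vehicle: both must be detected at the same time.
exiting_vehicle = situation_detected(detections, {"person", "vehicle"})

# The same check with a confidence floor rejects the weak vehicle detection.
exiting_confident = situation_detected(
    detections, {"person", "vehicle"}, min_score=0.5
)
```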

Meanwhile, to determine whether to retrain the object detector, the per-channel input images and object detection results are first acquired, and training data is collected according to the set time period (S640). Training data is collected periodically in step S640; when the number of collected training samples reaches a set value (S650), enough training data has been collected, and the per-channel object detector is trained (S660).

Although FIGS. 5 and 6 describe the steps as being executed sequentially, the embodiment is not necessarily limited to this. In other words, the steps described in FIGS. 5 and 6 may be reordered, or one or more steps may be executed in parallel, so FIGS. 5 and 6 are not limited to a time-series order.

The image analysis method according to the present embodiment described in FIGS. 5 and 6 may be implemented as a program and recorded on a computer-readable recording medium. The computer-readable recording medium on which the program for implementing the image analysis method is recorded includes every kind of recording device that stores data readable by a computing system.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the embodiment pertains can make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within a scope equivalent to them should be construed as falling within the scope of rights of the present embodiment.

100: video surveillance system

Claims

An image analysis server device comprising:
a plurality of per-channel object detection units, each based on a neural network trained to detect and recognize an object from the input image of its channel;
a learning data collection unit that collects training data for each channel by acquiring the input image of each channel and the detection results output from the plurality of per-channel object detection units, wherein the training data for a given channel consists of pairs of the input image of that channel and the detection result for it; and
a learning unit that individually retrains the plurality of per-channel object detection units using the collected training data,
wherein the learning unit trains a separate neural network based on 32-bit operations using the training data and performs calibration to optimize the separate neural network for a neural network based on 8-bit operations, thereby individually retraining the plurality of per-channel object detection units.

(deleted)

The image analysis server device of claim 1,
wherein the learning unit approximates the data distribution range of the separate neural network to an 8-bit data range to optimize it for the neural network based on 8-bit operations.

The image analysis server device of claim 1,
wherein the learning data collection unit collects training data for each channel at a time period predetermined for each channel, and
the learning unit individually retrains the plurality of per-channel object detection units when a predetermined number of training samples has been collected for each channel.

The image analysis server device of claim 1,
wherein the learning data collection unit detects a change in the input image of each channel and, whenever a change is detected, acquires the input image of that channel and the detection result to collect training data for each channel.

An image analysis method comprising:
acquiring an input image for each channel;
generating detection results by a plurality of per-channel object detectors trained to detect and recognize an object from the input image of each channel;
collecting training data for each channel by acquiring the input image of each channel and the detection results output from the plurality of per-channel object detectors, wherein the training data for a given channel consists of pairs of the input image of that channel and the detection result for it; and
individually retraining the plurality of per-channel object detectors using the collected training data,
wherein the individually retraining comprises training a separate neural network based on 32-bit operations using the training data and performing calibration to optimize the separate neural network for a neural network based on 8-bit operations.

(deleted)

The image analysis method of claim 6,
wherein the individually retraining the object detectors comprises approximating the data distribution range of the separate neural network to an 8-bit data range to optimize it for the neural network based on 8-bit operations.

The image analysis method of claim 6,
wherein the collecting training data comprises collecting training data for each channel at a time period predetermined for each channel, and
the individually retraining the object detectors comprises individually retraining the plurality of per-channel object detectors when a predetermined number of training samples has been collected for each channel.

The image analysis method of claim 6,
wherein the collecting training data comprises detecting a change in the input image of each channel and, whenever a change is detected, acquiring the input image of that channel and the detection result to collect training data for each channel.