KR20210024935A

KR20210024935A - Apparatus for monitoring video and apparatus for analyzing video, and on-line machine learning method

Info

Publication number: KR20210024935A
Application number: KR1020190104761A
Authority: KR
Inventors: 김태완; 강충헌; 김범준; 김용성; 양승지
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2019-08-26
Filing date: 2019-08-26
Publication date: 2021-03-08

Abstract

Disclosed is a video surveillance device that performs online machine learning. The device comprises: a photographing unit that photographs a surveillance area to generate captured images; an object detection unit that, having been initially trained to detect a surveillance target object in an image, detects the surveillance target object in the captured images input from the photographing unit; and a learning unit that selects at least one background image from among the captured images and then additionally trains the object detection unit using the selected background image so that the photographing environment of the surveillance area is reflected.

Description

Video surveillance device, video analysis device, and online machine learning method {APPARATUS FOR MONITORING VIDEO AND APPARATUS FOR ANALYZING VIDEO, AND ON-LINE MACHINE LEARNING METHOD}

The present invention relates to a device that captures and monitors video, a device that analyzes captured video, and an online machine learning method by which these devices learn to detect objects in images.

Various computer vision technologies using CCTV (closed-circuit television) cameras, mobile phone cameras, and the like have been applied to photographs (images) and moving pictures (videos), making it possible to detect and recognize desired objects such as people, cars, and animals in them. Although these technologies have been combined with machine learning methods such as deep learning to improve the accuracy of object detection and recognition, that accuracy has not yet reached a satisfactory level.

Korean Registered Patent Publication No. 10-1910542, registered October 16, 2018.

According to one embodiment, there are provided a video surveillance device, and a learning method therefor, in which an object detector initially trained in advance to detect objects in an image is adaptively and additionally trained on the photographing environment of a surveillance area using captured images of that area.

According to another embodiment, there are provided a video analysis device, and a learning method therefor, in which a server-side object detector initially trained in advance to detect objects in an image is additionally trained on the photographing environment of a surveillance area using images captured by a video surveillance device.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

According to a first aspect of the present invention, a video surveillance device that performs online machine learning may include: a photographing unit that photographs a surveillance area to generate captured images; an object detection unit that, having been initially trained to detect a surveillance target object in an image, detects the surveillance target object in the captured images input from the photographing unit; and a learning unit that selects at least one background image from among the captured images and then additionally trains the object detection unit using the selected background image so that the photographing environment of the surveillance area is reflected.

According to a second aspect of the present invention, an online machine learning method for an object detector initially trained to detect a surveillance target object in an image may include: selecting a background image from among captured images of a surveillance area; and additionally training the object detector using the selected background image so that the photographing environment of the surveillance area is reflected.

According to a third aspect of the present invention, a computer program stored on a computer-readable recording medium includes instructions for causing a processor to perform the online machine learning method.

A video analysis device according to a fourth aspect of the present invention may include: an object detection unit pre-trained to detect a surveillance target object in an image; and a learning unit that selects a background image from among captured images of a surveillance area and then additionally trains the object detection unit using the selected background image so that the photographing environment of the surveillance area is reflected.

According to an embodiment of the present invention, an object detector initially trained in advance to detect objects in an image is adaptively and additionally trained on the photographing environment of a surveillance area using captured images of that area. As a result, the object detection and recognition functions are optimized for the photographing environment of the surveillance area, which greatly improves the accuracy of object detection.

FIG. 1 is a configuration diagram of a video surveillance and analysis system according to embodiments of the present invention.
FIG. 2 is a configuration diagram of the video surveillance device shown in FIG. 1.
FIG. 3 is a flowchart illustrating a machine learning method of a video surveillance device according to a first embodiment of the present invention.
FIG. 4 is a flowchart illustrating a machine learning method of a video surveillance device according to a second embodiment of the present invention.
FIG. 5 is a diagram showing a process of selecting a background image, provided to further explain the machine learning method of the video surveillance device shown in FIG. 4.
FIG. 6 is an example of an inter-frame difference image obtained during the background image selection process shown in FIG. 5.
FIG. 7 is a configuration diagram of the video analysis server device shown in FIG. 1.
FIG. 8 is a flowchart illustrating a machine learning method of a video analysis server device according to a third embodiment of the present invention.
FIG. 9 is a flowchart illustrating a machine learning method of a video analysis server device according to a fourth embodiment of the present invention.
FIG. 10 is a diagram illustrating object detection results obtained with machine learning methods according to embodiments of the present invention.
FIGS. 11 and 12 are diagrams illustrating images used for additional machine learning in machine learning methods according to embodiments of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, which is defined only by the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted unless actually necessary. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a configuration diagram of a video surveillance and analysis system 100 according to embodiments of the present invention.

Referring to FIG. 1, the video surveillance and analysis system 100 includes a video surveillance device 110, a video analysis server device 120, and a monitoring device 130.

The video surveillance and analysis system 100 includes, in the video surveillance device 110 or the video analysis server device 120, an object detector initially trained in advance to detect a surveillance target object in an image, and the video surveillance device 110 or the video analysis server device 120 adaptively and additionally trains the object detector on the photographing environment of the surveillance area using images of the surveillance area captured by the video surveillance device 110. For example, before the video surveillance and analysis system 100 is installed in the surveillance area, the object detector is initially trained through offline machine learning so that it can detect objects; after installation, the object detector of the video surveillance device 110 or the video analysis server device 120 can be additionally trained on the photographing environment of the surveillance area through online machine learning. To this end, the video surveillance device 110 or the video analysis server device 120 may include a computing device such as a microprocessor. The video analysis server device 120 is called a server device here in the sense that it provides a video analysis service to the video surveillance device 110; the disclosed video analysis server device 120 is a kind of video analysis device.

FIG. 2 shows an embodiment in which the video surveillance device 110 includes an object detection unit 112 as the object detector; in this embodiment, the video analysis server device 120 may also include an object detector.

The video surveillance and analysis system 100 according to an embodiment of the present invention will now be described with reference to FIGS. 1 and 2.

The video surveillance device 110 includes a photographing unit 111 that photographs a surveillance area to generate captured images, and a communication unit 114 that exchanges various information, including the captured images, with the video analysis server device 120 and the monitoring device 130. For example, the video surveillance device 110 may be implemented as a CCTV (closed-circuit television) camera, but is not limited thereto and may be any device capable of photographing a surveillance area.

The video surveillance device 110 may include an object detection unit 112 initially trained in advance to detect a surveillance target object in a captured image. The object detection unit 112 can be adaptively and additionally trained on the photographing environment of the surveillance area by a learning unit 113; for example, captured images of the surveillance area may be used for this additional training. The object detection unit 112 analyzes captured images of the surveillance area in real time to detect and recognize objects, and controls the communication unit 114 to transmit the object recognition results to the video analysis server device 120 and the monitoring device 130. For example, the object detection unit 112 may train a neural network designed in advance to detect objects, and detect and recognize objects in captured images based on the trained neural network.

The video surveillance device 110 may include the learning unit 113, which adaptively and additionally trains the object detection unit 112 on the photographing environment of the surveillance area using captured images of that area.

The learning unit 113 of the video surveillance device 110 may determine, based on a preset execution condition, whether additional training of the object detection unit 112 is needed. One or more of the following may be preset as the execution condition: the number of images of the surveillance area captured by the photographing unit 111 since initialization, the capture time, or the number of object detections by the object detection unit 112. For example, the execution condition may be that the number of captured images of the surveillance area is "1", that the number of object detections is "0", or that the current time is one at which outside objects are unlikely to enter the surveillance area (e.g., at night).
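As a rough illustration of the execution-condition check described above, the following Python sketch tests the three example conditions. The class and function names, the choice to trigger as soon as any one configured condition holds, and the representation of the quiet period as a clock-time window are all assumptions made for illustration; the specification does not define them.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class ExecutionCondition:
    """Hypothetical preset conditions; a field left as None is not checked."""
    frame_count: Optional[int] = None      # e.g. 1: trigger on the first captured image
    detection_count: Optional[int] = None  # e.g. 0: trigger while nothing has been detected
    quiet_start: Optional[time] = None     # start of a period when intrusions are unlikely
    quiet_end: Optional[time] = None       # end of that period


def in_quiet_period(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def needs_additional_learning(cond: ExecutionCondition,
                              frames_since_init: int,
                              detections_since_init: int,
                              now: time) -> bool:
    """Assumed rule: additional training is needed if any configured condition holds."""
    if cond.frame_count is not None and frames_since_init == cond.frame_count:
        return True
    if cond.detection_count is not None and detections_since_init == cond.detection_count:
        return True
    if cond.quiet_start is not None and cond.quiet_end is not None:
        return in_quiet_period(now, cond.quiet_start, cond.quiet_end)
    return False
```

For instance, `ExecutionCondition(frame_count=1)` corresponds to the example in which training is triggered by the very first captured image.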

In addition, the learning unit 113 of the video surveillance device 110 may select background images by extracting, from the captured images, images in which no surveillance target object is present, and then additionally train the object detection unit 112 using the selected background images so that the photographing environment of the surveillance area is reflected.

The learning unit 113 may compare the captured images frame by frame, select the frame preceding a frame in which a change occurred as a candidate image, and then select a new background image from among the candidate images based on a comparison between the candidate image and a preset reference image. The learning unit 113 repeats this selection in the frame time order of the captured images until a preset number of background images has been selected, and each new background image may be set as the reference image for the next selection in the frame time order. For example, the learning unit 113 may set the background image most recently selected from the captured images as the reference image and additionally compare the candidate image against it to decide whether to select the candidate image as a background image. In other words, the learning unit 113 analyzes the captured images in units of frames at a preset interval and selects background images from among the frames preceding frames in which a change occurred, continuing to analyze the captured images in time order and to select further background images until the preset number has been selected. The learning unit 113 may select the frame preceding a frame in which a change occurred as a candidate image, and select the candidate image as a background image based on the average motion value over a preset time relative to the candidate image. For example, the learning unit 113 may extract the luminance component from the captured images, or convert the captured images to grayscale, and then obtain inter-frame difference images of the grayscale images so that every frame can be compared with its neighbor. When the learning unit 113 finds object motion in the inter-frame difference image of a given frame, it may select the frame immediately preceding that frame as a candidate image. The learning unit 113 may then calculate the average motion value over a preset time before the candidate image was selected, and decide whether to select the candidate image as a new background image based on a comparison between the calculated average motion value and a preset threshold motion value. In addition, once a candidate image has been selected, the learning unit 113 may pause for a preset time without selecting the next candidate image.
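The background selection procedure described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: frames are assumed to be flattened luminance values, the "motion value" is assumed to be the mean absolute inter-frame difference, a candidate is assumed to be accepted only if it differs from the reference image by at least a threshold, and every parameter name and default value is hypothetical.

```python
from typing import List, Optional, Sequence

Frame = Sequence[float]  # flattened luminance values, one per pixel (assumed format)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute luminance difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_backgrounds(frames: Sequence[Frame],
                       max_backgrounds: int,
                       motion_thresh: float = 5.0,   # motion that marks a "change"
                       quiet_window: int = 3,        # frames averaged before the candidate
                       quiet_thresh: float = 1.0,    # threshold on the average motion value
                       novelty_thresh: float = 2.0,  # required difference from the reference
                       rest_frames: int = 2) -> List[int]:
    """Return indices of frames chosen as background images, in time order."""
    # motion[t] is the difference between frame t and frame t-1 (motion[0] is a placeholder).
    motion = [0.0] + [mean_abs_diff(frames[t], frames[t - 1])
                      for t in range(1, len(frames))]
    selected: List[int] = []
    reference: Optional[Frame] = None
    rest_until = 0
    for t in range(1, len(frames)):
        if len(selected) >= max_backgrounds:
            break
        if t < rest_until or motion[t] < motion_thresh:
            continue
        candidate = t - 1                 # frame just before the change
        rest_until = t + rest_frames      # pause before looking for the next candidate
        window = motion[max(1, candidate - quiet_window + 1):candidate + 1]
        if not window or sum(window) / len(window) >= quiet_thresh:
            continue                      # scene was not still before the candidate
        if reference is not None and mean_abs_diff(frames[candidate], reference) < novelty_thresh:
            continue                      # too similar to the last selected background
        selected.append(candidate)
        reference = frames[candidate]     # becomes the reference for the next selection
    return selected
```

With a sequence that is still, then shows an object, then settles under different lighting before a second object appears, the sketch returns one index from each still interval, which matches the intent of keeping one representative background per distinct scene.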

Meanwhile, the learning unit 113 of the video surveillance device 110 may perform a learning method that overfits by training on images of the surveillance area, obtained from the images captured by the photographing unit 111, as a negative set. For example, if the number of images of the surveillance area captured by the photographing unit 111 is "1", the video surveillance and analysis system 100 is very likely still being installed in the surveillance area or to have just been installed. The first captured image is therefore regarded as containing none of the objects targeted for detection and recognition, and additional training can slightly modify the deep learning weight file of the object detection unit 112 so that it is optimized for the surveillance area as captured by the video surveillance device 110. Alternatively, when the learning unit 113 has selected a background image, that image contains no detection target object, so additional training with the background image can likewise slightly modify the deep learning weight file of the object detection unit 112 so that it is optimized for the surveillance area as captured by the video surveillance device 110.
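To make the idea of negative-set fine-tuning concrete, the toy sketch below stands in a single logistic unit for the deep detector and nudges its weights with a few small gradient steps on background features labeled as "no object". The logistic model, the feature vectors, the learning rate, and the step count are all illustrative assumptions; the specification only states that the deep learning weight file is slightly modified by additional training on the negative set.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Toy detector: probability that the features contain a target object."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def finetune_on_negatives(weights: Sequence[float],
                          bias: float,
                          backgrounds: Sequence[Sequence[float]],
                          lr: float = 0.05,
                          steps: int = 20) -> Tuple[List[float], float]:
    """Return slightly modified (weights, bias) after training on label-0 samples only."""
    w, b = list(weights), bias  # copy so the pre-trained weights are left untouched
    for _ in range(steps):
        for x in backgrounds:
            p = score(w, b, x)
            # Binary cross-entropy gradient is (p - y) * x; with y = 0 it is p * x.
            w = [wi - lr * p * xi for wi, xi in zip(w, x)]
            b -= lr * p
    return w, b
```

After such an update the toy detector assigns a lower object probability to the same background features, which is the effect the negative set is meant to have on false detections in the installed scene.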

The video analysis server device 120 may detect one or more predefined events based on the object recognition results transmitted from the video surveillance device 110, and transmit information about the detected events to the monitoring device 130. When video surveillance devices 110 are operated in each of a plurality of surveillance areas, the video analysis server device 120 may operate them in an integrated manner or control them centrally.

The monitoring device 130 may receive the captured images from the video surveillance device 110 and display them in real time. The monitoring device 130 may also raise an appropriate event, such as an alarm, according to the information received from the video analysis server device 120. In addition, the monitoring device 130 may display the object recognition results transmitted from the video surveillance device 110 in real time, and may transmit an operator's judgment of those results to the video surveillance device 110 or the video analysis server device 120 as feedback on the object recognition results.

FIG. 3 is a flowchart illustrating a machine learning method of the video surveillance device 110 according to a first embodiment of the present invention.

The machine learning method of the video surveillance device 110 according to the first embodiment will now be described in detail with reference to FIGS. 1 to 3.

First, before the video surveillance and analysis system 100 is distributed and installed, a neural network designed in advance is initially trained so that the object detection unit 112 of the video surveillance device 110 can later detect and recognize objects in the surveillance area. For example, offline machine learning may be performed using various neural networks such as a convolutional neural network (CNN), a recurrent neural network (RNN), or a combination of the two, and a deep learning weight file may be generated as the training result. The video surveillance device 110 including the object detection unit 112 thus initially trained to detect and recognize objects may be manufactured in a factory or the like and distributed, and the state in which the object detection unit 112 has been initially trained is the initialization state (S310).

Thereafter, the video surveillance device 110 is installed in a given surveillance area and connected to the video analysis server device 120 and the monitoring device 130 through a communication network, establishing an online state.

The photographing unit 111 of the video surveillance device 110 is then ready to capture images of the surveillance area, and the learning unit 113 determines, based on a preset execution condition, whether additional training of the object detection unit 112 is needed. Here, one or more of the following may be preset as the execution condition: the number of images of the surveillance area captured by the photographing unit 111 since initialization, the capture time, or the number of object detections by the object detection unit 112. For example, if the execution condition is preset as the number of captured images of the surveillance area being "1", then when the photographing unit 111 first photographs the surveillance area (S320), the learning unit 113 finds the execution condition satisfied and may determine that additional training of the object detection unit 112 is needed (S330).

The additional training that the learning unit 113 performs on the object detection unit 112 uses the images of the surveillance area captured by the photographing unit 111. For example, the learning unit 113 may perform online learning that overfits by training on images of the surveillance area, obtained from the images captured by the photographing unit 111, as a negative set. Environmental information that can be extracted from the captured images, such as the indoor structure, outdoor lighting conditions, and indoor furniture arrangement of the surveillance area, may thereby be adaptively and additionally learned by the object detection unit 112. Images captured while the video surveillance and analysis system 100 is being installed in the surveillance area, or immediately afterward, are regarded as containing none of the objects targeted for detection and recognition, and additional training with this negative set can slightly modify the deep learning weight file generated by the initial training of the object detection unit 112 so that it is optimized for the surveillance area as captured by the video surveillance device 110. Where video surveillance devices 110 are operated in each of a plurality of surveillance areas, the object detection units 112 operating in different surveillance areas may additionally learn the different indoor structures, outdoor lighting environments, indoor furniture arrangements, and so on of their respective areas (S340).

Thereafter, the object detection unit 112 of the video surveillance device 110, now optimized for the photographing environment of the surveillance area through step S340, detects and recognizes desired objects in the images of the surveillance area captured by the photographing unit 111, and controls the communication unit 114 to transmit the object recognition results to the video analysis server device 120 and the monitoring device 130 (S350).
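The flow of steps S320 to S350 can be summarized in a short Python sketch. The function name and its callback parameters (`detect`, `fine_tune`, `send`) are hypothetical placeholders for the object detection unit 112, the learning unit 113, and the communication unit 114, and treating the triggering frame purely as training data (without running detection on it) is an assumption.

```python
from typing import Any, Callable, Iterable, List


def run_first_embodiment(frames: Iterable[Any],
                         detect: Callable[[Any], Any],
                         fine_tune: Callable[[List[Any]], None],
                         send: Callable[[Any], None],
                         trigger_frame_count: int = 1) -> None:
    """Drive one pre-trained detector (S310 is assumed done) through S320 to S350."""
    for frames_seen, frame in enumerate(frames, start=1):  # S320: capture
        if frames_seen == trigger_frame_count:             # S330: execution condition met
            fine_tune([frame])                             # S340: negative-set training
            continue                                       # frame assumed to hold no target
        send(detect(frame))                                # S350: detect and transmit
```

Passing the three roles in as callables keeps the sketch independent of any particular detector or network stack.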

FIG. 4 is a flowchart illustrating a machine learning method of the video surveillance device 110 according to a second embodiment of the present invention, and FIG. 5 is a diagram showing a process of selecting a background image, provided to further explain the machine learning method of the video surveillance device 110 shown in FIG. 4.

The video surveillance and analysis system 100 according to the second embodiment of the present invention will now be described with reference to FIGS. 1, 2, 4, and 5.

First, before the video surveillance and analysis system 100 is distributed and installed, a neural network designed in advance is initially trained so that the object detection unit 112 of the video surveillance device 110 can later detect objects in the surveillance area. For example, offline learning may be performed using various neural networks such as a convolutional neural network (CNN), a recurrent neural network (RNN), or a combination of the two, and an initial learning weight file may be generated as the training result. The video surveillance device 110 including the object detection unit 112 thus initially trained to detect objects may be manufactured in a factory or the like and distributed.

Then, the photographing unit 111 of the video surveillance device 110 photographs the surveillance area, generates captured images, and provides them to the object detection unit 112 and the learning unit 113 (S410).

Here, the learning unit 113 selects, from among the images captured by the photographing unit 111, the background images to be used for additional learning that reflects the photographing environment of the surveillance area. A background image may mean an image in which no monitoring target object is present. Additional learning that reflects the photographing environment of the surveillance area can be regarded as an online machine learning technique, and such a technique calls for a small training database. This is because the number of training iterations must be kept very small to prevent the catastrophic forgetting problem, and if many meaningless training images are included, there is a high probability that the truly necessary training images will never be used. For example, suppose the images essential for learning are the 50th and 90th of 100 images. If all 100 images are used, the 50th and 90th images may not be used even once during the few training iterations. If, however, the 50th and 90th images are accurately sampled and used for learning, performance can improve noticeably. Therefore, to apply negative online learning, it is necessary to select background images in which no monitoring target object (e.g., a person) is present, and moreover to select only one image from among those in which no monitoring target object is present. This is because the images captured in an interval before any motion occurs are all practically the same. For example, assuming that the photographing unit 111 is turned "ON" at 6 a.m. and a person first appears after 9 a.m., the images from those three hours are almost identical as long as no motion is found. It is therefore far more effective to use just one of them than to use all three hours of images for learning, and the process of finding the images to use for negative online learning is the background image selection process.
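The sampling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming (this is not stated in the source) that each training iteration draws one image uniformly at random, with replacement, from the pool; the function names and the choice of 10 iterations are hypothetical.

```python
def prob_image_never_used(num_images: int, num_draws: int) -> float:
    """Probability that one particular image is never drawn.

    Assumes each draw picks one image uniformly at random, with replacement.
    """
    return ((num_images - 1) / num_images) ** num_draws


def prob_both_never_used(num_images: int, num_draws: int) -> float:
    """Probability that two particular images are both never drawn."""
    return ((num_images - 2) / num_images) ** num_draws


if __name__ == "__main__":
    # 100 images in the pool, only 10 training draws (hypothetical numbers).
    print(prob_image_never_used(100, 10))  # chance the 50th image is skipped
    print(prob_both_never_used(100, 10))   # chance the 50th and 90th are both skipped
```

Under these assumptions, a single essential image is more likely than not to be skipped when the pool is large relative to the number of iterations, which is the motivation for shrinking the pool to a few well-chosen background images.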

In the background image selection process, the learning unit 113 first determines whether the number of images already selected as background images satisfies a preset number of background images (S421). If, in step S421, the number of background images selected so far is smaller than the preset number, the process of selecting additional background images is performed. That is, when additional selection is determined to be necessary, the steps up to step S430 described below are performed.

The learning unit 113 extracts the luminance component from the images captured in step S410, or converts the captured images to grayscale (S422), and then obtains inter-frame difference images from the grayscale images so that frames can be compared at a preset frame interval, for example at every frame. FIG. 5 shows an example of computing the difference image (F_Dn) between the current frame image (F_n) and the immediately preceding frame image (F_(n-1)). For example, as illustrated in FIG. 6, an inter-frame difference image may be obtained that contains only the difference between the current and previous frames, that is, roughly the outline of a moving object (S423).
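Steps S422 and S423 can be sketched as follows. This is an illustrative sketch only: frames are represented as nested lists of (R, G, B) tuples, the luminance weights are the common ITU-R BT.601 coefficients (the source does not say which conversion is used), and `motion_score` is a hypothetical helper that condenses a difference image into one number.

```python
def to_luma(frame):
    """Convert a frame of (r, g, b) tuples to luminance values (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


def frame_difference(curr_luma, prev_luma):
    """Per-pixel absolute difference F_Dn between F_n and F_(n-1)."""
    return [[abs(c - p) for c, p in zip(curr_row, prev_row)]
            for curr_row, prev_row in zip(curr_luma, prev_luma)]


def motion_score(diff):
    """Mean absolute difference over the whole difference image (hypothetical metric)."""
    total = sum(sum(row) for row in diff)
    count = sum(len(row) for row in diff)
    return total / count


if __name__ == "__main__":
    black = (0, 0, 0)
    white = (255, 255, 255)
    prev = [[black, black], [black, black]]
    curr = [[white, black], [black, black]]  # one pixel changed
    diff = frame_difference(to_luma(curr), to_luma(prev))
    print(motion_score(diff))
```

A static scene yields a difference image that is zero everywhere, so any nonzero region marks where something moved between the two frames.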

When the learning unit 113 identifies object motion in the inter-frame difference image of a given frame (S424), it may select, by sampling, one of the frames preceding that frame as a candidate for the background image. For example, the learning unit 113 may select the frame immediately preceding the frame in which object motion was identified as the candidate image (S425).
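A minimal sketch of steps S424 and S425, assuming that motion is "identified" when a per-frame motion score exceeds a threshold. Both the score list and `motion_threshold` are hypothetical; the source does not specify how motion is quantified.

```python
def find_candidate_indices(motion_scores, motion_threshold):
    """Return the index of the frame immediately preceding each moving frame.

    motion_scores[n] is assumed to summarize the difference image between
    frame n and frame n-1; index 0 has no predecessor and is ignored.
    """
    candidates = []
    for n in range(1, len(motion_scores)):
        if motion_scores[n] > motion_threshold:  # S424: motion found in frame n
            candidates.append(n - 1)             # S425: take the frame just before it
    return candidates


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0, 5.0, 6.0, 0.0, 0.0, 7.0]
    print(find_candidate_indices(scores, motion_threshold=1.0))
```

Note that while motion continues across consecutive frames, the "preceding frame" may itself contain the moving object (index 3 in the example), which is why the candidates are filtered further in step S426.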

The candidate images selected in step S425 may include images unsuitable for use as background images. For example, if a pattern repeats within a short time in which there is no object motion, then brief motion, then no motion again, an image containing an object may be selected as a candidate in step S425, and such a candidate is unsuitable as a background image. To filter out such images, when a candidate image is selected in step S425, the learning unit 113 calculates the average motion value over a preset period (e.g., 150 seconds) before the candidate was selected and compares it with a preset threshold motion value; if the average motion value exceeds the threshold motion value, the image selected in step S425 may be excluded from the candidates (S426).
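The filter of step S426 can be sketched as a moving-window average. In this sketch the 150-second default comes from the example in the text, while the frame rate, the per-frame motion scores, and `avg_threshold` are assumptions made for illustration.

```python
def passes_motion_filter(motion_scores, candidate_index, fps,
                         window_seconds=150.0, avg_threshold=0.5):
    """Return True if the scene was quiet enough before the candidate frame.

    Averages the motion scores over the window_seconds leading up to (and
    including) the candidate and rejects the candidate when that average
    exceeds avg_threshold.
    """
    window_frames = max(1, int(fps * window_seconds))
    start = max(0, candidate_index - window_frames + 1)
    recent = motion_scores[start:candidate_index + 1]
    if not recent:
        return False
    average_motion = sum(recent) / len(recent)
    return average_motion <= avg_threshold


if __name__ == "__main__":
    quiet = [0.0] * 10
    busy = [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0]
    # 1 frame per second and a 5-second window, to keep the example small.
    print(passes_motion_filter(quiet, 9, fps=1.0, window_seconds=5))
    print(passes_motion_filter(busy, 9, fps=1.0, window_seconds=5))
```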

If a candidate image passes step S426, the learning unit 113 may pause for a preset time without selecting the next candidate image. If the next candidate were selected without such a sleep time, it would very likely be nearly identical to the previous one even when its average motion value does not exceed the threshold motion value in a repeated step S426. The sleep time thus prevents images that are unsuitable as background images from passing step S426 (S427).
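One way to picture the sleep time of step S427 is as a minimum spacing between accepted candidates. The sketch below assumes candidates are identified by timestamps in seconds; `sleep_seconds` and the function name are hypothetical.

```python
def apply_sleep_time(candidate_times, sleep_seconds):
    """Keep a candidate only if sleep_seconds have elapsed since the last kept one."""
    accepted = []
    last_accepted = None
    for t in sorted(candidate_times):
        if last_accepted is None or t - last_accepted >= sleep_seconds:
            accepted.append(t)
            last_accepted = t
    return accepted


if __name__ == "__main__":
    # Candidates at 0 s, 10 s, 70 s, 100 s and 200 s with a 60 s sleep time.
    print(apply_sleep_time([0, 10, 70, 100, 200], sleep_seconds=60))
```

In the example, the candidates at 10 s and 100 s fall inside the sleep period of the preceding accepted candidate and are dropped.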

Next, the learning unit 113 compares the candidate image that passed step S426 with a preset reference image (S428), and if the candidate differs from the reference image by at least a preset threshold (S429), it may select the candidate as a new background image (S430). The image selected as the new background image may then be set as the reference image for the next selection in frame time order. That is, the reference image compared against a candidate that passed step S426 may be the most recently selected of the background images chosen so far. When steps S428 and S429 are performed for the first time, no reference image may exist, in which case the candidate that passed step S426 is necessarily selected as a background image. When comparing the candidate and reference images in step S429, the SSIM (structural similarity) algorithm may be used, for example. The SSIM value approaches 0 when the reference image (F_o) and the candidate image, that is, the current frame image (F_n) that passed step S426, differ in brightness, contrast, or structure, and approaches 1 the more closely they match.
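The comparison in steps S428 to S430 can be sketched with a simplified SSIM. This sketch computes a single global SSIM over the whole image rather than the usual sliding-window average, uses the conventional stabilizing constants (0.01·L)² and (0.03·L)², and assumes a hypothetical `max_similarity` threshold; none of these specifics come from the source.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM over two equally sized grayscale images (nested lists)."""
    a = [p for row in img_a for p in row]
    b = [p for row in img_b for p in row]
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return (((2 * mu_a * mu_b + c1) * (2 * cov + c2))
            / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))


def is_new_background(candidate, reference, max_similarity=0.8):
    """Accept the candidate if there is no reference yet or it differs enough."""
    if reference is None:  # first pass: the candidate is necessarily selected
        return True
    return global_ssim(candidate, reference) <= max_similarity


if __name__ == "__main__":
    dark = [[0.0, 0.0], [0.0, 0.0]]
    bright = [[255.0, 255.0], [255.0, 255.0]]
    print(global_ssim(dark, dark))            # identical images
    print(is_new_background(bright, dark))    # very different brightness
```

Because a low SSIM means a large difference, "differs by at least a threshold" is expressed here as the similarity staying at or below `max_similarity`.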

Steps S422 to S430 may be repeated as many times as needed. That is, the captured images are analyzed at a preset frame interval, and a background image is selected from among the frames preceding a frame in which a change occurred; the captured images are analyzed in time order and additional background images are selected until the preset number of background images has been reached.

When the number of images selected as background images satisfies the preset number in step S421, the learning unit 113 uses the selected background images to additionally train the object detection unit 112 so that the photographing environment of the surveillance area is reflected. For example, the learning unit 113 may perform online learning that deliberately overfits by training on the views of the surveillance area obtained from the background images, which contain no monitoring target object, as a negative set. Environmental information that can be extracted from the background images, such as the indoor structure of the surveillance area, outdoor lighting conditions, and the arrangement of indoor furniture, may thereby be adaptively and additionally learned by the object detection unit 112. As a result, the initial learning weight file generated by the initial training is slightly modified so as to be optimized for capturing the surveillance area with the video surveillance device 110, producing an additional learning weight file. When video surveillance devices 110 are operated in a plurality of surveillance areas, the object detection units 112 operating in the different areas may additionally learn the different indoor structures, outdoor lighting environments, indoor furniture arrangements, and so on of their respective areas (S440).
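The idea of slightly modifying pretrained weights with a negative set can be illustrated on a toy model. The sketch below is not the detector described in the source: it stands in a tiny logistic classifier for the neural network, treats every background feature vector as label 0 ("no object"), and uses a hypothetical small learning rate and step count to mimic the few training iterations.

```python
import math


def predict(weights, bias, features):
    """Probability that the features contain a target object (logistic model)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_on_negatives(weights, bias, negatives, learning_rate=0.05, steps=5):
    """Return slightly adjusted (weights, bias) after a few passes over negatives.

    For label 0 the binary cross-entropy gradient w.r.t. the logit is simply
    the predicted probability p, so each update is w -= lr * p * x.
    """
    w = list(weights)  # copy so the initial weights stay untouched
    b = bias
    for _ in range(steps):
        for x in negatives:
            p = predict(w, b, x)
            for i in range(len(w)):
                w[i] -= learning_rate * p * x[i]
            b -= learning_rate * p
    return w, b


if __name__ == "__main__":
    initial_w, initial_b = [1.0, 1.0], 0.0
    background = [[1.0, 2.0]]  # features of a background image (hypothetical)
    new_w, new_b = fine_tune_on_negatives(initial_w, initial_b, background)
    print(predict(initial_w, initial_b, background[0]))  # false-alarm score before
    print(predict(new_w, new_b, background[0]))          # lower score after
```

Keeping the step count and learning rate small is what limits how far the adjusted weights drift from the initial ones, in the spirit of the text's concern about catastrophic forgetting.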

Thereafter, the object detection unit 112 of the video surveillance device 110, now optimized through step S440 for capturing the surveillance area, detects and recognizes the object to be detected in the images of the surveillance area captured by the photographing unit 111, and controls the communication unit 114 to transmit detection information based on the object recognition result to the video analysis server device 120 and the monitoring device 130 (S450).

FIG. 7 shows an embodiment in which the video analysis server device 120 includes an object detection unit 121 as an object detector; in this embodiment, the video surveillance device 110 may also include an object detector.

The video surveillance and analysis system 100 according to this embodiment of the present invention will now be described with reference to FIGS. 1 and 7.

The video surveillance device 110 may capture images of the surveillance area and exchange various information, including the captured images, with the video analysis server device 120 and the monitoring device 130. For example, the video surveillance device 110 may be implemented as a CCTV camera, but it is not limited thereto and may be any device capable of capturing images of the surveillance area.

The video surveillance device 110 may include an object detector that has been initially trained in advance to detect objects in captured images. Using the object detector, the video surveillance device 110 may analyze the captured images of the surveillance area in real time to detect and recognize objects, and then transmit the object recognition result to the video analysis server device 120 and the monitoring device 130. The object detector may train a neural network designed in advance to detect and recognize objects, and may detect and recognize objects in the captured images based on the trained neural network.

The video analysis server device 120 may include an object detection unit 121, initially trained in advance to detect objects in the captured images of the surveillance area transmitted from the video surveillance device 110, and a communication unit 123 capable of exchanging various information with the video surveillance device 110 and the monitoring device 130. The object detection unit 121 may be additionally and adaptively trained by the learning unit 122 on the photographing environment of the surveillance area; for example, captured images of the surveillance area may be used for this additional learning. The object detection unit 121 analyzes the captured images of the surveillance area in real time to detect and recognize objects, and controls the communication unit 123 to transmit the object recognition result to the video surveillance device 110 and the monitoring device 130. The object detection unit 121 may train a neural network designed in advance to detect and recognize objects, and may detect and recognize objects in the captured images based on the trained neural network. In addition, the object detection unit 121 may detect one or more predefined events based on the object recognition result transmitted from the video surveillance device 110, and may control the communication unit 123 to transmit information on the detected event to the monitoring device 130. When video surveillance devices 110 are operated in a plurality of surveillance areas, the object detection unit 121 may operate them in an integrated manner or control them centrally.

The video analysis server device 120 may include a learning unit 122 that uses captured images of the surveillance area to additionally and adaptively train the object detection unit 121 on the photographing environment of that area. When video surveillance devices 110 are operated in a plurality of surveillance areas, the different indoor structures, outdoor lighting environments, indoor furniture arrangements, and so on of the different areas may be additionally learned. The learning unit 122 may determine whether additional learning of the object detection unit 121 is needed based on a preset execution condition. One or more of the following may be preset as the execution condition: the number of images of the surveillance area captured by the video surveillance device 110 since initialization, the capture time, or the feedback from the monitoring device 130 on an object recognition result. For example, the condition may be that the number of captured images of the surveillance area is "1", that the current time is one at which outside objects are unlikely to enter the surveillance area (e.g., at night), or that the feedback on an object recognition result is "false".
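The execution-condition check can be sketched as a small predicate. The sketch assumes that any one satisfied condition triggers additional learning and that "night" is a fixed time window; the window bounds, parameter names, and the use of `None` for "no feedback yet" are all hypothetical.

```python
from datetime import time


def needs_additional_learning(captured_count, capture_time, feedback,
                              quiet_start=time(1, 0), quiet_end=time(4, 0)):
    """Return True if any preset execution condition is met.

    captured_count: images captured since initialization
    capture_time:   datetime.time of the latest capture
    feedback:       True/False from the monitoring device, or None if absent
    """
    if captured_count == 1:                        # very first captured image
        return True
    if quiet_start <= capture_time <= quiet_end:   # objects unlikely to appear
        return True
    if feedback is False:                          # operator flagged a wrong result
        return True
    return False


if __name__ == "__main__":
    print(needs_additional_learning(1, time(12, 0), None))
    print(needs_additional_learning(50, time(12, 0), True))
```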

Alternatively, the learning unit 122 of the video analysis server device 120 may select, from among the captured images, background images in which no monitoring target object is present, and then use the selected background images to additionally train the object detection unit 121 so that the photographing environment of the surveillance area is reflected. Here, the learning unit 122 may compare the captured images frame by frame, select the frame preceding a frame in which a change occurred as a candidate image, and then select a new background image from among the candidates based on a comparison between the candidate image and a preset reference image. The learning unit 122 repeats this selection in the frame time order of the captured images until the preset number of background images has been selected, and each new background image may be set as the reference image for the next selection in frame time order. That is, the captured images are analyzed in time order and additional background images are selected until the preset number has been reached. For example, the learning unit 122 may extract the luminance component from the captured images, or convert the captured images to grayscale, and then obtain inter-frame difference images from the grayscale images so that frames can be compared at a preset frame interval, for example at every frame. When object motion is identified in the inter-frame difference image of a given frame, the learning unit 122 may select the immediately preceding frame as a candidate image. The learning unit 122 may then calculate the average motion value over a preset period before the candidate was selected, and decide whether to select the candidate as a new background image based on a comparison between the calculated average motion value and a preset threshold motion value. In addition, once a candidate image has been selected, the learning unit 122 may pause for a preset time without selecting the next candidate image.

Meanwhile, the learning unit 122 of the video analysis server device 120 may perform a learning method that deliberately overfits by training on the views of the surveillance area obtained from the images captured by the video surveillance device 110 as a negative set. For example, if the number of images of the surveillance area captured by the video surveillance device 110 is "1", it is very likely that the video surveillance and analysis system 100 is being installed in the surveillance area or has just been installed. The first captured image is therefore regarded as containing no object that is a target of detection and recognition, and through additional learning the deep learning weight file of the object detection unit 121 can be slightly modified so as to be optimized for capturing the surveillance area with the video surveillance device 110. Alternatively, when the learning unit 122 has selected background images, which contain no object to be detected, the deep learning weight file of the object detection unit 121 can be slightly modified in the same way through additional learning using the background images.

The monitoring device 130 may receive the images captured by the video surveillance device 110 and display them in real time. The monitoring device 130 may also generate an appropriate event, such as raising an alarm, according to information received from the video analysis server device 120. In addition, the monitoring device 130 may display in real time the object recognition result transmitted from the video surveillance device 110 or the video analysis server device 120, and may transmit an operator's judgment of that result to the video surveillance device 110 or the video analysis server device 120 as feedback on the object recognition result.

FIG. 8 is a flowchart illustrating a machine learning method of the video analysis server device 120 according to a third embodiment of the present invention.

The machine learning method of the video analysis server device 120 according to the third embodiment will now be described in detail with reference to FIGS. 1, 7, and 8.

First, before the video surveillance and analysis system 100 is distributed and installed, a pre-designed neural network is initially trained in advance so that the object detector of the video surveillance device 110 and the object detection unit 121 of the video analysis server device 120 can detect objects contained in images. For example, offline learning may be performed using various neural networks, such as a convolutional neural network, a recurrent neural network, or a combination of the two, and a deep learning weight file may be generated as the learning result. The video surveillance device 110 including the object detector, and the video analysis server device 120 including the object detection unit 121, each initially trained in this way to detect objects, may then be manufactured in a factory or the like and distributed. The state reached through this initial offline learning is the initialized state.

The video surveillance device 110 is then placed in a state in which it can capture images of the surveillance area, and the learning unit 122 of the video analysis server device 120 determines whether additional learning of the object detection unit 121 is needed based on a preset execution condition. One or more of the following may be preset as the execution condition: the number of images of the surveillance area captured by the video surveillance device 110 since initialization, the capture time, or the feedback from the monitoring device 130 on an object recognition result. For example, if the execution condition is preset to the number of captured images of the surveillance area being "1", then when the video surveillance device 110 captures the surveillance area for the first time (S810) and transmits the image to the video analysis server device 120 and the monitoring device 130, the learning unit 122 may find that the execution condition is satisfied and determine that additional learning of the object detection unit 121 is needed (S820).

The additional learning that the learning unit 122 performs on the object detection unit 121 uses the images of the surveillance area captured by the video surveillance device 110. For example, environmental information that can be extracted from the captured images, such as the indoor structure of the surveillance area, outdoor lighting conditions, and the arrangement of indoor furniture, may be adaptively and additionally learned by the object detection unit 121. Here, the learning unit 122 may generate a new overfitted model by performing online learning that deliberately overfits, training on the views of the surveillance area obtained from the images captured by the video surveillance device 110 as a negative set. Images captured while the video surveillance and analysis system 100 is being installed in the surveillance area, or immediately afterwards, are regarded as containing no object that is a target of detection and recognition, and through additional learning with this negative set the deep learning weight file generated by the initial training of the object detection unit 121 can be slightly modified so as to be optimized for capturing the surveillance area with the video surveillance device 110 (S830).

The object detection unit 121 may then control the communication unit 123 to transmit the overfitted model generated by the additional learning of step S830 to the video surveillance device 110, and the communication unit 123, under the control of the object detection unit 121, may transmit that model to the video surveillance device 110 (S840).

The video surveillance device 110 may then receive the overfitted model resulting from the additional learning from the video analysis server device 120, and the object detector, which until then is in its initially trained state, may perform online learning by applying the additionally trained overfitted model. The video surveillance device 110 is thereby optimized for capturing the surveillance area. When video surveillance devices 110 are operated in a plurality of surveillance areas, the different indoor structures, outdoor lighting environments, indoor furniture arrangements, and so on of the different areas may be additionally learned. Thereafter, the video surveillance device 110, optimized for capturing the surveillance area, may capture images of the surveillance area, detect and recognize desired objects in the captured images, and transmit the object recognition result to the video analysis server device 120 and the monitoring device 130.

Meanwhile, in step S820 the learning unit 122 of the video analysis server device 120 may determine, based on feedback from the monitoring device 130 on an object recognition result produced by the video surveillance device 110, whether an execution condition requiring additional training of the object detection unit 121 is satisfied. For example, when "false" is received from the monitoring device 130 as feedback on an object recognition result (S810), the learning unit 122 may regard the preset execution condition as satisfied and determine that additional training is needed (S820). Steps S830 and S840 may then be performed as described above.
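The feedback-driven trigger of steps S810 and S820 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Feedback` record and the function name are hypothetical, and the only behavior taken from the description is that a "false" feedback satisfies the execution condition.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical feedback record sent by the monitoring device (S810)."""
    detection_id: int
    is_correct: bool  # False corresponds to the "false" feedback in the text


def needs_additional_training(feedback: Feedback) -> bool:
    """S820: the preset condition is met when a detection is reported as false."""
    return not feedback.is_correct


# An operator marks detection 7 as a false detection, so retraining is requested.
print(needs_additional_training(Feedback(detection_id=7, is_correct=False)))
```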

FIG. 9 is a flowchart illustrating a machine learning method of the video analysis server device 120 according to a fourth embodiment of the present invention.

The video surveillance and analysis system 100 according to the fourth embodiment of the present invention will be described with reference to FIGS. 1, 7, and 9.

First, before the video surveillance and analysis system 100 is distributed and installed, a pre-designed neural network is initially trained in advance so that the object detector of the video surveillance device 110 and the object detection unit 121 of the video analysis server device 120 can detect objects contained in an image. For example, offline machine learning may be performed using various neural networks, such as a convolutional neural network, a recurrent neural network, or a combination of the two, and a deep learning weight file may be generated as the training result. The video surveillance device 110, which includes an object detector pre-trained to detect objects, and the video analysis server device 120, which includes the object detection unit 121, may thus be manufactured in a factory or the like and distributed. The pre-trained state produced by offline machine learning is the initialized state.

The video surveillance device 110 is then placed in a state in which it can capture video of the surveillance area. It captures the surveillance area to generate captured video and provides the captured video to the video analysis server device 120 and the monitoring device 130. The communication unit 123 of the video analysis server device 120 receives the captured video from the video surveillance device 110 and provides it to the object detection unit 121 and the learning unit 122 (S910).

The learning unit 122 then performs a process of selecting, from the video captured by the video surveillance device 110, background images to be used in the additional training that makes the detector reflect the capture conditions of the surveillance area. The reason this background image selection process is needed was described above with reference to FIG. 4.

In this background image selection process, the learning unit 122 first determines whether the number of images already selected as background images in the preceding selection process satisfies a preset number of background images (S921). If, in step S921, the number of background images selected so far is smaller than the preset number, the learning unit 122 proceeds to select additional background images. That is, when additional selection is judged necessary, the learning unit 122 performs the procedure up to step S930 described below, for example analyzing the captured video in chronological order and selecting additional background images until the preset number has been selected.
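The count check of step S921 amounts to a loop that walks the frames in chronological order and stops once enough background images are collected. The sketch below assumes a hypothetical `try_select` callback standing in for steps S922 to S930; it returns a background image or `None` for each frame.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Frame = TypeVar("Frame")


def collect_backgrounds(
    frames: Iterable[Frame],
    try_select: Callable[[Frame], Optional[Frame]],
    required_count: int,
) -> List[Frame]:
    """Select background images until the preset count is reached (S921)."""
    selected: List[Frame] = []
    for frame in frames:  # frames are assumed to arrive in chronological order
        if len(selected) >= required_count:  # S921: enough images already
            break
        picked = try_select(frame)  # placeholder for steps S922 to S930
        if picked is not None:
            selected.append(picked)
    return selected


# Toy run: integers stand in for frames, and every third one is "selected".
print(collect_backgrounds(range(10), lambda f: f if f % 3 == 0 else None, 2))
```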

The learning unit 122 extracts the black-and-white component from the captured video of step S910, or converts the captured video to black and white (S922), and may then obtain inter-frame difference images from the black-and-white video in order to analyze it in units of frames at a preset interval, for example to compare it frame by frame at every frame. For example, as illustrated in FIG. 6, an inter-frame difference image may be obtained that contains only the difference between the current frame and the previous frame, that is, little more than the outline of a moving object (S923).
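Steps S922 and S923 can be illustrated with plain Python lists standing in for frames. The description does not say how the black-and-white component is computed, so the ITU-R BT.601 luma weights used here are an assumption, as are the function names.

```python
from typing import List, Tuple

RGBFrame = List[List[Tuple[int, int, int]]]
GrayFrame = List[List[int]]


def to_gray(frame: RGBFrame) -> GrayFrame:
    """S922: convert RGB pixels to luma (BT.601 weights, an assumption)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def frame_difference(current: GrayFrame, previous: GrayFrame) -> GrayFrame:
    """S923: per-pixel absolute difference between two grayscale frames."""
    return [
        [abs(c - p) for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


# Tiny 2x2 example: only the bottom-right pixel brightens between frames.
previous = [[(10, 10, 10), (10, 10, 10)], [(10, 10, 10), (10, 10, 10)]]
current = [[(10, 10, 10), (10, 10, 10)], [(10, 10, 10), (200, 200, 200)]]
diff = frame_difference(to_gray(current), to_gray(previous))
print(diff)
```

Static pixels cancel to zero in the difference image, so only the changed region survives, which is why the difference image shows little more than the moving object's outline.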

When the learning unit 122 detects object motion in the inter-frame difference image of a given frame (S924), it may select, by sampling, one of the frames preceding that frame as a candidate for the background image. For example, the learning unit 122 may select the frame immediately preceding the frame in which object motion was detected as the candidate image (S925).
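A minimal sketch of steps S924 and S925 follows. The description does not define how motion is measured, so the "fraction of changed pixels" measure and both threshold values are assumptions made for illustration.

```python
from typing import List

GrayFrame = List[List[int]]


def motion_ratio(current: GrayFrame, previous: GrayFrame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose inter-frame difference exceeds pixel_threshold."""
    changed = 0
    total = 0
    for cur_row, prev_row in zip(current, previous):
        for c, p in zip(cur_row, prev_row):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def candidate_indices(frames: List[GrayFrame], motion_threshold: float = 0.01) -> List[int]:
    """S924-S925: for each frame with detected motion, pick the frame just before it."""
    candidates = []
    for i in range(1, len(frames)):
        if motion_ratio(frames[i], frames[i - 1]) > motion_threshold:
            candidates.append(i - 1)
    return candidates


still = [[10, 10], [10, 10]]
moved = [[10, 10], [10, 200]]
# Motion first appears in frame 2, so frame 1 becomes the candidate.
print(candidate_indices([still, still, moved, moved]))
```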

The candidate images selected in step S925 may include images unsuitable for use as background images. For example, if a pattern repeats in which an object stays still for a short time, moves briefly, and then stays still again, an image containing the object may be selected as a candidate in step S925, and such a candidate is unsuitable as a background image. To filter out such unsuitable images, when a candidate image is selected in step S925, the learning unit 122 calculates the average motion value over a preset time (for example, 150 seconds) preceding the selection of the candidate, compares it with a preset threshold motion value, and, if the average motion value exceeds the threshold motion value, may exclude the image selected in step S925 from the candidates (S926).
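The filter of step S926 can be sketched as an average over a sliding window of per-frame motion values. Only the 150-second example window comes from the description; the threshold value, the per-frame motion history, and the choice to reject a candidate that has no history at all are assumptions.

```python
from typing import Sequence


def passes_motion_filter(
    motion_history: Sequence[float],
    candidate_index: int,
    frames_per_second: float,
    window_seconds: float = 150.0,
    motion_threshold: float = 0.02,
) -> bool:
    """S926: keep the candidate only if the preceding window was quiet on average."""
    window = max(1, int(window_seconds * frames_per_second))
    start = max(0, candidate_index - window)
    recent = motion_history[start:candidate_index]
    if not recent:
        return False  # assumption: without any history the candidate is rejected
    average_motion = sum(recent) / len(recent)
    return average_motion <= motion_threshold


quiet = [0.0, 0.0, 0.0, 0.0, 0.3]  # scene was still before motion started
busy = [0.2, 0.2, 0.2, 0.2, 0.3]   # scene was already moving
print(passes_motion_filter(quiet, 4, frames_per_second=1.0, window_seconds=4.0))
print(passes_motion_filter(busy, 4, frames_per_second=1.0, window_seconds=4.0))
```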

If a candidate image has passed step S926, the learning unit 122 may pause for a preset time without selecting the next candidate image (S927). Without such a sleep time, the next candidate would be highly likely to be very similar to the one just selected, even if its average motion value did not exceed the threshold motion value when step S926 is performed again. The pause prevents such an image, unsuitable as a background image, from passing step S926.
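The sleep time of step S927 behaves like a cooldown timer. The class below is a hypothetical helper; the description gives no value for the pause, so the 60 seconds in the example is arbitrary.

```python
from typing import Optional


class CandidateCooldown:
    """S927: block candidate selection for a preset time after one is accepted."""

    def __init__(self, sleep_seconds: float) -> None:
        self.sleep_seconds = sleep_seconds
        self._last_accepted: Optional[float] = None

    def ready(self, now: float) -> bool:
        """True when no candidate was accepted yet or the pause has elapsed."""
        if self._last_accepted is None:
            return True
        return now - self._last_accepted >= self.sleep_seconds

    def mark_accepted(self, now: float) -> None:
        """Record the time at which a candidate passed step S926."""
        self._last_accepted = now


cooldown = CandidateCooldown(sleep_seconds=60.0)
cooldown.mark_accepted(now=0.0)
print(cooldown.ready(now=30.0), cooldown.ready(now=60.0))
```

Passing timestamps in explicitly, instead of reading a clock inside the class, keeps the helper usable with video timestamps and easy to test.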

Next, the learning unit 122 compares the candidate image that passed step S926 with a preset reference image (S928), and may select the candidate as a new background image (S930) when it differs from the reference image by at least a preset threshold (S929). The image selected as the new background image may be set as the reference image for selections made at later frames. That is, the reference image compared against the candidate that passed step S926 may be the most recently selected of the background images already chosen. When steps S928 and S929 are performed for the first time, no reference image may exist, in which case the candidate that passed step S926 is necessarily selected as a background image. The structural similarity (SSIM) algorithm, for example, may be used for the image comparison in step S929.
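The comparison of steps S928 to S930 can be sketched with a simplified SSIM. Note the simplifications: standard SSIM averages the index over local windows, whereas this sketch computes a single global index over the whole frame, and treating `1 - SSIM` as the "difference" and 0.1 as its threshold are assumptions. The constants follow the usual SSIM definition, $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ for dynamic range $L$.

```python
from typing import List, Optional

GrayFrame = List[List[int]]


def global_ssim(x: GrayFrame, y: GrayFrame, dynamic_range: float = 255.0) -> float:
    """Single-window SSIM over whole frames (a simplification of windowed SSIM)."""
    xs = [float(p) for row in x for p in row]
    ys = [float(p) for row in y for p in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in xs) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def is_new_background(
    candidate: GrayFrame,
    reference: Optional[GrayFrame],
    min_difference: float = 0.1,
) -> bool:
    """S928-S930: accept if no reference exists yet or the images differ enough."""
    if reference is None:
        return True  # first run: the candidate is necessarily selected
    return (1.0 - global_ssim(candidate, reference)) >= min_difference


dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]
print(is_new_background(bright, dark), is_new_background(dark, dark))
```

After a candidate is accepted, the caller would store it as the new reference, matching the rule that the most recently selected background image serves as the reference.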

Steps S922 to S930 are repeated as many times as needed. Once the number of images selected as background images satisfies the preset number in step S921, the learning unit 122 uses the selected background images to additionally train the object detection unit 121 so that the capture conditions of the surveillance area are reflected. For example, the learning unit 122 may perform online learning that overfits by training, as a negative set, on the surveillance-area imagery obtained from background images in which no monitoring target object is present. Environmental information that can be extracted from the background images, such as the indoor structure, outdoor lighting conditions, and indoor furniture arrangement of the surveillance area, may thus be adaptively learned by the object detection unit 121. As a result, the initial weight file produced by the initial pre-training is finely adjusted so as to be optimized for the surveillance area captured by the video surveillance device 110, and an additionally trained weight file may be generated (S940).
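The effect of training on a negative set can be illustrated with a toy model. The sketch below is not the deep network of the description: it swaps in a two-weight logistic classifier as a stand-in, with hypothetical feature vectors, purely to show that repeated gradient steps on background samples labeled "no object" lower the score the model assigns to that background.

```python
import math
from typing import List, Tuple


def score(weights: List[float], bias: float, features: List[float]) -> float:
    """Probability that the features contain a target object (logistic model)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_on_negatives(
    weights: List[float],
    bias: float,
    negatives: List[List[float]],
    epochs: int = 10,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Gradient descent on cross-entropy with every sample labeled 0 (no object)."""
    w = list(weights)
    b = bias
    for _ in range(epochs):
        for x in negatives:
            p = score(w, b, x)  # gradient of the loss w.r.t. z is (p - 0)
            w = [wi - learning_rate * p * xi for wi, xi in zip(w, x)]
            b -= learning_rate * p
    return w, b


initial_weights, initial_bias = [1.0, 0.5], 0.0  # stand-in for the initial weight file
background = [0.8, 0.6]  # hypothetical features of a background-only image
before = score(initial_weights, initial_bias, background)
tuned_weights, tuned_bias = fine_tune_on_negatives(initial_weights, initial_bias, [background])
after = score(tuned_weights, tuned_bias, background)
print(before, after)
```

The original weights are copied rather than modified in place, mirroring the description's distinction between the initial weight file and the additionally trained weight file.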

The object detection unit 121 may then control the communication unit 123 to transmit the overfitted model generated by the additional training in step S940 to the video surveillance device 110, and the communication unit 123 may transmit that overfitted model to the video surveillance device 110 under the control of the object detection unit 121 (S950).

FIG. 10 illustrates object detection results obtained with the machine learning method according to embodiments of the present invention. The left side shows the detection result after initial pre-training (offline machine learning) only, and the right side shows the result after both initial pre-training (offline machine learning) and additional training (online machine learning). The initial pre-training performed deep learning on several hundred thousand images of people, and the additional training performed deep learning that iterated about 10 times over a single captured image of the surveillance area. With initial pre-training alone, a non-human object at the lower left was misrecognized as a person. With additional training, this misrecognition did not occur.

FIGS. 11 and 12 illustrate images used in the machine learning method according to embodiments of the present invention. Additional machine learning was performed according to the first embodiment, described with reference to FIG. 3, using five images identical to the one illustrated in FIG. 11, and according to the second embodiment, described with reference to FIG. 4, using one image identical to the one illustrated in FIG. 11 and one image identical to the one illustrated in FIG. 12. The average precision (AP), an indicator of object detection performance, was measured at 0.8311 for the first embodiment and 0.9105 for the second embodiment.

As described above, according to the embodiments of the present invention, an object detector that has been pre-trained to detect objects in images is additionally trained on the surveillance area using video captured of that area. The object detection function is thereby optimized for the surveillance area, which greatly improves the accuracy of object detection and recognition.

Combinations of the steps in each flowchart attached to the present invention may be performed by computer program instructions. Because these computer program instructions may be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by the processor of the computer or other programmable data processing equipment create means for performing the functions described in each step of the flowchart. Because these computer program instructions may also be stored in a computer-usable or computer-readable recording medium that can direct a computer or other programmable data processing equipment to implement a function in a particular manner, the instructions stored in the computer-usable or computer-readable recording medium can also produce an article of manufacture containing instruction means that perform the functions described in each step of the flowchart. Because the computer program instructions may also be loaded onto a computer or other programmable data processing equipment, a series of operational steps can be performed on the computer or other programmable data processing equipment to create a computer-executed process, so that the instructions operating the computer or other programmable data processing equipment can also provide steps for executing the functions described in each step of the flowchart.

In addition, each step may represent a module, segment, or portion of code that contains one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments the functions mentioned in the steps may occur out of order. For example, two steps shown in succession may in fact be performed substantially simultaneously, or the steps may sometimes be performed in reverse order, depending on the functions involved.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of the present invention.

According to an embodiment of the present invention, an object detector pre-trained to detect objects is additionally trained on the surveillance area so that the object detection function is optimized for that area, which greatly improves the accuracy of object detection and recognition. These embodiments can be applied to and used in the fields of intrusion detection and security technology, including CCTV (closed-circuit television) surveillance systems.

100: video surveillance and analysis system
110: video surveillance device 111: photographing unit
112: object detection unit 113: learning unit
114: communication unit
120: video analysis server device 121: object detection unit
122: learning unit 123: communication unit
130: monitoring device

Claims

A video surveillance device that performs online machine learning, comprising:
a photographing unit that captures a surveillance area and generates captured images;
an object detection unit that, having been initially trained to detect a monitoring target object in an image, detects the monitoring target object in the captured images input from the photographing unit; and
a learning unit that selects at least one background image from among the captured images and then additionally trains the object detection unit using the selected background image so that the photographing environment of the surveillance area is reflected.

The video surveillance device of claim 1,
wherein the learning unit extracts, from the captured images, an image in which the monitoring target object is not present and selects it as the background image.

The video surveillance device of claim 1,
wherein the learning unit analyzes the captured images in units of frames at a preset interval, selects the background image from among the frames preceding a frame in which a change has occurred, and selects additional background images by analyzing the captured images in chronological order until a preset number of background images has been selected.

The video surveillance device of claim 3,
wherein the learning unit selects a frame preceding the frame in which the change has occurred as a candidate image, and selects the candidate image as the background image based on an average motion value over a preset time relative to the candidate image.

The video surveillance device of claim 4,
wherein the learning unit identifies the frame in which the change has occurred by detecting motion in a difference image obtained by comparing black-and-white components of the captured images.

The video surveillance device of claim 4,
wherein the learning unit sets the background image most recently selected from the captured images as a reference image, and selects the candidate image as the background image by additionally comparing the candidate image with the reference image.

The video surveillance device of claim 4,
wherein, when the candidate image is selected, the learning unit pauses for a preset time without selecting a next candidate image.

The video surveillance device of claim 1,
wherein the learning unit overfits by learning the photographing environment of the surveillance area as a negative set using the background image.

An online machine learning method for an object detector initially trained to detect a monitoring target object in an image, the method comprising:
selecting a background image from among captured images of a surveillance area; and
additionally training the object detector using the selected background image so that the photographing environment of the surveillance area is reflected.

The online machine learning method of claim 9,
wherein the selecting of the background image includes extracting, from the captured images, an image in which the monitoring target object is not present and selecting it as the background image.

The online machine learning method of claim 9,
wherein the selecting of the background image includes analyzing the captured images in units of frames at a preset interval, selecting the background image from among the frames preceding a frame in which a change has occurred, and selecting additional background images by analyzing the captured images in chronological order until a preset number of background images has been selected.

The online machine learning method of claim 11,
wherein the selecting of the background image includes selecting a frame preceding the frame in which the change has occurred as a candidate image, and selecting the candidate image as the background image based on an average motion value over a preset time relative to the candidate image.

The online machine learning method of claim 12,
wherein the selecting of the background image includes identifying the frame in which the change has occurred by detecting motion in a difference image obtained by comparing black-and-white components of the captured images.

The online machine learning method of claim 12,
wherein the selecting of the background image includes setting the background image most recently selected from the captured images as a reference image, and selecting the candidate image as the background image by additionally comparing the candidate image with the reference image.

The online machine learning method of claim 12,
wherein the selecting of the background image includes, when the candidate image is selected, pausing for a preset time without selecting a next candidate image.

The online machine learning method of claim 9,
wherein the additional training of the object detector includes overfitting by learning the photographing environment of the surveillance area as a negative set using the background image.

A computer-readable recording medium storing a computer program,
the computer program comprising instructions for causing a processor to perform the method according to any one of claims 9 to 16.

A video analysis device comprising:
an object detection unit pre-trained to detect a monitoring target object in an image; and
a learning unit that selects a background image from among captured images of a surveillance area and then additionally trains the object detection unit using the selected background image so that the photographing environment of the surveillance area is reflected.

The video analysis device of claim 18,
wherein the learning unit compares the captured images frame by frame, selects a frame preceding a frame in which a change has occurred as a candidate image, and then selects a new background image from among the candidate images based on a result of comparing the candidate image with a preset reference image.

The video analysis device of claim 18,
wherein the learning unit overfits by learning the photographing environment of the surveillance area as a negative set using the background image.