KR102415223B1

KR102415223B1 - Method for segmenting pedestrian from video, and apparatus for performing the same

Info

Publication number: KR102415223B1
Application number: KR1020200066860A
Authority: KR
Inventors: 조남익; 이상훈; 안석현; 정혜수
Original assignee: 서울대학교산학협력단
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2022-06-29
Also published as: KR20210150009A

Abstract

A method and apparatus for detecting pedestrians in a video are provided. The pedestrian detection method includes: detecting pedestrian areas with an artificial neural network for at least some of the frames constituting the video; for the remaining frames, tracking the pedestrian areas based on the previous frame, performing pre-processing that removes part of the background using the tracking result, and then detecting the pedestrian areas with the artificial neural network; marking the detected pedestrian areas on all frames; and outputting a video composed of the frames on which the pedestrian areas are marked.

Description

Method for detecting the area of a pedestrian from a video and an apparatus for performing the same

Embodiments disclosed herein relate to a method and apparatus for detecting pedestrians in a video such as CCTV footage.

Pedestrian detection is a technique for detecting the area of a person standing or walking in an image. Conventional pedestrian detection techniques fall broadly into two groups: methods that slide a rectangular window sequentially over the entire image, and methods that use a convolutional neural network in a more complex process. The former have low detection accuracy and are therefore of limited practical use. The latter achieve very high accuracy, but because each image must pass through a complex convolutional neural network, they are computationally expensive and slow.

Recently, the mainstream approach has been to detect pedestrians in every frame of a video with an artificial neural network trained on a dataset, even though detection takes a long time. Many techniques have been developed to simplify the network and reduce computation so that detection is faster while accuracy stays high, and the recent development of high-performance GPUs has done much to overcome the slow execution of artificial neural networks. Even so, the results still fall short of what field users need for real-time criminal investigation using CCTV.

In this regard, Korean Patent Application Publication No. 10-2016-0035463, a prior art document, discloses extracting pedestrian candidates from image data, detecting pedestrians among the candidates, and selecting the pedestrians to be tracked by comparing the pedestrians detected in the previous frame with those detected in the current frame.

The background art described above is technical information that the inventors possessed in order to derive the present invention or acquired in the course of deriving it, and it is not necessarily prior art disclosed to the general public before the filing of the present application.

Embodiments disclosed herein aim to provide a method and apparatus that, when detecting pedestrians in a video such as CCTV footage with an artificial neural network, reduce the amount of computation so that detection is faster while detection accuracy is kept at or above a certain level.

As a technical means for achieving the above technical object, according to one embodiment, a pedestrian detection method may include: detecting pedestrian areas with an artificial neural network for at least some of the frames constituting a video and, for the remaining frames, tracking the pedestrian areas based on the previous frame, performing pre-processing that removes part of the background using the tracking result, and then detecting the pedestrian areas with the artificial neural network, and marking the detected pedestrian areas on all frames; and outputting a video composed of the frames on which the pedestrian areas are marked.

According to another embodiment, there is provided a computer program for performing a pedestrian detection method, wherein the pedestrian detection method may include: detecting pedestrian areas with an artificial neural network for at least some of the frames constituting a video and, for the remaining frames, tracking the pedestrian areas based on the previous frame, performing pre-processing that removes part of the background using the tracking result, and then detecting the pedestrian areas with the artificial neural network, and marking the detected pedestrian areas on all frames; and outputting a video composed of the frames on which the pedestrian areas are marked.

According to yet another embodiment, there is provided a computer-readable recording medium on which a program for performing a pedestrian detection method is recorded, wherein the pedestrian detection method may include: detecting pedestrian areas with an artificial neural network for at least some of the frames constituting a video and, for the remaining frames, tracking the pedestrian areas based on the previous frame, performing pre-processing that removes part of the background using the tracking result, and then detecting the pedestrian areas with the artificial neural network, and marking the detected pedestrian areas on all frames; and outputting a video composed of the frames on which the pedestrian areas are marked.

According to yet another embodiment, a pedestrian detection apparatus includes an input/output unit, a storage unit that stores a program for detecting pedestrians in a video, and a control unit that detects pedestrians in the video by executing the program. The control unit may detect pedestrian areas with an artificial neural network for at least some of the frames constituting the video; for the remaining frames, track the pedestrian areas based on the previous frame, perform pre-processing that removes part of the background using the tracking result, and then detect the pedestrian areas with the artificial neural network; mark the detected pedestrian areas on all frames; and display, on the input/output unit, a video composed of the frames on which the pedestrian areas are marked.

According to any one of the above means, detecting pedestrian areas with an artificial neural network for some of the frames constituting the video raises accuracy, while for the remaining frames removing the background through tracking and pre-processing before passing them through the artificial neural network reduces the amount of computation, so an improvement in processing speed can be expected.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the disclosed embodiments belong.

FIG. 1 is a diagram illustrating the configuration of a pedestrian detection apparatus according to an embodiment.
FIG. 2 is a diagram for describing bounding box detection and mask detection, the two ways of detecting a pedestrian area in the pedestrian detection method according to an embodiment.
FIG. 3 is a diagram for describing the process of detecting pedestrian areas in a tracking frame and marking them on the frame, in the pedestrian detection method according to an embodiment.
FIGS. 4 and 5 are flowcharts for describing a pedestrian detection method according to an embodiment.

Various embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below may be modified and implemented in various different forms. To describe the features of the embodiments more clearly, detailed descriptions of matters widely known to those of ordinary skill in the art to which the embodiments belong are omitted. In the drawings, parts unrelated to the description of the embodiments are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case where the two are directly connected but also the case where they are connected with another component in between. Likewise, when a component is said to "include" another component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a pedestrian detection apparatus according to an embodiment. Referring to FIG. 1, the pedestrian detection apparatus 100 according to an embodiment may include an input/output unit 110, a control unit 120, and a storage unit 130.

The pedestrian detection apparatus 100 may receive an input image and detect the pedestrians it contains. The input image may be, for example, a video such as CCTV footage, or any of various other kinds of images.

The input/output unit 110 receives user input and data and outputs images, audio, and the like. According to an embodiment, the input/output unit 110 may receive captured image data from an external device such as a CCTV camera, or receive from the user an input requesting pedestrian area detection. Also according to an embodiment, the input/output unit 110 may display on a screen the video in which the detected pedestrian areas are marked.

The control unit 120 includes at least one processor such as a CPU and controls the overall operation of the pedestrian detection apparatus 100. In particular, by executing the pedestrian detection program stored in the storage unit 130, the control unit 120 may detect pedestrian areas in image data received through the input/output unit 110 or stored in advance in the storage unit 130, mark the detected pedestrian areas on the video, and output it. The control unit 120 may also build the artificial neural network used for pedestrian area detection by executing a program stored in the storage unit 130.

According to an embodiment, when detecting pedestrians in a video, the control unit 120 uses both an artificial neural network and a tracking technique, and also performs pre-processing to remove the background, so that detection is faster while detection accuracy is kept at or above a certain level. The specific process by which the control unit 120 detects pedestrians in a video is described in detail below with reference to FIG. 3.

Various kinds of programs and data may be stored in the storage unit 130. In particular, the storage unit 130 may store the image data in which pedestrians are to be detected, as well as the pedestrian detection program executed by the control unit 120.

Before describing specific embodiments of pedestrian detection, this section first describes bounding box detection and mask detection as concrete ways of detecting a pedestrian area, and then describes pedestrian area detection using an artificial neural network and pedestrian area detection using a tracking technique.

Hereinafter, detecting a 'pedestrian area' covers both detecting a rectangular area of the image that contains a pedestrian and detecting the pedestrian as a pixel-level area. Detecting the rectangular area is referred to as detecting a 'bounding box', and detecting the pixel-level area is referred to as detecting a 'mask'.

FIG. 2 is a diagram for describing bounding box detection and mask detection, the two ways of detecting a pedestrian area in the pedestrian detection method according to an embodiment.

The first image 210 of FIG. 2 shows the result of bounding box detection. It contains rectangles drawn in yellow, each of which fully encloses one pedestrian in the image along with part of the surrounding background. Each such rectangle is called a bounding box. In other words, a bounding box is a rectangular box that fully contains a pedestrian and also contains some background.

The second image 220 of FIG. 2 shows the result of mask detection. In the second image 220, each pedestrian in the image is detected at the pixel level, and the detected pixel-level areas are shown in different colors. Each colored area is called a mask. In other words, a mask is a pixel-level area that contains only the pedestrian.

A pedestrian can be detected by defining in advance the features used to recognize pedestrians and analyzing the pixels of the image to find objects that match those features. There are various ways to do this. For example, pedestrian areas may be detected by passing the image through an artificial neural network trained in advance on pedestrian features.

With an artificial neural network that supports object segmentation (e.g., Mask R-CNN), the pedestrian areas in an image can be detected at the pixel level (mask detection). Once a pedestrian area has been detected at the pixel level, the rectangular bounding box enclosing that pedestrian can also be obtained, so a network with object segmentation yields both the bounding box and the mask when detecting a pedestrian area.
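The step from a pixel-level mask to its enclosing bounding box can be sketched as follows. This is a minimal illustration, assuming the mask is a boolean NumPy array; the function name `mask_to_bounding_box` and the `(x_min, y_min, x_max, y_max)` convention are hypothetical and not taken from the patent. It returns the tightest box, whereas the bounding boxes described above may also include some surrounding background.

```python
import numpy as np

def mask_to_bounding_box(mask):
    """Return the tightest (x_min, y_min, x_max, y_max) box, inclusive, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)          # row and column indices of mask pixels
    if ys.size == 0:
        return None                    # no pedestrian pixels in this mask
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy mask: a 3x3 block of "pedestrian" pixels inside a 6x8 image.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:6] = True
box = mask_to_bounding_box(mask)
```

A margin could be added around the returned box if some background context is wanted.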

Detecting pedestrians in a video with an artificial neural network in this way has the advantages of pixel-level detection and high accuracy. Its drawback is that it takes a long time, because the network must process the pixel information of every one of the frames constituting the video.

A tracking technique follows the movement of a pedestrian using past position information. In other words, when a pedestrian moves between consecutive frames, the pedestrian is tracked based on the similarity between those frames. The position or pose of a pedestrian in a video may change somewhat between consecutive frames, but the change is small, so the pedestrian can be tracked by comparing two consecutive frames and finding the similar region.

As a concrete example, once a pedestrian area has been detected in one frame, its position information is stored. When the pedestrian area is detected in the next frame, the region with the highest similarity (correlation) is selected as the pedestrian area, using the position information from the previous frame as the starting point.

In what follows, pedestrian area detection by tracking is assumed to detect a bounding box. The embodiments described below are not limited to any particular tracking technique; any known tracking technique may be used.

In general, detecting pedestrian areas in a frame by applying an artificial neural network takes longer than detecting them by applying a tracking technique and comparing with the previous frame. The control unit 120 of the pedestrian detection apparatus 100 according to an embodiment therefore uses the artificial neural network directly on only some of the frames constituting the video to detect masks for the pedestrian areas. For the remaining frames, it applies a tracking technique to detect bounding boxes for the pedestrian areas, removes part of the background through pre-processing, and then passes the result through the artificial neural network to detect the masks, which improves processing speed. In particular, this avoids the time that would be wasted by running the artificial neural network directly on every one of a run of highly similar adjacent frames.

According to an embodiment, the control unit 120 may set a detection period in advance and classify the frames constituting the video into detection frames and tracking frames based on that period. For example, if the detection period is 10, the control unit 120 may classify the 1st frame as a detection frame, the 2nd to 10th frames as tracking frames, the 11th frame as a detection frame, the 12th to 20th frames as tracking frames, and so on.

Here, a 'detection frame' is a frame in which pedestrian areas are detected by passing the frame itself through the artificial neural network, and a 'tracking frame' is a frame in which pedestrian areas are detected by removing part of the background through tracking and pre-processing and then passing the result through the artificial neural network.

When detecting pedestrian areas, the control unit 120 reads the frames in order, determines whether the current frame is a detection frame or a tracking frame, and processes it accordingly.

If the current frame is determined to be a detection frame, the control unit 120 may detect pedestrian areas by passing the current frame through the artificial neural network, in which case it can obtain both the bounding box and the mask for each pedestrian area.

The control unit 120 may mark the detected masks on the current frame and store the position information of the bounding boxes in the storage unit 130 for use when the tracking technique is applied to subsequent frames.

If the current frame is instead determined to be a tracking frame, the control unit 120 may apply the tracking technique to the current frame, perform pre-processing, and then pass the result through the artificial neural network to detect pedestrian areas. The specific process by which the control unit 120 detects pedestrian areas in a tracking frame is described in detail below with reference to FIG. 3.

FIG. 3 is a diagram for describing the process of detecting pedestrian areas in a tracking frame and marking them on the frame, in the pedestrian detection method according to an embodiment.

Referring to FIG. 3, when the first image 310 is input as the current frame and the control unit 120 determines that it is a tracking frame, the control unit 120 applies the tracking technique to the current frame and detects bounding boxes as shown in the second image 320. For example, the control unit 120 may refer to the position information of the pedestrian areas (bounding boxes) detected in the immediately preceding frame and use a sliding window to find the bounding boxes whose color information is most similar. According to an embodiment, the control unit 120 may also track pedestrians using the Kernelized Correlation Filter (KCF) technique.
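The sliding-window search described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's tracker and not KCF: frames are NumPy arrays, a box is `(x, y, w, h)` with `(x, y)` the top-left corner, similarity is measured by the sum of squared color differences, and the name `track_box` and the `search_radius` parameter are hypothetical.

```python
import numpy as np

def track_box(prev_frame, cur_frame, box, search_radius=8):
    """Slide a window around the previous box and return the best-matching box in cur_frame."""
    x, y, w, h = box
    template = prev_frame[y:y + h, x:x + w].astype(np.float64)
    frame_h, frame_w = cur_frame.shape[:2]
    best_box, best_cost = box, np.inf
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            nx, ny = x + dx, y + dy
            # Skip windows that would fall outside the current frame.
            if nx < 0 or ny < 0 or nx + w > frame_w or ny + h > frame_h:
                continue
            candidate = cur_frame[ny:ny + h, nx:nx + w].astype(np.float64)
            cost = np.sum((candidate - template) ** 2)  # lower cost = more similar colors
            if cost < best_cost:
                best_box, best_cost = (nx, ny, w, h), cost
    return best_box

# Synthetic check: the current frame is the previous frame shifted 3 px down and 2 px right.
rng = np.random.default_rng(0)
prev_frame = rng.random((40, 40, 3))
cur_frame = np.roll(prev_frame, shift=(3, 2), axis=(0, 1))
tracked = track_box(prev_frame, cur_frame, (10, 12, 6, 8))
```

An exhaustive search like this only looks near the previous position, which is what keeps it cheap compared with running the network on the full frame.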

Once the bounding boxes have been detected, the control unit 120 may extract pedestrian patches 331-334 from the current frame. A 'pedestrian patch' is the part of the frame inside a bounding box, cropped out on its own.

The control unit 120 merges the extracted pedestrian patches (bounding boxes) into a single image. More specifically, the patches are merged using the height of the tallest extracted patch as the reference, which yields an image from which the background has been removed. In the course of merging the patches, the control unit 120 may also apply zero padding so that the result meets the input requirements of the particular artificial neural network.
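One way to read the merging step is sketched below: crop each bounding box, place the crops side by side, and zero-pad shorter patches up to the height of the tallest one. The horizontal layout, the `(x, y, w, h)` box convention, and the name `merge_patches` are assumptions for illustration; the patent does not fix the layout. The sketch also assumes at least one box is given.

```python
import numpy as np

def merge_patches(frame, boxes):
    """Crop each (x, y, w, h) box from frame and concatenate the crops horizontally.

    Returns the merged image and the x offset at which each patch starts.
    Shorter patches are zero-padded at the bottom to the tallest patch height.
    """
    patches = [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
    max_h = max(p.shape[0] for p in patches)
    total_w = sum(p.shape[1] for p in patches)
    merged = np.zeros((max_h, total_w) + frame.shape[2:], dtype=frame.dtype)
    offsets, cursor = [], 0
    for p in patches:
        ph, pw = p.shape[:2]
        merged[:ph, cursor:cursor + pw] = p   # top-aligned; rows below stay zero
        offsets.append(cursor)
        cursor += pw
    return merged, offsets

frame = np.full((10, 12, 3), 7, dtype=np.uint8)
merged, offsets = merge_patches(frame, [(0, 0, 3, 5), (4, 2, 2, 8)])
```

Keeping the offsets makes it possible to map results on the merged image back to the original frame later.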

The control unit 120 feeds the third image 340 generated in this way to the artificial neural network 350 as input. Compared with the first image 310, the third image 340 has had most of the background removed, leaving only a small area around each pedestrian, so the amount of computation the artificial neural network 350 must perform drops sharply and the processing time can be shortened accordingly.

When the control unit 120 passes the third image 340 through the artificial neural network 350, it can detect the masks for the pedestrian areas, as shown in the fourth image 360. The control unit 120 then separates the detected masks 371-374 from the fourth image 360 and marks them on the current frame, which is the original image, to obtain an image with the masks marked, as in the fifth image 380.
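Mapping masks from the merged image back onto the original frame can be sketched as follows. This assumes a hypothetical layout in which the patches were placed side by side and top-aligned, with `offsets` giving the starting column of each patch in the merged image; the function name `paste_masks` and the `(x, y, w, h)` box convention are likewise illustrative.

```python
import numpy as np

def paste_masks(frame_shape, merged_mask, boxes, offsets):
    """Copy each patch's region of merged_mask back to its box position in the full frame."""
    full_mask = np.zeros(frame_shape[:2], dtype=bool)
    for (x, y, w, h), off in zip(boxes, offsets):
        # Only the top h rows belong to this patch; rows below are padding.
        full_mask[y:y + h, x:x + w] |= merged_mask[:h, off:off + w]
    return full_mask

boxes = [(1, 1, 2, 3), (5, 0, 3, 2)]
offsets = [0, 2]
merged_mask = np.zeros((3, 5), dtype=bool)
merged_mask[0, 0] = True   # pixel in the first patch
merged_mask[1, 4] = True   # pixel in the second patch
merged_mask[2, 3] = True   # falls in the second patch's padding and is ignored
full_mask = paste_masks((6, 10, 3), merged_mask, boxes, offsets)
```

The resulting full-frame mask can then be drawn over the original frame in whatever color scheme is wanted.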

By detecting the masks for the pedestrian areas in a tracking frame and marking them on that frame in the manner described above, the control unit 120 can speed up pixel-level detection of pedestrian areas.

If no pedestrian area at all is detected in a detection frame, the control unit 120 may further increase processing speed by performing no computation on the tracking frames up to the next detection frame and moving past them quickly.
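The per-frame dispatch, including the skip when a detection frame finds no pedestrians, might look like the following sketch. The callables `detect` and `track_and_segment` are hypothetical placeholders supplied by the caller (standing in for the full network pass and for the tracking, pre-processing, and network pass on tracking frames), and frames are indexed from 0 here so that index 0 corresponds to the 1st frame.

```python
def process_video(frames, detect, track_and_segment, period=10):
    """Return one list of masks per frame.

    detect(frame) -> (boxes, masks)
    track_and_segment(frame, boxes) -> (boxes, masks)
    """
    results, boxes = [], []
    for index, frame in enumerate(frames):
        if index % period == 0:
            boxes, masks = detect(frame)                     # detection frame: full network pass
        elif boxes:
            boxes, masks = track_and_segment(frame, boxes)   # tracking frame with pedestrians to follow
        else:
            masks = []                                       # nothing to track: skip all computation
        results.append(masks)
    return results

# Toy run: frame 0 finds nobody, frame 3 finds one pedestrian.
tracked_frames = []

def detect(frame):
    return ([(0, 0, 1, 1)], ["m"]) if frame == 3 else ([], [])

def track_and_segment(frame, boxes):
    tracked_frames.append(frame)
    return boxes, ["t"]

results = process_video(list(range(6)), detect, track_and_segment, period=3)
```

In this toy run the tracker is never called for frames 1 and 2, because the preceding detection frame found no pedestrians.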

After marking the detected masks on each frame, the control unit 120 may display a video composed of those frames on the input/output unit 110, so that the user can view a video in which the pedestrians are marked with masks.

A pedestrian detection method using the pedestrian detection apparatus 100 described above is explained below. FIGS. 4 and 5 are flowcharts for describing a pedestrian detection method according to an embodiment. The method according to the embodiment shown in FIGS. 4 and 5 includes steps processed in time series by the pedestrian detection apparatus 100 shown in FIG. 1. Accordingly, even where details are omitted below, the description given above for the pedestrian detection apparatus 100 shown in FIG. 1 also applies to the pedestrian detection method according to the embodiment shown in FIGS. 4 and 5.

Referring to FIG. 4, in step 401 the control unit 120 of the pedestrian detection apparatus 100 detects the pedestrian area using an artificial neural network for at least some of the plurality of frames constituting the video. For the remaining frames, it tracks the pedestrian area based on the previous frame, performs preprocessing that removes a part of the background using the tracking result, and then detects the pedestrian area using the artificial neural network. It displays the detected pedestrian area on all of the frames. The detailed steps included in step 401 are described below with reference to FIG. 5.

Referring to FIG. 5, in step 501 the control unit 120 classifies the plurality of frames constituting the video into detection frames and tracking frames based on a preset detection period. The detection period may be set as needed. As the detection period increases, the processing speed increases while the detection accuracy decreases. In particular, once the detection period exceeds a certain level, a further increase may reduce the detection accuracy far more than it improves the processing speed. The detection period may therefore be set appropriately for the situation.

In step 502, the control unit 120 determines whether the current frame is a detection frame. In more detail, when a frame is input, the control unit 120 checks the sequence number of the current frame, divides that sequence number by the detection period, and determines whether the remainder matches a preset number. If the remainder matches the preset number, the control unit 120 may determine that the current frame is a detection frame; if it does not match, the control unit 120 may determine that the current frame is a tracking frame.
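The remainder test of step 502 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `is_detection_frame`, the zero-based frame index, and the default preset number of 0 are assumptions made for the example.

```python
def is_detection_frame(frame_index: int, detection_period: int,
                       preset_remainder: int = 0) -> bool:
    """Return True if the frame is a detection frame (step 502).

    frame_index is the sequence number of the current frame (assumed here
    to start at 0); preset_remainder is the 'preset number' the remainder
    is compared against (assumed default: 0).
    """
    if detection_period <= 0:
        raise ValueError("detection_period must be a positive integer")
    return frame_index % detection_period == preset_remainder


# Step 501 for a 10-frame video with a detection period of 5.
labels = ["detection" if is_detection_frame(i, 5) else "tracking"
          for i in range(10)]
```

With a detection period of P, one frame in every P passes through the full network, which is why a longer period raises throughput at the cost of accuracy.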

If the current frame is determined in step 502 to be a detection frame, the process proceeds to step 503, in which the control unit 120 passes the current frame through the artificial neural network to detect a bounding box and a mask for the pedestrian area. The control unit 120 then stores the position information of the detected bounding box in the storage unit 130 in step 504 and displays the detected mask on the current frame in step 505.

On the other hand, if the current frame is determined in step 502 to be a tracking frame, the process proceeds to step 506, in which the control unit 120 detects bounding boxes by tracking the pedestrian area based on the position information of the bounding boxes detected in the previous frame. The control unit 120 then extracts the detected bounding boxes from the current frame and merges them into a single image in step 507, and passes the generated image through the artificial neural network to detect a mask for the pedestrian area in step 508.
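One possible way to carry out the extraction and merging of step 507 is sketched below with NumPy. The layout is an assumption made for illustration: this passage does not state how the crops are arranged, so the sketch simply places them side by side on a black canvas. The names `merge_boxes` and `scatter_mask`, and the `(x, y, w, h)` box format, are hypothetical.

```python
import numpy as np


def merge_boxes(frame, boxes):
    """Crop each (x, y, w, h) box from the frame and tile the crops
    horizontally on a zero-filled canvas (assumed layout for step 507).

    Returns the merged image and the x offset of each crop in it.
    Boxes are assumed to lie fully inside the frame.
    """
    if not boxes:
        raise ValueError("at least one bounding box is required")
    crops = [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
    height = max(c.shape[0] for c in crops)
    width = sum(c.shape[1] for c in crops)
    canvas = np.zeros((height, width) + frame.shape[2:], dtype=frame.dtype)
    offsets, cursor = [], 0
    for crop in crops:
        h, w = crop.shape[:2]
        canvas[:h, cursor:cursor + w] = crop
        offsets.append(cursor)
        cursor += w
    return canvas, offsets


def scatter_mask(merged_mask, boxes, offsets, frame_shape):
    """Map a mask predicted on the merged image back to frame coordinates."""
    full = np.zeros(frame_shape[:2], dtype=bool)
    for (x, y, w, h), off in zip(boxes, offsets):
        full[y:y + h, x:x + w] |= merged_mask[:h, off:off + w].astype(bool)
    return full
```

Because the merged image contains only the tracked boxes, the network in step 508 processes fewer pixels than it would for the full frame; the recorded offsets let the resulting mask be placed back at the original box positions.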

The control unit 120 may repeat steps 502 to 505 of the flowchart of FIG. 5 for every frame constituting the video.
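The per-frame loop can be sketched as follows, under stated assumptions. The neural network, tracker, crop-based segmentation, and mask overlay are passed in as callables (`detect`, `track`, `segment_crops`, `overlay`); these names and their signatures are hypothetical placeholders, not components named in this document. The sketch also includes the shortcut described earlier, in which tracking frames are skipped when the last detection frame contained no pedestrian.

```python
def process_video(frames, detect, track, segment_crops, overlay,
                  detection_period, preset_remainder=0):
    """Run the detect/track loop over all frames and return the output frames.

    detect(frame)               -> (boxes, mask or None)   # step 503
    track(frame, boxes)         -> boxes                   # step 506
    segment_crops(frame, boxes) -> mask                    # steps 507-508
    overlay(frame, mask)        -> frame with mask drawn   # step 505
    """
    stored_boxes = []  # stands in for the storage unit 130 (step 504)
    output = []
    for index, frame in enumerate(frames):
        mask = None
        if index % detection_period == preset_remainder:  # step 502
            boxes, mask = detect(frame)
            stored_boxes = list(boxes)
        elif stored_boxes:
            stored_boxes = list(track(frame, stored_boxes))
            if stored_boxes:
                mask = segment_crops(frame, stored_boxes)
        # If stored_boxes is empty, nothing was detected in the last
        # detection frame, so the tracking frame is passed through as is.
        output.append(overlay(frame, mask) if mask is not None else frame)
    return output
```

Injecting the four callables keeps the control flow testable with simple stubs and independent of any particular network or tracker.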

Returning to FIG. 4, in step 402 the control unit 120 outputs a video composed of the frames on which the pedestrian area is displayed.

According to the embodiment described above, accuracy is increased by detecting the pedestrian area with the artificial neural network for some of the plurality of frames constituting the video, while for the remaining frames the background is removed through tracking and preprocessing before the frames are passed through the artificial neural network to detect the pedestrian area, which reduces the amount of computation and can be expected to improve the processing speed.

The term 'unit' as used in the above embodiments means software or a hardware component such as an FPGA (field programmable gate array) or an ASIC, and a 'unit' performs certain roles. However, a 'unit' is not limited to software or hardware. A 'unit' may be configured to reside in an addressable storage medium or may be configured to run on one or more processors. Thus, as an example, a 'unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The functions provided by the components and 'units' may be combined into a smaller number of components and 'units' or further separated into additional components and 'units'.

In addition, the components and 'units' may be implemented to run on one or more CPUs in a device or a secure multimedia card.

The pedestrian detection method according to the embodiment described with reference to FIGS. 4 and 5 may also be implemented in the form of a computer-readable medium that stores instructions and data executable by a computer. The instructions and data may be stored in the form of program code and, when executed by a processor, may generate a predetermined program module to perform a predetermined operation. The computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media and removable and non-removable media. The computer-readable medium may also be a computer recording medium, which may include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, the computer recording medium may be a magnetic storage medium such as an HDD or SSD, an optical recording medium such as a CD, DVD, or Blu-ray disc, or a memory included in a server accessible through a network.

The pedestrian detection method according to the embodiment described with reference to FIGS. 4 and 5 may also be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or a solid-state drive (SSD)).

Accordingly, the pedestrian detection method according to the embodiment described with reference to FIGS. 4 and 5 may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are connected to one another using various buses and may be mounted on a common motherboard or in any other suitable manner.

Here, the processor may process instructions within the computing device. Examples of such instructions include instructions stored in the memory or the storage device for displaying graphic information that provides a GUI (graphical user interface) on an external input/output device, such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and types of memory. The processor may also be implemented as a chipset made up of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. As one example, the memory may consist of a volatile memory unit or a set of such units. As another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device may provide a large storage space to the computing device. The storage device may be a computer-readable medium or a configuration including such a medium. For example, it may include devices within a SAN (storage area network) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The embodiments described above are provided for illustration, and a person of ordinary skill in the art to which they pertain will understand that they can easily be modified into other specific forms without changing their technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of protection sought through this specification is defined by the claims below rather than by the detailed description above, and should be construed to include all changes or modifications derived from the meaning and scope of the claims and their equivalents.

100: pedestrian detection apparatus 110: input/output unit
120: control unit 130: storage unit

Claims

1. A method of detecting a pedestrian in a video, the method comprising:
detecting, for at least some of a plurality of frames constituting the video, a bounding box and a mask for a pedestrian area using an artificial neural network; for the remaining frames, tracking the bounding box for the pedestrian area based on a previous frame, performing preprocessing that removes a part of the background using the tracking result, and then detecting the mask for the pedestrian area using the artificial neural network; and displaying the mask for the detected pedestrian area on all of the frames; and
outputting a video composed of the frames on which the mask for the pedestrian area is displayed,
wherein the bounding box for the pedestrian area is a box containing the pedestrian and a part of the background, and the mask for the pedestrian area is a pixel-level area containing only the pedestrian.

2. The method of claim 1,
wherein displaying the mask for the detected pedestrian area on all of the frames comprises:
classifying the plurality of frames constituting the video into detection frames and tracking frames based on a detection period;
determining whether a current frame is a detection frame;
if the current frame is a detection frame, passing the current frame through the artificial neural network to detect a bounding box and a mask for the pedestrian area, and if the current frame is a tracking frame, applying a tracking technique to the current frame, performing the preprocessing, and then passing the result through the artificial neural network to detect a mask for the pedestrian area; and
displaying the mask for the detected pedestrian area on the current frame.

3. The method of claim 2,
wherein the detecting comprises,
if the current frame is a detection frame:
detecting a bounding box and a mask for the pedestrian area by passing the current frame through the artificial neural network; and
storing position information of the detected bounding box.

4. The method of claim 3,
wherein the detecting comprises,
if the current frame is a tracking frame:
detecting a bounding box by tracking the pedestrian area based on position information of a bounding box detected in a frame preceding the current frame;
generating a single image by extracting the detected bounding box from the current frame and merging; and
passing the generated image through the artificial neural network to detect a mask for the pedestrian area.

5. (Deleted)

6. The method of claim 2,
wherein determining whether the current frame is a detection frame comprises:
checking the sequence number of the current frame;
determining whether the remainder obtained by dividing the checked sequence number of the current frame by the detection period matches a preset number; and
determining that the current frame is a detection frame if the remainder matches, and determining that the current frame is a tracking frame if it does not match.

7. A computer-readable recording medium on which a program for performing the method according to claim 1 is recorded.

8. A computer program stored in a medium, the program being executed by a pedestrian detection apparatus to perform the method according to claim 1.

9. A pedestrian detection apparatus comprising:
an input/output unit;
a storage unit storing a program for detecting a pedestrian from a video; and
a control unit that detects a pedestrian from the video by executing the program,
wherein the control unit detects, for at least some of a plurality of frames constituting the video, a bounding box and a mask for a pedestrian area using an artificial neural network; for the remaining frames, tracks the bounding box for the pedestrian area based on a previous frame, performs preprocessing that removes a part of the background using the tracking result, and then detects the mask for the pedestrian area using the artificial neural network; displays the mask for the detected pedestrian area on all of the frames; and then displays, on the input/output unit, a video composed of the frames on which the mask for the pedestrian area is displayed, and
wherein the bounding box for the pedestrian area is a box containing the pedestrian and a part of the background, and the mask for the pedestrian area is a pixel-level area containing only the pedestrian.

10. The apparatus of claim 9,
wherein the control unit,
in displaying the mask for the detected pedestrian area on all of the frames,
classifies the plurality of frames constituting the video into detection frames and tracking frames based on a detection period; determines whether a current frame is a detection frame; if the current frame is a detection frame, passes the current frame through the artificial neural network to detect a bounding box and a mask for the pedestrian area; if the current frame is a tracking frame, applies a tracking technique to the current frame, performs the preprocessing, and then passes the result through the artificial neural network to detect a mask for the pedestrian area; and then displays the mask for the detected pedestrian area on the current frame.

11. The apparatus of claim 10,
wherein the control unit,
if the current frame is a detection frame,
passes the current frame through the artificial neural network to detect a bounding box and a mask for the pedestrian area, and stores position information of the detected bounding box in the storage unit.

12. The apparatus of claim 11,
wherein the control unit,
if the current frame is a tracking frame,
detects a bounding box by tracking the pedestrian area based on position information of a bounding box detected in a frame preceding the current frame, extracts the detected bounding box from the current frame and merges to generate a single image, and passes the generated image through the artificial neural network to detect a mask for the pedestrian area.

13. (Deleted)

14. The apparatus of claim 10,
wherein the control unit,
in determining whether the current frame is a detection frame,
checks the sequence number of the current frame, determines whether the remainder obtained by dividing the checked sequence number of the current frame by the detection period matches a preset number, determines that the current frame is a detection frame if the remainder matches, and determines that the current frame is a tracking frame if it does not match.