KR20180071947A

KR20180071947A - Apparatus and method for processing image

Info

Publication number: KR20180071947A
Application number: KR1020170166214A
Authority: KR
Inventors: 조남익; 추성권; 이상훈
Original assignee: 서울대학교산학협력단 (Seoul National University R&DB Foundation)
Priority date: 2016-12-20
Filing date: 2017-12-05
Publication date: 2018-06-28
Also published as: KR102070956B1

Abstract

The present invention relates to an image processing apparatus and method. The image processing apparatus according to the present embodiment includes a feature extraction part for extracting visual information by applying a convolution filter to an input frame, a patch dividing part for dividing a feature map including the visual information into pixel units to acquire feature patches, a temporal information processing part for extracting temporal information from the feature patches, a probability classifying part for extracting one probability result for each pixel from the temporal information, and a background separation part for separating the foreground and background of the input frame based on the extracted probability results.

Description

APPARATUS AND METHOD FOR PROCESSING IMAGE

Embodiments disclosed herein relate to an image processing apparatus and method, and more particularly to an image processing apparatus and method that separate the foreground and background of a video using an artificial neural network.

Conventional image processing methods separate the foreground from the background using a difference value obtained by simply comparing the previous image of the input video with the current image. In this regard, the prior-art document Korean Patent No. 10-2013-0063963 describes an image processing method that compares the current image with the previous image, accumulates the change values between images to generate an accumulated image change value, and uses the accumulated value to estimate the change in the image caused by object movement.
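For contrast with the learning-based approach described later, the following minimal Python sketch illustrates generic frame differencing: a thresholded per-pixel difference between two frames, and a per-pixel sum of changes over a sequence. It is an illustrative assumption only, not the cited patent's actual method; the function names and the threshold value are hypothetical.

```python
def frame_difference_mask(prev, curr, threshold):
    """Return 1 where |curr - prev| exceeds threshold (changed pixel), else 0."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def accumulate_changes(frames):
    """Sum the absolute frame-to-frame change at each pixel over a frame sequence."""
    height, width = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                acc[y][x] += abs(curr[y][x] - prev[y][x])
    return acc


# Usage: two 2 x 2 grayscale frames with a hypothetical threshold of 5 intensity levels.
prev = [[10, 10], [10, 10]]
curr = [[10, 50], [12, 10]]
print(frame_difference_mask(prev, curr, threshold=5))  # [[0, 1], [0, 0]]
```

The threshold here is exactly the kind of hand-tuned, per-video parameter that the following paragraphs identify as a weakness of non-learning-based methods.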

Existing image processing methods, including this one, set the beginning of the video as a training interval for the background model, learn every region as background during that interval without separating the foreground, and, after the training interval, separate the foreground and background of each input image and update the background model. Various techniques are used to reduce noise and improve performance when learning, separating, and updating the background; the noise reduction is designed manually by the user, and different parameters are used for different image characteristics. As a result, every part of the image processing must be designed by the user, complex memory update operations are performed, and speed may be limited because the processing is mainly CPU-bound.

Thus, because existing image processing methods are non-learning-based and must be designed directly by the user, they are not easy to implement, and their inefficient use of hardware resources limits their performance.

Accordingly, there is a recent need for an apparatus and method that solve these problems.

Meanwhile, the background art described above is technical information that the inventors possessed for deriving the present invention or acquired while deriving it, and is not necessarily publicly known technology disclosed to the general public before the filing of the present invention.

Embodiments disclosed herein aim to provide an image processing apparatus and method that separate the foreground and background of a video using an artificial neural network.

Embodiments disclosed herein aim to provide an image processing apparatus and method that extract temporal information from a video using an artificial neural network and separate the foreground and background using the extracted temporal information.

Embodiments disclosed herein aim to provide an image processing apparatus and method that use an artificial neural network to learn background modeling, separation, or updating automatically, without user intervention.

Embodiments disclosed herein aim to provide an image processing apparatus and method that detect motion information by training an artificial neural network to use not only visual information but also temporal information.

As a technical means for achieving the above objects, according to one embodiment, an image processing apparatus includes a feature extraction unit that extracts visual information by applying a convolution filter to an input frame; a patch dividing unit that divides a feature map containing the visual information into pixel units to obtain feature patches; a temporal information processing unit that extracts temporal information from the feature patches; a probability classification unit that extracts one probability result per pixel from the temporal information; and a background separation unit that separates the foreground and background of the input frame based on the extracted probability results.

According to another embodiment, an image processing method performed by an image processing apparatus includes extracting visual information by applying a convolution filter to an input frame; dividing a feature map containing the visual information into pixel units to obtain feature patches; extracting temporal information from the feature patches; extracting one probability result per pixel from the temporal information; and separating the foreground and background of the input frame based on the extracted probability results.

According to any one of the above means for solving the problems, an image processing apparatus and method that separate the foreground and background of a video using an artificial neural network can be provided.

In addition, according to any one of the above means, an image processing apparatus and method that extract temporal information from a video using an artificial neural network and separate the foreground and background using the extracted temporal information can be provided.

In addition, according to any one of the above means, an image processing apparatus and method that use an artificial neural network to learn background modeling, separation, or updating automatically, without user intervention, can be provided.

In addition, according to any one of the above means, an image processing apparatus and method that detect motion information by training an artificial neural network to use not only visual information but also temporal information can be provided.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the following description.

FIG. 1 is a block diagram illustrating an image processing apparatus according to an embodiment.
FIG. 2 is a diagram for explaining a process of extracting visual information from an input frame according to an embodiment.
FIG. 3 is a diagram for explaining a process of extracting feature patches from a feature map according to an embodiment.
FIG. 4 is a diagram for explaining a process of processing temporal information according to an embodiment.
FIG. 5 is a diagram for explaining the internal structure of the recurrent neural network shown in FIG. 4.
FIG. 6 is a diagram for explaining a process of outputting a probability result for each pixel according to an embodiment.
FIG. 7 is a flowchart illustrating an image processing method according to an embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily carry them out. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are denoted by similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. Also, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an image processing apparatus according to an embodiment.

As shown in FIG. 1, the image processing apparatus 100 includes a feature extracting unit 110, a patch dividing unit 120, a temporal information processing unit 130, a probability classifying unit 140, and a background separating unit 150.

The feature extraction unit 110 may extract, from an input frame, visual information that may take the form of a feature map. Here, the input frame may be one of the plurality of frames making up a video.

The feature map extracted by the feature extraction unit 110 contains visual information about at least part of the input frame. The feature extraction unit 110 may use a convolution filter to obtain the feature map, and the convolution filter may be implemented with a convolutional neural network (hereinafter "CNN"). Accordingly, the feature extraction unit 110 can use the CNN to learn, from the input frame (an RGB image), the convolution filters needed for background separation, and then extract the feature map with the learned filters.
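A minimal pure-Python sketch of the feature extraction step, under stated assumptions: the input frame is an H x W x 3 list, each convolution filter is a k x k x 3 kernel with odd k, zero padding keeps the output at the input resolution, and a ReLU follows each filter. The padding, activation, and the L x H x W output layout are illustrative choices, not taken from the source. As in most CNN libraries, the "convolution" is computed as cross-correlation.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def conv2d_same(image, kernels, biases):
    """Cross-correlate an H x W x C image with each k x k x C kernel (odd k).

    Zero padding keeps the spatial size, so the result is a feature map with
    one H x W layer per kernel (layout: L x H x W), followed by ReLU.
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    feature_map = []
    for kernel, bias in zip(kernels, biases):
        k = len(kernel)
        r = k // 2
        layer = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc = bias
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - r, x + dx - r
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the frame
                            for ch in range(c):
                                acc += kernel[dy][dx][ch] * image[yy][xx][ch]
                layer[y][x] = relu(acc)
        feature_map.append(layer)
    return feature_map


# Usage: a 2 x 2 RGB frame, a 3 x 3 "red box sum" kernel and a 1 x 1 channel-sum kernel.
image = [[[1, 0, 0], [2, 0, 0]], [[3, 0, 0], [4, 0, 0]]]
box = [[[1, 0, 0]] * 3] * 3
point = [[[1, 1, 1]]]
fm = conv2d_same(image, [box, point], [0.0, -2.5])
```

In a trained network the kernel values would be learned rather than hand-set; here they are fixed only so the output is easy to check.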

The patch dividing unit 120 may divide the feature map into pixel units to obtain feature patches. When the feature map consists of a plurality of layers, each feature patch contains the visual information from every layer. Because the division is per pixel, the patch dividing unit 120 obtains a plurality of feature patches from a single feature map.

The temporal information processing unit 130 may extract temporal information from each feature patch, using a recurrent neural network (hereinafter "RNN"). It combines information extracted from the previous feature patch with the current feature patch to extract the temporal information contained in the video, and it updates the internal state of the RNN, that is, its cell state.
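The per-pixel bookkeeping described above can be sketched as follows: each pixel position keeps its own recurrent state from frame to frame, while every pixel shares the same weights. To keep the sketch short, a plain tanh recurrent cell stands in for the LSTM used in the embodiment; the class name, method name, and weight layout are hypothetical.

```python
import math


def rnn_cell(x, h_prev, W_x, W_h, b):
    """Stand-in recurrent cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(sum(wx * xi for wx, xi in zip(row_x, x))
                  + sum(wh * hi for wh, hi in zip(row_h, h_prev)) + bi)
        for row_x, row_h, bi in zip(W_x, W_h, b)
    ]


class PerPixelTemporalProcessor:
    """Keeps one hidden state per pixel; all pixels share the same weights."""

    def __init__(self, W_x, W_h, b):
        self.W_x, self.W_h, self.b = W_x, W_h, b
        self.hidden_size = len(b)
        self.state = {}  # (y, x) -> hidden state carried over to the next frame

    def step(self, patches):
        """patches: dict (y, x) -> feature vector of the current frame."""
        out = {}
        for pos, x_t in patches.items():
            h_prev = self.state.get(pos, [0.0] * self.hidden_size)
            h_t = rnn_cell(x_t, h_prev, self.W_x, self.W_h, self.b)
            self.state[pos] = h_t
            out[pos] = h_t
        return out


# Usage: the same patch fed twice gives different outputs because the state persists.
proc = PerPixelTemporalProcessor(W_x=[[1.0]], W_h=[[1.0]], b=[0.0])
out1 = proc.step({(0, 0): [0.5], (0, 1): [0.0]})
out2 = proc.step({(0, 0): [0.5], (0, 1): [0.0]})
```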

확률 분류부(140)는 시간적 정보로부터 픽셀별로 확률 결과를 추출할 수 있다. 이를 위해, 확률 분류부(140)는 전체연결신경망(FCNN: Fully Connected Neural Network, 이하 'FCNN'라 칭하기로 함)을 사용할 수 있다.The probability classifying unit 140 may extract a probability result for each pixel from the temporal information. For this, the probability classifier 140 may use a Fully Connected Neural Network (FCNN).

The background separating unit 150 may detect the motion of a specific object in the input frame based on the extracted probability results, and can thereby separate the foreground from the background in the input frame.
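The source does not specify how the per-pixel probabilities are turned into a separation. One simple possibility, assumed here purely for illustration, is to threshold the foreground probability at 0.5 and use the resulting binary mask to split the frame. The function name and the threshold are hypothetical.

```python
def separate_foreground(frame, fg_prob, threshold=0.5):
    """Split a frame into a binary mask, a foreground image and a background image.

    frame: H x W grid of pixel values; fg_prob: H x W grid of foreground probabilities.
    Pixels with probability >= threshold are treated as foreground (assumed rule).
    """
    mask = [[1 if p >= threshold else 0 for p in row] for row in fg_prob]
    foreground = [[v if m else 0 for v, m in zip(f_row, m_row)]
                  for f_row, m_row in zip(frame, mask)]
    background = [[0 if m else v for v, m in zip(f_row, m_row)]
                  for f_row, m_row in zip(frame, mask)]
    return mask, foreground, background


mask, fg, bg = separate_foreground([[5, 6], [7, 8]], [[0.9, 0.2], [0.5, 0.1]])
print(mask)  # [[1, 0], [1, 0]]
```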

In this way, the proposed image processing apparatus 100 can separate images using artificial neural networks. As described above, to detect the motion of a specific object in a video, the image processing apparatus 100 may use a structure in which a CNN, an RNN, and an FCNN are combined in sequence. In particular, the RNN structure allows the artificial neural network to learn to use temporal information rather than relying only on the visual information of the video.
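To make the CNN, RNN, FCNN chain concrete, the sketch below traces data shapes through the pipeline for one frame. The default sizes (4 feature-map layers as in FIG. 2, a hidden size of 8, two output classes) and the use of 'same' padding are illustrative assumptions, not values from the source.

```python
def trace_shapes(height, width, channels=3, num_layers=4, hidden_size=8,
                 num_classes=2, stride=1):
    """Follow data shapes through a CNN -> per-pixel RNN -> per-pixel FCNN pipeline."""
    rows = len(range(0, height, stride))  # sampled rows when patches use a stride
    cols = len(range(0, width, stride))
    return {
        "input_frame": (height, width, channels),
        "feature_map": (num_layers, height, width),       # 'same' convolution (assumed)
        "feature_patches": (rows * cols, num_layers),     # one L-vector per sampled pixel
        "temporal_features": (rows * cols, hidden_size),  # RNN hidden state per pixel
        "probabilities": (rows, cols, num_classes),       # foreground/background per pixel
    }


for name, shape in trace_shapes(240, 320).items():
    print(name, shape)
```

With stride 1 the probability map has the input's resolution; a larger stride trades resolution for speed.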

In addition, by using artificial neural networks, the image processing apparatus 100 can learn background modeling, separation, and updating automatically without manual design by the user, and its performance varies little with parameter selection.

FIG. 2 is a diagram for explaining a process of extracting visual information from an input frame according to an embodiment.

As shown in FIG. 2, the feature extraction unit 110 may extract visual information from an input frame using a convolution filter, and may use a CNN for this purpose.

The feature extraction unit 110 may apply the convolution filter 220 to a partial area 211 of the input frame 210 and extract visual information (or CNN features) that may take the form of a feature map 230. Although FIG. 2 shows the feature map 230 with four layers 231, 232, 233, and 234, it is not limited to this; depending on the convolution filters used, the feature map 230 may have any number of layers, one or more.

The feature extraction unit 110 may learn, from the input image 210, the convolution filters needed for background separation. Here, the convolution filters act as a filter bank, so the feature extraction unit 110 can learn and generate a plurality of feature maps 230. In this way, the feature extraction unit 110 can automatically learn and extract the information needed for background separation.

Meanwhile, the CNN described above is a type of artificial neural network that, by learning from images, can support various image processing functions such as image retrieval, classification, and understanding. Unlike other artificial neural networks, which learn by connecting every region of the input, a CNN reuses the same parameters at many locations of the input frame. As a result, a CNN can process images with relatively few parameters, and because the features it detects do not change when image content is spatially shifted, it is suitable for extracting visual information in the image processing apparatus 100.
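The parameter savings from reusing the same filter across the frame can be made concrete with a small calculation. The sketch compares a convolution layer (3 x 3 kernels, 3 input channels, 4 output maps) with a hypothetical fully connected layer that maps every input value of a 240 x 320 RGB frame to every value of a 4-layer output map of the same size. The layer sizes are illustrative assumptions.

```python
def conv_params(kernel_size, in_channels, out_maps):
    """Weights + biases of a convolution layer; independent of the image size."""
    return out_maps * (kernel_size * kernel_size * in_channels + 1)


def dense_params(height, width, in_channels, out_maps):
    """Weights + biases of a fully connected layer from every input value to every output value."""
    n_in = height * width * in_channels
    n_out = height * width * out_maps
    return n_in * n_out + n_out


print(conv_params(3, 3, 4))             # 112
print(dense_params(240, 320, 3, 4))     # 70779187200
```

The convolution layer needs 112 parameters at any resolution, whereas the fully connected layer needs roughly 7 x 10^10 for this frame size, and that number grows quadratically with the pixel count.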

FIG. 3 is a diagram for explaining a process of extracting feature patches from a feature map according to an embodiment.

As shown in FIG. 3, the patch dividing unit 120 performs patch division on the feature map 230 across each of the layers 231, 232, 233, and 234. In this way, the patch dividing unit 120 obtains pixel-unit feature patches 240 from the feature map 230. Here, each feature patch 240 may contain the pixel-unit visual information of every one of the layers 231, 232, 233, and 234. The number of layers of the feature map 230 is denoted L (here, L = 4), where L is a natural number of 1 or more.

By dividing the feature map 230, the patch dividing unit 120 can generate a plurality of pixel-unit feature patches. The patch dividing unit 120 may also divide the feature map with a stride, taking image processing speed into account.
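A minimal sketch of the patch division, assuming the feature map is stored as an L x H x W nested list: each pixel-unit patch collects the value of every layer at one position, so its length is L, and a stride greater than 1 keeps only every stride-th row and column. The dictionary keyed by (y, x) is an illustrative choice.

```python
def split_into_pixel_patches(feature_map, stride=1):
    """Turn an L x H x W feature map into per-pixel feature vectors of length L.

    A stride > 1 samples every `stride`-th row and column to reduce the number
    of patches (and therefore the work done by the later stages).
    """
    num_layers = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    patches = {}
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            patches[(y, x)] = [feature_map[l][y][x] for l in range(num_layers)]
    return patches


# Usage: a 2-layer, 2 x 3 feature map.
fm = [[[1, 2, 3], [4, 5, 6]],
      [[10, 20, 30], [40, 50, 60]]]
patches = split_into_pixel_patches(fm)
print(patches[(1, 2)])  # [6, 60]
```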

Meanwhile, the more layers (L) the feature map 230 divided by the patch dividing unit 120 has, the more visual information for separating the foreground from the background it can contain. Consequently, increasing the number of layers in the feature map 230 can improve foreground/background separation performance.

FIG. 4 is a diagram for explaining a process of processing temporal information according to an embodiment.

As shown in FIG. 4, the temporal information processing unit 130 may extract a temporal feature 260 from the feature patch 240. The temporal feature 260 describes how the visual information flows over time and serves as a prediction value for separating the foreground from the background. Here, the temporal information processing unit 130 may be implemented using the RNN 250.

In addition, the temporal information processing unit 130 may output pixel-unit temporal information 260, and receives the cell state produced by the previous input along with the next input. In this way, the temporal information processing unit 130 uses the feature vectors extracted from the pixel-unit feature patches 240 to extract the temporal information 260, a prediction value for separating the foreground from the background.

FIG. 5 is a diagram for explaining the internal structure of the recurrent neural network shown in FIG. 4.

As shown in FIG. 5, the RNN 250 is a neural network that, for a series of sequential inputs, uses an internal loop to feed back and process information from previous inputs. The RNN 250 can therefore use its internal state information to behave differently for each input in the sequence, and through training it learns the matrices that update the cell state information.

The RNN 250 may be implemented as a long short-term memory (hereinafter "LSTM") network. The LSTM-based RNN 250 may include a cell state line 251, a forget gate layer 252, an input gate layer 253, an update cell state stage 254, and an output gate layer 255. The description below is based on RNN 250-2; RNN 250-1 and RNN 250-3 may have a structure similar to RNN 250-2 and are drawn to show the preceding and following operations, respectively.

The cell state line 251 contains a multiplication and an addition; it receives the previously output internal state (the cell state) $C_{t-1}$ and outputs the current cell state $C_t$. Using the forget gate layer 252, the input gate layer 253, and the output gate layer 255, the cell state line 251 determines whether visual information (that is, the visual information contained in the feature patch) is reflected.

The forget gate layer 252 uses a sigmoid ($\sigma$) operation and receives $h_{t-1}$ and $x_t$ as inputs. Here, $x_t$ is the feature patch 240 of the current frame, so $x_{t-1}$ is the feature patch of the previous frame and $x_{t+1}$ is the feature patch of the next frame. $h_{t-1}$ is the hidden state, i.e., the output value of the previous RNN 250-1. The forget gate layer 252 uses the sigmoid operation to output a value between 0 and 1 ($0 < \sigma < 1$); the closer the output is to 1, the more of the information is reflected, and the closer it is to 0, the less.

The operation of the forget gate layer 252 can be expressed by Equation 1 below.

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad \text{(Equation 1)}$$

Here, $f_t$ is the output of the forget gate layer 252, $W_f$ is the weight of the forget gate, $b_f$ is its bias, and $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input. $W_f$ and $b_f$ are set by training the RNN.

Next, the input gate layer 253 determines whether new information is stored in the cell state. The input gate layer 253 may include a sigmoid ($\sigma$) operation and a hyperbolic tangent (tanh) operation: the sigmoid operation decides which values to update, and the tanh operation produces a vector of candidate values to be added to the cell state. These two operations can be expressed by Equation 2 below.

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \qquad \text{(Equation 2)}$$

Here, $i_t$ is the output of the sigmoid operation and $\tilde{C}_t$ is the output of the tanh operation in the input gate layer 253. $W_i$ and $W_c$ are the weights, and $b_i$ and $b_c$ the biases, of the sigmoid and tanh operations, respectively. $W_i$, $W_c$, $b_i$, and $b_c$ are set by training the RNN.

The update cell state stage 254 uses the outputs of the forget gate layer 252 and the input gate layer 253 to update the previous cell state $C_{t-1}$ to the new cell state $C_t$, as expressed by Equation 3 below.

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad \text{(Equation 3)}$$

Here, $C_t$ is the cell state and $C_{t-1}$ is the previous cell state; $f_t$ is the output of the forget gate layer 252, $i_t$ is the output of the sigmoid operation in the input gate layer 253, $\tilde{C}_t$ is the output of the tanh operation in the input gate layer 253, and $\odot$ denotes element-wise multiplication.

Next, the output gate layer 255 determines the output value $h_t$, which is a filtered version of the cell state $C_t$. The output gate layer 255 may include a sigmoid operation and a tanh operation. The sigmoid operation computes cell state determination information $o_t$, which selects which parts of the cell state are output; the tanh operation squashes the updated cell state $C_t$ into the range $-1$ to $1$, and the result is multiplied by $o_t$. These operations can be expressed by Equation 4 below.

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t) \qquad \text{(Equation 4)}$$

Here, $o_t$ is the output of the sigmoid operation and $h_t$ is the output of the tanh operation in the output gate layer 255. $W_o$ and $b_o$ are the weight and bias of the sigmoid operation and are set by training the RNN.

FIG. 6 is a diagram for explaining a process of outputting a probability result for each pixel according to an embodiment.

As shown in FIG. 6, the probability classifying unit 140 can classify the temporal information 260 into foreground and background probabilities 280, and because it operates pixel by pixel, it obtains a result at the same resolution as the original. For this purpose, the probability classifying unit 140 may use the FCNN 270 to classify the probabilities.

Just as the RNN 250 uses the same parameters for every pixel, the probability classifying unit 140 applies the FCNN 270 with the same parameters at every pixel, which minimizes the number of parameters needed to separate the foreground from the background. For example, applying a single FCNN 270 to all pixels at once would require a number of parameters proportional to the resolution, whereas sharing the per-pixel parameters allows the same parameters to be used regardless of resolution.
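The sketch below illustrates the parameter sharing described above: one small fully connected network (assumed here to have one ReLU hidden layer and a two-way softmax) is applied to every pixel's temporal feature, so the parameter count depends only on the layer sizes and not on the number of pixels. The layer structure, the dictionary keys, and the [background, foreground] output order are assumptions for illustration.

```python
import math


def relu(v):
    return v if v > 0.0 else 0.0


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dense(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def classify_pixels(temporal, params):
    """Apply one shared two-layer FCNN to every pixel's temporal feature.

    temporal: dict (y, x) -> feature vector from the RNN.
    Returns dict (y, x) -> [p_background, p_foreground] (class order assumed).
    """
    probs = {}
    for pos, feat in temporal.items():
        hidden = [relu(a) for a in dense(params["W1"], params["b1"], feat)]
        probs[pos] = softmax(dense(params["W2"], params["b2"], hidden))
    return probs


def count_params(params):
    """Number of weights and biases; depends on layer sizes only, not on H x W."""
    weights = sum(len(row) for key in ("W1", "W2") for row in params[key])
    return weights + len(params["b1"]) + len(params["b2"])


params = {"W1": [[1.0]], "b1": [0.0], "W2": [[0.0], [1.0]], "b2": [0.0, 0.0]}
probs = classify_pixels({(0, 0): [0.0], (0, 1): [math.log(3.0)]}, params)
```

For pixel (0, 1) the logits are $[0, \ln 3]$, so the softmax gives probabilities 0.25 and 0.75; `count_params(params)` is 6 whether the frame has two pixels or millions.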

As described above, because the probability classifying unit 140 builds on the RNN 250, which outputs successive results for successive inputs, it can interpret not only the visual information of the video but also the temporal information expressed by the sequential arrangement of frames.

FIG. 7 is a flowchart illustrating an image processing method according to an embodiment.

As shown in FIG. 7, the image processing apparatus 100 can extract visual information, which may take the form of a feature map, from an input frame (S310). The image processing apparatus 100 extracts the visual information as a feature map by applying a convolution filter to the input frame. Because the convolution filter is implemented with a CNN, the feature map may also be referred to as a CNN feature.
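A minimal sketch of step S310, assuming a single convolution layer with stride 1 and "same" zero padding so that the feature map keeps the frame's spatial size (which the later per-pixel steps rely on). The function name, filter count, filter size, and ReLU activation are hypothetical; the patent does not specify the CNN architecture.

```python
import numpy as np

def conv2d_same(frame, kernels, bias):
    """Apply C_out convolution filters to a frame with 'same' zero padding.

    frame:   (C_in, H, W) input frame (e.g. C_in = 3 for RGB)
    kernels: (C_out, C_in, k, k) filter weights, k odd
    bias:    (C_out,)
    returns: (C_out, H, W) feature map with the frame's spatial size
    """
    c_out, c_in, k, _ = kernels.shape
    pad = k // 2
    padded = np.pad(frame, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = frame.shape
    out = np.empty((c_out, h, w))
    for y in range(h):
        for x in range(w):
            window = padded[:, y:y + k, x:x + k]  # (C_in, k, k)
            # Cross-correlation, as computed by common CNN conv layers.
            out[:, y, x] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2])) + bias
    return np.maximum(out, 0.0)  # ReLU activation (an assumption)

rng = np.random.default_rng(0)
frame = rng.standard_normal((3, 8, 10))             # hypothetical 3-channel 8x10 frame
kernels = rng.standard_normal((16, 3, 3, 3)) * 0.1  # 16 hypothetical 3x3 filters
feature_map = conv2d_same(frame, kernels, np.zeros(16))
print(feature_map.shape)  # (16, 8, 10)
```

The explicit loops keep the arithmetic visible; a real system would use an optimized convolution from a deep-learning framework.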

The image processing apparatus 100 obtains feature patches by dividing the feature map into pixel units (S320). Here, when the feature map consists of a plurality of layers, the image processing apparatus 100 divides it into feature patches that each include the visual information of all of the layers at that pixel.
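A minimal sketch of step S320, assuming the feature map is stored as a (C, H, W) array: each feature patch is the length-C vector of all layers at one pixel. The helper names are hypothetical; the inverse helper shows how per-pixel results can be placed back on the image grid.

```python
import numpy as np

def split_into_pixel_patches(feature_map):
    """Turn a (C, H, W) feature map into H*W feature patches of length C.

    Each patch collects the values of all C layers at one pixel, so
    patches[y * W + x] == feature_map[:, y, x].
    """
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T  # (H*W, C)

def merge_pixel_patches(values, h, w):
    """Inverse mapping: per-pixel results of shape (H*W, ...) -> (H, W, ...)."""
    return values.reshape(h, w, *values.shape[1:])

fmap = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # C=2 layers, H=3, W=4
patches = split_into_pixel_patches(fmap)
print(patches.shape)  # (12, 2)
```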

The image processing apparatus 100 can extract temporal information for each feature patch using the RNN (S330). The image processing apparatus 100 uses an external memory (which can be represented, for example, by h_{t-1}, h_t, h_{t+1}, and so on, the output values of the RNN in FIG. 6) and an internal memory (which can be represented, for example, by the cell states C_{t-1}, C_t, C_{t+1}, and so on, in FIG. 6), updating the memories as time progresses, and extracts a feature vector (W) containing the information stored in the external and internal memories. In this way, the image processing apparatus 100 can extract, as the temporal information, a feature vector (for example, the weights W_f, W_i, W_c, W_o and the biases b_f, b_i, b_c, b_o in FIG. 6).
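A minimal sketch of step S330, assuming a standard LSTM cell with gate weights W_f, W_i, W_c, W_o and biases b_f, b_i, b_c, b_o shared by every pixel. Each pixel's feature patches from T frames form one sequence; h plays the role of the external memory and c the internal (cell) memory. The sketch returns the final h of each pixel as its temporal feature. Which vector the embodiment actually forwards to the classifier is not pinned down here, so that choice, the weight shapes, and all sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for a batch of pixels.

    x_t: (P, D) feature patches of P pixels at frame t
    h_prev, c_prev: (P, N) external memory h_{t-1} and cell state C_{t-1}
    params: W_f, W_i, W_c, W_o of shape (N + D, N) and b_f, b_i, b_c, b_o
            of shape (N,); the same values are used for all pixels.
    """
    z = np.concatenate([h_prev, x_t], axis=1)           # [h_{t-1}, x_t]
    f_t = sigmoid(z @ params["W_f"] + params["b_f"])    # forget gate
    i_t = sigmoid(z @ params["W_i"] + params["b_i"])    # input gate
    c_hat = np.tanh(z @ params["W_c"] + params["b_c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                    # update internal memory
    o_t = sigmoid(z @ params["W_o"] + params["b_o"])    # output gate
    h_t = o_t * np.tanh(c_t)                            # update external memory
    return h_t, c_t

def run_lstm(patch_sequence, params, n_hidden):
    """patch_sequence: (T, P, D) patches for T frames -> h_T of shape (P, N)."""
    _, p, _ = patch_sequence.shape
    h = np.zeros((p, n_hidden))
    c = np.zeros((p, n_hidden))
    for x_t in patch_sequence:  # iterate over frames in time order
        h, c = lstm_step(x_t, h, c, params)
    return h

rng = np.random.default_rng(0)
T, P, D, N = 5, 12, 16, 8  # frames, pixels, patch length, hidden size (hypothetical)
params = {f"W_{g}": rng.standard_normal((N + D, N)) * 0.1 for g in "fico"}
params.update({f"b_{g}": np.zeros(N) for g in "fico"})
h_T = run_lstm(rng.standard_normal((T, P, D)), params, N)
print(h_T.shape)  # (12, 8)
```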

By using the RNN, the image processing apparatus 100 does not depend solely on visual information, and a neural network capable of exploiting temporal information can be trained. Accordingly, the image processing apparatus 100 can detect motion information of moving objects. In addition, by using the RNN, the image processing apparatus 100 can store past values obtained from image frames and predict the next value.

The image processing apparatus 100 can extract a probability result for each pixel from the temporal information (S340), using the FCNN to do so.

The image processing apparatus 100 can detect the motion of objects in the input image using the extracted temporal information (S350).

The image processing apparatus 100 separates the foreground and the background through motion detection within the image, after which the process ends (S360).
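A minimal sketch of turning the per-pixel probabilities into a foreground/background split. It assumes an (H, W) foreground-probability map (for instance the foreground channel of the S340 output) and a fixed threshold of 0.5; the threshold, the function name, and zeroing out the other region are illustrative choices, not details from the patent.

```python
import numpy as np

def separate_foreground_background(frame, fg_prob, threshold=0.5):
    """Split an (H, W, 3) frame using an (H, W) foreground-probability map.

    Pixels with fg_prob >= threshold are treated as foreground (moving
    objects); the rest as background. Returns the binary mask and two
    images in which the other region is zeroed out.
    """
    mask = fg_prob >= threshold                      # (H, W) bool
    foreground = np.where(mask[..., None], frame, 0)
    background = np.where(mask[..., None], 0, frame)
    return mask, foreground, background

frame = np.full((2, 3, 3), 200, dtype=np.uint8)  # tiny hypothetical RGB frame
fg_prob = np.array([[0.9, 0.2, 0.6],
                    [0.1, 0.4, 0.8]])
mask, fg, bg = separate_foreground_background(frame, fg_prob)
print(mask.astype(int))
```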

Meanwhile, the image processing apparatus 100 shown in FIG. 7 separates images by using the CNN, the RNN, and the FCNN in sequence. The process shown in FIG. 7 can be used both in a training operation for learning to separate foreground from background and in an actual image analysis operation that separates the actual foreground and background. Here, the image processing apparatus 100 can learn the parameters of the CNN, RNN, and FCNN during training or actual image analysis, and update the parameters with the learned values.
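The passage does not state the loss function or optimizer. As a heavily simplified, hypothetical illustration of learning parameters from labeled data and updating them, the sketch below trains only the final per-pixel classifier, reduced to logistic regression on fixed per-pixel features, with binary cross-entropy and plain gradient descent. In the full system the CNN and RNN parameters would also be updated (typically by backpropagation through time), which this sketch omits; all names, data, and hyperparameters are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y, eps=1e-7):
    # Binary cross-entropy, clipped to avoid log(0).
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_pixel_classifier(features, labels, lr=0.5, steps=200):
    """features: (P, D) per-pixel features; labels: (P,) with 1 = foreground.

    Returns weights w (D,) and bias b, shared by all pixels.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(features @ w + b)
        grad = p - labels                 # dLoss/dlogit for sigmoid + BCE
        w -= lr * features.T @ grad / n   # gradient-descent parameter update
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(0)
features = rng.standard_normal((200, 4))     # synthetic per-pixel features
labels = (features[:, 0] > 0).astype(float)  # synthetic foreground labels
loss_before = bce_loss(sigmoid(features @ np.zeros(4)), labels)
w, b = train_pixel_classifier(features, labels)
probs = sigmoid(features @ w + b)
loss_after = bce_loss(probs, labels)
accuracy = np.mean((probs >= 0.5) == (labels == 1))
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, accuracy {accuracy:.2f}")
```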

The image processing apparatus 100 according to this embodiment can perform background subtraction (e.g., motion detection), one of the automatic analysis techniques for video. Through this, the image processing apparatus 100 can detect moving objects in a video and narrow the region to be analyzed down to the regions where moving objects exist, thereby increasing image analysis speed. The image processing apparatus 100 can also obtain information on the presence, direction, and type of motion in the video. Therefore, the image processing apparatus 100 can be used in video-based surveillance such as security and guarding, and can also be applied to various other fields that use functions such as video synthesis, object recognition, and image retrieval.
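One simple way to narrow the analysis region is to crop the frame to the bounding box of the foreground mask. The sketch below assumes a boolean (H, W) mask such as the one produced by thresholding the per-pixel probabilities; using a single box for all foreground pixels is a simplification (separate objects would usually be handled with connected-component labeling, not shown), and the helper name is hypothetical.

```python
import numpy as np

def foreground_bbox(mask):
    """Return (y0, y1, x0, x1) bounding the True pixels of an (H, W) mask,
    with y1 and x1 exclusive, or None if there is no foreground."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1

mask = np.zeros((6, 8), dtype=bool)
mask[2:4, 3:6] = True  # a hypothetical moving object
y0, y1, x0, x1 = foreground_bbox(mask)
frame = np.zeros((6, 8, 3))
roi = frame[y0:y1, x0:x1]  # later analysis runs only on this crop
print(roi.shape)  # (2, 3, 3)
```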

The term '~unit' used in this embodiment means software or a hardware component such as an FPGA (field programmable gate array) or an ASIC, and a '~unit' performs certain roles. However, '~unit' is not limited to software or hardware. A '~unit' may be configured to reside on an addressable storage medium and may be configured to execute on one or more processors. Thus, as an example, a '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.

The functionality provided within components and '~units' may be combined into a smaller number of components and '~units' or further separated into additional components and '~units'.

In addition, components and '~units' may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

In addition, the image processing method according to an embodiment of the present invention may be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like; in particular, it may be implemented in R, Python, Ruby, Scheme, or a similar language. The computer program may also be recorded on a tangible computer-readable recording medium (e.g., memory, a hard disk, magnetic/optical media, or a solid-state drive (SSD)).

Accordingly, the image processing method according to an embodiment of the present invention may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, memory, a storage device, a high-speed interface connected to the memory and to high-speed expansion ports, and a low-speed interface connected to a low-speed bus and the storage device. These components are interconnected using various buses and may be mounted on a common motherboard or in another suitable manner.

Here, the processor may process instructions within the computing device, such as instructions stored in the memory or storage device for displaying graphical information that provides a graphical user interface (GUI) on an external input/output device, such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and types of memory. The processor may also be implemented as a chipset of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. In one example, the memory may consist of a volatile memory unit or a set of such units. In another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device can provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration that includes such a medium, may include, for example, devices in a storage area network (SAN) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, flash memory, or another similar semiconductor memory device or device array.

The foregoing description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the appended claims rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: image processing apparatus 110: feature extraction unit
120: patch division unit 130: temporal information processing unit
140: probability classification unit

Claims

An image processing apparatus comprising:
a feature extraction unit for extracting visual information by applying a convolution filter to an input frame;
a patch division unit for dividing a feature map including the visual information into pixel units to acquire feature patches;
a temporal information processing unit for extracting temporal information on the feature patches;
a probability classification unit for extracting one probability result for each pixel from the temporal information; and
a background separation unit for separating the foreground and background of the input frame based on the extracted probability result.

The image processing apparatus according to claim 1,
wherein the patch division unit,
when the feature map includes a plurality of layers, divides the feature map such that each feature patch includes pixel information of each of the plurality of layers.

The image processing apparatus according to claim 1,
wherein the temporal information processing unit
extracts the temporal information using a recurrent neural network (RNN).

The image processing apparatus according to claim 3,
wherein the temporal information processing unit
extracts the temporal information, which includes information on changes in the visual information over time, as a predicted value for separating the foreground and the background.

The image processing apparatus according to claim 1,
wherein the convolution filter
is implemented using a convolutional neural network (CNN).

The image processing apparatus according to claim 1,
wherein the probability classification unit
extracts the probability result from the temporal information using a fully connected neural network (FCNN).

The image processing apparatus according to claim 1,
wherein the background separation unit
detects motion of an object in the input frame and separates the foreground and the background through the motion detection.

An image processing method performed by an image processing apparatus, the method comprising:
extracting visual information by applying a convolution filter to an input frame;
obtaining feature patches by dividing a feature map including the visual information into pixel units;
extracting temporal information on the feature patches;
extracting one probability result for each pixel from the temporal information; and
separating the foreground and background of the input frame based on the extracted probability result.

The image processing method according to claim 8,
wherein obtaining the feature patches comprises,
when the feature map includes a plurality of layers, dividing the feature map such that each feature patch includes all pixel information of each of the plurality of layers.

The image processing method according to claim 8,
wherein extracting the temporal information comprises
extracting the temporal information using a recurrent neural network (RNN).

The image processing method according to claim 10,
wherein the temporal information
includes information on changes in the visual information over time and is a predicted value for separating the foreground and the background.

The image processing method according to claim 8,
wherein extracting the visual information comprises
extracting features using the convolution filter implemented using a convolutional neural network (CNN).

The image processing method according to claim 8,
wherein extracting the probability result comprises
extracting the probability result from the temporal information using a fully connected neural network (FCNN).

The image processing method according to claim 8,
wherein separating the foreground and background comprises:
detecting motion of an object within the input frame; and
separating the foreground and the background through the motion detection.