KR102070956B1

KR102070956B1 - Apparatus and method for processing image

Info

Publication number: KR102070956B1
Application number: KR1020170166214A
Authority: KR
Inventors: 조남익; 추성권; 이상훈
Original assignee: 서울대학교산학협력단
Priority date: 2016-12-20
Filing date: 2017-12-05
Publication date: 2020-01-29
Also published as: KR20180071947A

Abstract

The present specification relates to an image processing apparatus and method. The image processing apparatus according to the present embodiment includes: a feature extractor that extracts visual information by applying a convolution filter to an input frame; a patch divider that obtains feature patches by dividing a feature map containing the visual information on a per-pixel basis; a temporal information processor that extracts temporal information from the feature patches; a probability classifier that extracts one probability result per pixel from the temporal information; and a background separator that separates the foreground and background of the input frame based on the extracted probability results.

Description

Image processing apparatus and method {APPARATUS AND METHOD FOR PROCESSING IMAGE}

Embodiments disclosed herein relate to an image processing apparatus and method, and more specifically to an image processing apparatus and method that separate the foreground and background of a video using an artificial neural network.

Existing image processing methods separate the foreground from the background by using a value obtained by simply comparing the previous image with the current image. In this regard, Korean Patent No. 10-2013-0063963, a prior art document, describes an image processing method that compares the current image with the previous image, accumulates the change values between images to generate an accumulated image change value, and uses the accumulated image change value to estimate the change in the image caused by object movement.

Existing image processing methods, including the one above, set the beginning of a video as the training interval of a background model and learn every region as the background model without foreground separation. After the training interval, they separate the foreground and background of each input image and update the background model. Various techniques are used to reduce noise and improve performance in learning, separating, and updating the background, but the noise reduction is designed directly by the user and different parameters are used for different image characteristics. As a result, the user must design every part of the image processing directly, complex memory update operations are performed, and speed may be limited because the processing relies mainly on CPU computation.

As such, because existing image processing methods are not learning-based, the user must design them directly, which makes them difficult to implement, and inefficient use of hardware resources limits their performance.

Accordingly, an apparatus and method for solving these problems are needed.

Meanwhile, the background art described above is technical information that the inventors possessed for, or acquired in the course of, deriving the present invention, and is not necessarily publicly known art disclosed to the general public before the filing of the present application.

An object of the embodiments disclosed herein is to provide an image processing apparatus and method that separate the foreground and background of a video using an artificial neural network.

Another object of the embodiments disclosed herein is to provide an image processing apparatus and method that extract temporal information from a video using an artificial neural network and use the extracted temporal information to separate the foreground and background.

Another object of the embodiments disclosed herein is to provide an image processing apparatus and method that use an artificial neural network to learn how to learn, separate, and update the background automatically, without user intervention.

Another object of the embodiments disclosed herein is to provide an image processing apparatus and method that can detect motion information by training an artificial neural network to use temporal information as well as visual information.

As a technical means for achieving the above technical objects, according to one embodiment, an image processing apparatus includes: a feature extractor that extracts visual information by applying a convolution filter to an input frame; a patch divider that obtains feature patches by dividing a feature map containing the visual information on a per-pixel basis; a temporal information processor that extracts temporal information from the feature patches; a probability classifier that extracts one probability result per pixel from the temporal information; and a background separator that separates the foreground and background of the input frame based on the extracted probability results.

According to another embodiment, an image processing method performed by an image processing apparatus includes: extracting visual information by applying a convolution filter to an input frame; obtaining feature patches by dividing a feature map containing the visual information on a per-pixel basis; extracting temporal information from the feature patches; extracting one probability result per pixel from the temporal information; and separating the foreground and background of the input frame based on the extracted probability results.

According to any one of the above means, an image processing apparatus and method that separate the foreground and background of a video using an artificial neural network can be provided.

In addition, according to any one of the above means, an image processing apparatus and method that extract temporal information from a video using an artificial neural network and use the extracted temporal information to separate the foreground and background can be provided.

In addition, according to any one of the above means, an image processing apparatus and method that use an artificial neural network to learn how to learn, separate, and update the background automatically, without user intervention, can be provided.

In addition, according to any one of the above means, an image processing apparatus and method that can detect motion information by training an artificial neural network to use temporal information as well as visual information can be provided.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a block diagram illustrating an image processing apparatus according to an embodiment.
FIG. 2 is a diagram for describing a process of extracting visual information from an input frame according to an embodiment.
FIG. 3 is a diagram for describing a process of extracting feature patches from a feature map according to an embodiment.
FIG. 4 is a diagram for describing a process of processing temporal information according to an embodiment.
FIG. 5 is a diagram for describing the internal structure of the recurrent neural network shown in FIG. 4.
FIG. 6 is a diagram for describing a process of outputting a probability result for each pixel according to an embodiment.
FIG. 7 is a flowchart illustrating an image processing method according to an embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and like reference numerals designate like parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an image processing apparatus according to an embodiment.

As illustrated in FIG. 1, the image processing apparatus 100 includes a feature extractor 110, a patch divider 120, a temporal information processor 130, a probability classifier 140, and a background separator 150.

The feature extractor 110 may extract, from an input frame, visual information that may be organized as a feature map. Here, the input frame may be one of the plurality of frames that make up a video.

The feature map extracted by the feature extractor 110 contains visual information about at least a partial region of the input frame. The feature extractor 110 may use a convolution filter to obtain the feature map. Here, the convolution filter may be implemented using a convolutional neural network (hereinafter 'CNN'). Accordingly, the feature extractor 110 may use the CNN to learn, from the input frame (that is, an RGB image), the convolution filters needed for background separation, and may extract the feature map using the learned convolution filters.

The patch divider 120 may obtain feature patches by dividing the feature map on a per-pixel basis. When the feature map consists of a plurality of layers, the patch divider 120 may divide it into feature patches that each contain the visual information of every layer. Because the patch divider 120 divides the feature map pixel by pixel, it may obtain a plurality of feature patches from a single feature map.

The temporal information processor 130 may extract temporal information for each feature patch. Here, the temporal information processor 130 may use a recurrent neural network (hereinafter 'RNN') to extract the temporal information. In doing so, the temporal information processor 130 combines the information extracted from the previous feature patch with the current feature patch to extract the temporal information contained in the video. The temporal information processor 130 may also update the internal state of the recurrent neural network, that is, the cell state.

The probability classifier 140 may extract a probability result for each pixel from the temporal information. To this end, the probability classifier 140 may use a fully connected neural network (hereinafter 'FCNN').

The background separator 150 may detect the movement of a specific object in the input frame based on the extracted probability results. In this way, the background separator 150 may separate the foreground and the background in the input frame.
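The separation step can be illustrated with a minimal Python sketch. It assumes that the per-pixel probability result is a foreground probability and that a fixed threshold of 0.5 decides the label; the function name `separate_foreground` and the threshold value are hypothetical, since the description does not specify the decision rule.

```python
def separate_foreground(prob_map, threshold=0.5):
    """Return a binary mask: 1 (foreground) where the probability exceeds
    the threshold, 0 (background) otherwise. The threshold is an assumption."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


# Hypothetical 2 x 3 probability map, one foreground probability per pixel.
prob_map = [
    [0.05, 0.10, 0.92],
    [0.08, 0.85, 0.97],
]
mask = separate_foreground(prob_map)
```

The mask has the same height and width as the probability map, so it can be applied directly to the input frame.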

As such, the proposed image processing apparatus 100 may separate an image by using artificial neural networks. As described above, the image processing apparatus 100 may use a structure in which the CNN, RNN, and FCNN are combined in sequence in order to detect the movement of a specific object in a video. In particular, by using the RNN structure, the image processing apparatus 100 makes it possible to train an artificial neural network that uses temporal information rather than relying only on the visual information of the video.

In addition, by using artificial neural networks, the image processing apparatus 100 can automatically learn how to learn, separate, and update the background without direct design by the user, and its performance varies little with parameter selection and the like.

FIG. 2 is a diagram for describing a process of extracting visual information from an input frame according to an embodiment.

As illustrated in FIG. 2, the feature extractor 110 may extract visual information from the input frame using a convolution filter, and may use a CNN for this purpose.

The feature extractor 110 may apply the convolution filter 220 to a partial region 211 of the input frame 210 and may extract visual information (or CNN features) that may be organized as the feature map 230. Although FIG. 2 shows the feature map 230 as including four layers 231, 232, 233, and 234, the feature map 230 is not limited thereto and may be configured, using convolution filters, to have any number of layers equal to or greater than one.
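A minimal Python sketch of this step follows. It assumes a single-channel frame instead of RGB, fixed hand-written 3 x 3 kernels instead of learned ones, zero padding, and a ReLU non-linearity; the names `conv2d_same`, `extract_feature_map`, and `FILTER_BANK` are hypothetical. Each kernel in the bank produces one layer of the feature map, so four kernels give L = 4 layers.

```python
def conv2d_same(image, kernel):
    """Slide one k x k kernel over a 2-D image with zero padding so that the
    output has the same size as the input (cross-correlation, as in CNNs)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = max(acc, 0.0)  # ReLU (an assumption, not from the text)
    return out


def extract_feature_map(image, filter_bank):
    """Apply every kernel in the bank; each result is one feature-map layer."""
    return [conv2d_same(image, kernel) for kernel in filter_bank]


# Placeholder kernels; in the described apparatus these would be learned.
FILTER_BANK = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],        # identity
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],     # horizontal gradient
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],     # vertical gradient
    [[1 / 9] * 3, [1 / 9] * 3, [1 / 9] * 3],  # local mean
]

frame = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
feature_map = extract_feature_map(frame, FILTER_BANK)  # 4 layers of 4 x 4
```

Because the same kernel is reused at every position, the number of parameters depends on the kernel size and the number of kernels, not on the frame resolution.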

The feature extractor 110 may learn, from the input frame 210, the convolution filters needed for background separation. Here, the feature extractor 110 may use the convolution filters as a filter bank to learn and generate a plurality of feature maps 230. In this way, the feature extractor 110 may automatically learn and extract the information needed for background separation.

Meanwhile, the CNN described above is a type of artificial neural network that, by learning from images, can support various functions for retrieving, classifying, and understanding images in image processing. Unlike other artificial neural networks, which connect and learn over all regions of the input, the CNN has a structure in which a single parameter can be reused across multiple regions of the input frame. As a result, the CNN can process images with few parameters, and because image characteristics do not change with spatial position, the image processing apparatus 100 can use it to extract visual information.

FIG. 3 is a diagram for describing a process of extracting feature patches from a feature map according to an embodiment.

As illustrated in FIG. 3, the patch divider 120 may divide the feature map 230 into patches by performing patch division on each of the plurality of layers 231, 232, 233, and 234. In this way, the patch divider 120 may obtain per-pixel feature patches 240 from the feature map 230. Here, a feature patch 240 may contain the per-pixel visual information of each of the layers 231, 232, 233, and 234. The feature map 230 may have L layers (here, L = 4), where L is a natural number greater than or equal to 1.

By dividing the feature map 230, the patch divider 120 may generate a plurality of per-pixel feature patches. The patch divider 120 may also divide the feature map at a stride in consideration of image processing speed.
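The per-pixel division can be sketched in Python as follows. The sketch assumes that a feature patch is simply the vector of L layer values at one pixel position, and that the stride skips positions in both directions; the function name `split_into_patches` and the dictionary keyed by (row, column) are hypothetical choices for illustration.

```python
def split_into_patches(feature_map, stride=1):
    """feature_map: list of L layers, each an H x W grid.
    Returns {(y, x): [value in layer 1, ..., value in layer L]}."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    patches = {}
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            patches[(y, x)] = [layer[y][x] for layer in feature_map]
    return patches


# Hypothetical feature map with L = 4 layers of size 2 x 3.
feature_map = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
    [[0, 0, 1], [1, 0, 0]],
    [[7, 7, 7], [8, 8, 8]],
]
patches = split_into_patches(feature_map)            # one patch per pixel
strided = split_into_patches(feature_map, stride=2)  # fewer patches, faster
```

With a stride of 1 the number of patches equals the number of pixels; a larger stride trades spatial resolution for speed.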

Meanwhile, the larger the number of layers L of the feature map 230 divided by the patch divider 120, the more visual information is available for separating the foreground and background. Consequently, increasing the number of layers in the feature map 230 may improve foreground and background separation performance.

FIG. 4 is a diagram for describing a process of processing temporal information according to an embodiment.

As illustrated in FIG. 4, the temporal information processor 130 may extract a temporal feature 260 from the feature patch 240. Here, the temporal feature 260 is information on the flow of visual information over time and serves as a prediction value for separating the foreground and background. The temporal information processor 130 may be implemented using an RNN 250.

In addition, the temporal information processor 130 may output per-pixel temporal information 260 and may receive the cell state resulting from the previous input as part of the next input. In this way, the temporal information processor 130 may use the feature vector extracted from the per-pixel feature patch 240 to extract the temporal information 260, which is a prediction value for the separation of foreground and background.

FIG. 5 is a diagram for describing the internal structure of the recurrent neural network shown in FIG. 4.

As illustrated in FIG. 5, the RNN 250 is a neural network configured so that, for a series of sequential inputs, it can use an internal loop to feed back and process information from previous inputs. Accordingly, the RNN 250 can use its internal state information to behave differently for each input in a sequence, and can learn through training the matrices that update the cell state information.

The RNN 250 may be implemented as a long short-term memory (hereinafter 'LSTM'). The LSTM-based RNN 250 may include a cell state operation line 251, a forget gate layer 252, an input gate layer 253, an update cell state 254, and an output gate layer 255. The description here is based on the RNN 250-2; the RNN 250-1 and the RNN 250-3 may have a structure similar to that of the RNN 250-2 and are shown to represent the preceding and following operations, respectively.

The cell state line 251 includes a multiplication operation and an addition operation; it receives the previously output internal state (that is, the cell state) C_{t-1} and may output the current cell state C_t. Using the forget gate layer 252, the input gate layer 253, and the output gate layer 255, the cell state line 251 may determine whether visual information (that is, the visual information contained in the feature patch) is reflected.

The forget gate layer 252 uses a sigmoid (σ) operation and receives h_{t-1} and x_t as inputs. Here, x_t is the feature patch 240 of the current frame; accordingly, x_{t-1} is the feature patch of the previous frame and x_{t+1} is the feature patch of the next frame. h_{t-1} is the hidden state and represents the output value of the previous RNN 250-1. Using the sigmoid operation, the forget gate layer 252 may output a value between 0 and 1 (0 < σ < 1); the closer the output of the sigmoid operation is to 1, the more of the information is reflected, and the closer it is to 0, the less of the information is reflected.

The operation of the forget gate layer 252 may be represented by Equation 1 below.
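Assuming the standard LSTM formulation, which matches the inputs (h_{t-1}, x_t) and parameters (W_f, b_f) named in this description, Equation 1 can be written as follows. The concatenation [h_{t-1}, x_t] is an assumption about how the two inputs are combined.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}
```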

Here, f_t represents the output of the forget gate layer 252. W_f is the weight of the forget gate and b_f is the bias of the forget gate; both W_f and b_f are values set by training the RNN.

Next, the input gate layer 253 determines whether new information is stored in the cell state. The input gate layer 253 may include a sigmoid (σ) operation and a hyperbolic tangent (tanh) operation. Here, the sigmoid operation determines which values to update, and the hyperbolic tangent operation creates a vector of candidate values to be added to the cell state. The sigmoid operation and the hyperbolic tangent operation performed in the input gate layer 253 may each be represented by Equation 2 below.
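Assuming the standard LSTM formulation, and assuming that both operations act on the same inputs h_{t-1} and x_t as the forget gate, Equation 2 can be written with the symbols of this description as:

```latex
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
c_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)
\end{aligned} \tag{2}
```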

Here, i_t is the output of the sigmoid operation in the input gate layer 253, and c_t is the output of the hyperbolic tangent operation in the input gate layer 253. W_i is the weight of the sigmoid operation and W_c is the weight of the hyperbolic tangent operation. Likewise, b_i is the bias of the sigmoid operation and b_c is the bias of the hyperbolic tangent operation. W_i, W_c, b_i, and b_c are values set by training the RNN.

The update cell state 254 uses the values output from the forget gate layer 252 and the input gate layer 253 to update the cell state from C_{t-1} to C_t. The update of the cell state may be represented by Equation 3 below.
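Assuming the standard LSTM cell update, which is consistent with the multiplication and addition operations on the cell state line 251, Equation 3 can be written as follows, where ⊙ denotes element-wise multiplication (the element-wise reading is an assumption):

```latex
C_t = f_t \odot C_{t-1} + i_t \odot c_t \tag{3}
```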

Here, C_t represents the cell state and C_{t-1} represents the previous cell state. f_t is the output of the forget gate layer 252, i_t is the output of the sigmoid operation in the input gate layer 253, and c_t is the output of the hyperbolic tangent operation in the input gate layer 253.

Next, the output gate layer 255 determines the output value h_t, which is a filtered version of the cell state C_t. The output gate layer 255 may include a sigmoid operation and a hyperbolic tangent operation. Here, the sigmoid operation may compute cell state determination information o_t, which determines which part of the cell state is output. The hyperbolic tangent operation maps the updated cell state C_t to a value between -1 and 1, which is then multiplied by the cell state determination information o_t. The sigmoid operation and the hyperbolic tangent operation performed in the output gate layer 255 may each be represented by Equation 4 below.
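Assuming the standard LSTM formulation, and assuming that the sigmoid operation again acts on h_{t-1} and x_t, Equation 4 can be written with the symbols of this description as:

```latex
\begin{aligned}
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned} \tag{4}
```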

Here, o_t is the output of the sigmoid operation in the output gate layer 255, and h_t is the output of the hyperbolic tangent operation in the output gate layer 255. W_o is the weight of the sigmoid operation and b_o is the bias of the sigmoid operation. W_o and b_o are values set by training the RNN.
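The forget gate, input gate, cell state update, and output gate described for FIG. 5 can be combined into a single time step. The following is a minimal Python sketch, assuming the standard LSTM formulation in which every gate acts on the concatenation of h_{t-1} and x_t. The function names, the dictionary layout of `params`, and the all-zero weights are hypothetical placeholders, not learned values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for a single pixel's feature patch x_t."""
    v = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], v)]
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], v)]
    cand = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], v)]
    # Cell state update: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], v)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


hidden, inputs = 2, 4  # hidden size is an assumption; inputs = L = 4 layers
params = {}
for gate in "fico":
    # Zero weights and biases stand in for values that training would set.
    params["W_" + gate] = [[0.0] * (hidden + inputs) for _ in range(hidden)]
    params["b_" + gate] = [0.0] * hidden

h, c = [0.0, 0.0], [1.0, -2.0]
frames = ([0.1, 0.2, 0.3, 0.4], [0.2, 0.2, 0.3, 0.5], [0.9, 0.1, 0.0, 0.7])
for x_t in frames:  # the same pixel's feature patch over three frames
    h, c = lstm_step(x_t, h, c, params)
```

With all-zero parameters every gate outputs 0.5 and the candidate is 0, so the cell state is halved at each step; trained weights would instead make the gates depend on the feature patch.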

FIG. 6 is a diagram for describing a process of outputting a probability result for each pixel according to an embodiment.

As illustrated in FIG. 6, the probability classifier 140 may classify the temporal information 260 into probabilities 280 of foreground and background, and because it operates pixel by pixel, it can obtain a result with the same resolution as the original. To this end, the probability classifier 140 may use an FCNN 270 to classify the probabilities.

Because the probability classifier 140 applies the FCNN 270 with the same parameters at every pixel, as is done for the RNN 250, it can minimize the number of parameters used to separate the foreground and background. For example, if a single FCNN 270 were applied to all pixels at once, it would require a number of parameters proportional to the resolution; because the probability classifier 140 instead reuses the per-pixel parameters in the FCNN 270 as in the RNN 250, the same parameters can be used regardless of the resolution.
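The parameter sharing can be illustrated with a minimal Python sketch. It assumes a single fully connected layer followed by a sigmoid, producing one foreground probability per pixel; the function names and the example weights are hypothetical, and the actual FCNN 270 may have more layers.

```python
import math


def classify_pixel(temporal_feature, weights, bias):
    """One fully connected layer plus sigmoid: one probability per pixel."""
    z = sum(w * v for w, v in zip(weights, temporal_feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify_frame(temporal_features, weights, bias):
    """Apply the same weights and bias to every pixel of an H x W grid."""
    return [[classify_pixel(v, weights, bias) for v in row]
            for row in temporal_features]


weights, bias = [2.0, -1.0], 0.0  # placeholder values, not learned

# Two grids of per-pixel temporal features at different resolutions.
small = [[[1.0, 2.0], [0.0, 0.0]],
         [[3.0, 0.0], [0.0, 4.0]]]
large = [[[0.5, 0.5]] * 5 for _ in range(3)]

probs_small = classify_frame(small, weights, bias)  # 2 x 2 probabilities
probs_large = classify_frame(large, weights, bias)  # 3 x 5 probabilities
```

The same three parameters (two weights and one bias) serve both the 2 x 2 and the 3 x 5 grid, and each output grid has the same resolution as its input.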

As described above, because the probability classifier 140 uses the RNN 250, which outputs sequential results for sequential inputs, it can interpret not only the visual information of a video but also the temporal information expressed as a sequence of frames.

FIG. 7 is a flowchart illustrating an image processing method according to an embodiment.

As shown in FIG. 7, the image processing apparatus 100 may extract, from an input frame, visual information that may be organized as a feature map (S310). The image processing apparatus 100 may extract the visual information in the form of a feature map by applying a convolution filter to the input frame. Here, because the convolution filter is implemented using a CNN, the feature map may also be referred to as a CNN feature.

The image processing apparatus 100 may obtain feature patches by dividing the feature map into pixel units (S320). When the feature map consists of a plurality of layers, the image processing apparatus 100 may divide it into feature patches that each include the visual information of all of the layers.
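As a rough illustration of step S320, the sketch below splits a multi-layer feature map into one patch per pixel, where each patch holds the values of every layer at that pixel. The (layers, height, width) array layout and the function name are assumptions made for this example.

```python
import numpy as np

def split_into_feature_patches(feature_map):
    """Split a (C, H, W) feature map into H*W patches of length C.

    Row k of the result is the patch for pixel (k // W, k % W) and contains
    the value of every one of the C layers at that pixel.
    """
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

# Toy feature map with 2 layers and a 3x4 spatial grid.
feature_map = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
patches = split_into_feature_patches(feature_map)
```

With this layout, `patches[5]` is the patch for the pixel at row 1, column 1.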

The image processing apparatus 100 may extract temporal information from each feature patch using the RNN (S330). The image processing apparatus 100 may use an external memory (for example, the RNN output values h_{t-1}, h_t, h_{t+1}, and so on in FIG. 6) and an internal memory (for example, the cell states C_{t-1}, C_t, C_{t+1}, and so on in FIG. 6), updating the memories as time progresses, and may extract a feature vector W that includes the information stored in the external and internal memories. In this way, the image processing apparatus 100 may extract the feature vector (for example, the weights W_f, W_i, W_c, W_o and the biases b_f, b_i, b_c, b_o in FIG. 6) as the temporal information.
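The memory update in step S330 can be sketched as a single LSTM step applied to all pixel patches at once. This assumes the standard LSTM gate equations and reuses the parameter names W_f, W_i, W_c, W_o and b_f, b_i, b_c, b_o from the text. The array shapes, the random initialization, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One standard LSTM update for N pixel patches.

    x_t: (N, D) patches of the current frame.
    h_prev, c_prev: (N, K) external memory (output) and internal memory (cell state).
    params: weights W_* of shape (K + D, K) and biases b_* of shape (K,).
    """
    z = np.concatenate([h_prev, x_t], axis=1)
    f_t = sigmoid(z @ params["W_f"] + params["b_f"])      # forget gate
    i_t = sigmoid(z @ params["W_i"] + params["b_i"])      # input gate
    c_cand = np.tanh(z @ params["W_c"] + params["b_c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_cand                     # updated internal memory
    o_t = sigmoid(z @ params["W_o"] + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # updated external memory
    return h_t, c_t

def init_params(d, k, rng):
    """Random parameters for illustration only; real values come from training."""
    params = {}
    for gate in "fico":
        params[f"W_{gate}"] = rng.standard_normal((k + d, k)) * 0.1
        params[f"b_{gate}"] = np.zeros(k)
    return params

rng = np.random.default_rng(0)
n_pixels, d, k, n_frames = 12, 2, 4, 5
params = init_params(d, k, rng)   # shared by every pixel
h = np.zeros((n_pixels, k))
c = np.zeros((n_pixels, k))
for _ in range(n_frames):
    x_t = rng.standard_normal((n_pixels, d))  # stand-in for per-frame feature patches
    h, c = lstm_step(x_t, h, c, params)
```

Every pixel keeps its own `h` and `c` rows while sharing one parameter set, so the memory grows with the resolution but the parameter count does not.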

By using the RNN, the image processing apparatus 100 does not depend only on visual information and can train a neural network that exploits temporal information. Accordingly, the image processing apparatus 100 can detect motion information about a moving object. In addition, the image processing apparatus 100 can use the RNN to store past values obtained from image frames and to predict the next value.

The image processing apparatus 100 may extract a probability result for each pixel from the temporal information (S340). The image processing apparatus 100 may extract the probability result from the temporal information using the FCNN.

The image processing apparatus 100 may detect the movement of an object in the input image using the extracted temporal information (S350).

The image processing apparatus 100 may separate the foreground and the background through motion detection in the image, and then end the process (S360).
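The final separation step could look roughly like the following, assuming that the per-pixel probability is the probability of foreground and that a fixed threshold decides the label. The text specifies neither the threshold value nor the output format, so the value 0.5 and the function name are hypothetical.

```python
import numpy as np

def separate_foreground(frame, prob_map, threshold=0.5):
    """Split an (H, W, 3) frame into foreground and background images.

    prob_map holds one foreground probability per pixel; pixels at or above
    the (assumed) threshold are treated as foreground.
    """
    mask = prob_map >= threshold
    foreground = np.where(mask[..., None], frame, 0)
    background = np.where(mask[..., None], 0, frame)
    return mask, foreground, background

frame = np.full((2, 3, 3), 200, dtype=np.uint8)
prob_map = np.array([[0.9, 0.2, 0.7],
                     [0.1, 0.5, 0.3]])
mask, fg, bg = separate_foreground(frame, prob_map)
```

Each pixel lands in exactly one of the two outputs, so adding `fg` and `bg` reconstructs the original frame.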

Meanwhile, the operation of the image processing apparatus 100 shown in FIG. 7 may separate an image by using the CNN, the RNN, and the FCNN in sequence. The process shown in FIG. 7 may be used both for a training operation for separating the foreground and the background and for an actual image analysis operation that separates them. Here, the image processing apparatus 100 may learn the parameters of the CNN, the RNN, and the FCNN during the training operation or during actual image analysis, and may update the parameters using the learned values.

The image processing apparatus 100 according to the present embodiment may perform background subtraction (for example, motion detection), which is one of the automatic analysis techniques for video. In this way, the image processing apparatus 100 can detect a moving object in a video and narrow the region to be analyzed down to the region where the moving object exists, thereby improving the image analysis speed. In addition, the image processing apparatus 100 can obtain information on the presence, direction, and type of motion in the video. Therefore, the image processing apparatus 100 can be used in video-based surveillance fields such as security and guarding, and can also be applied to various other fields that make use of functions such as video synthesis, object recognition, and image search.
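One simple way to narrow the analysis region to where the moving object exists is to take the bounding box of the foreground mask. The text does not say how the region is derived, so the bounding-box approach and the function name below are assumptions.

```python
import numpy as np

def moving_region_bbox(mask):
    """Return (row_start, row_stop, col_start, col_stop) enclosing all
    foreground pixels, or None when the mask contains no foreground."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

mask = np.zeros((6, 8), dtype=bool)
mask[2:4, 3:6] = True                       # a small moving object
bbox = moving_region_bbox(mask)
empty_bbox = moving_region_bbox(np.zeros((6, 8), dtype=bool))
```

Later analysis can then run on `frame[r0:r1, c0:c1]` instead of the full frame.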

The term "unit" ('~부') used in the present embodiment refers to software or a hardware component such as a field programmable gate array (FPGA) or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or may be configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The functionality provided by the components and "units" may be combined into a smaller number of components and "units", or further separated into additional components and "units".

Furthermore, the components and "units" may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

In addition, the image processing method according to an embodiment of the present invention may be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like, and in particular in R, Python, Ruby, Scheme, or the like. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or a solid-state drive (SSD)).

Accordingly, the image processing method according to an embodiment of the present invention may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are connected to one another using various buses and may be mounted on a common motherboard or in another suitable manner.

Here, the processor may process instructions within the computing device. Such instructions include, for example, instructions stored in the memory or the storage device for displaying graphical information to provide a graphical user interface (GUI) on an external input/output device, such as a display connected to the high-speed interface. In another embodiment, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and memory types. The processor may also be implemented as a chipset made up of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. In one example, the memory may consist of a volatile memory unit or a set of such units. In another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device can provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration that includes such a medium. It may include, for example, devices in a storage area network (SAN) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The foregoing description of the present invention is for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: image processing apparatus
110: feature extractor
120: patch splitter
130: temporal information processor
140: probability classifier

Claims

An image processing apparatus comprising:
a feature extractor which extracts visual information by applying a convolution filter to an input frame;
a patch splitter which divides a feature map including the visual information into pixel units to obtain feature patches;
a temporal information processor which extracts temporal information from the feature patches;
a probability classifier which extracts one probability result for each pixel from the temporal information; and
a background separator which separates the foreground and the background of the input frame based on the extracted probability result,
wherein the temporal information processor extracts the temporal information using a recurrent neural network, the probability classifier extracts the probability result from the temporal information using a fully connected neural network, and the parameters used for each pixel in the recurrent neural network are applied identically in the fully connected neural network.

The image processing apparatus of claim 1,
wherein, when the feature map includes a plurality of layers, the patch splitter divides the feature map such that each feature patch includes the pixel information of all of the plurality of layers.

delete

The image processing apparatus of claim 1,
wherein the temporal information processor extracts the temporal information, which includes information on the flow of the visual information over time and is a prediction value for separating the foreground and the background.

The image processing apparatus of claim 1,
wherein the convolution filter is a filter implemented using a convolutional neural network.

delete

The image processing apparatus of claim 1,
wherein the background separator detects the movement of an object in the input frame and separates the foreground and the background through the motion detection.

An image processing method performed by an image processing apparatus, the method comprising:
extracting visual information by applying a convolution filter to an input frame;
dividing a feature map including the visual information into pixel units to obtain feature patches;
extracting temporal information from the feature patches;
extracting one probability result for each pixel from the temporal information; and
separating the foreground and the background of the input frame based on the extracted probability result,
wherein the extracting of the temporal information comprises extracting the temporal information using a recurrent neural network, and
the extracting of the probability result comprises extracting the probability result from the temporal information using a fully connected neural network, the parameters used for each pixel in the recurrent neural network being applied identically in the fully connected neural network.

The method of claim 8,
wherein the obtaining of the feature patches comprises, when the feature map includes a plurality of layers, dividing the feature map such that each feature patch includes the pixel information of all of the plurality of layers.

delete

The method of claim 8,
wherein the temporal information includes information on the flow of the visual information over time and is a prediction value for separating the foreground and the background.

The method of claim 8,
wherein the extracting of the visual information comprises extracting features using the convolution filter, which is implemented using a convolutional neural network.

delete

The method of claim 8,
wherein the separating of the foreground and the background comprises:
detecting the movement of an object in the input frame; and
separating the foreground and the background through the motion detection.