KR102622438B1 - Optical flow estimation method and object detection method using the same

Info

Publication number: KR102622438B1
Application number: KR1020210136543A
Authority: KR
Inventors: 정기석; 강준구; 노시동
Original assignee: 한양대학교 산학협력단
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2024-01-05
Also published as: KR20230053235A

Abstract

Disclosed is a method of estimating optical flow using an artificial neural network and of detecting an object in a video using the estimated optical flow. The disclosed optical flow estimation method includes: generating feature values for first and second images using an encoder that includes a plurality of convolutional layers; estimating at least one optical flow for the first and second images, and an uncertainty value for that optical flow, using the feature values and a decoder that includes a plurality of deconvolution layers; and training the encoder and decoder by calculating a loss value for the optical flow and the uncertainty value.

Description

Optical flow estimation method and object detection method using the same {OPTICAL FLOW ESTIMATION METHOD AND OBJECT DETECTION METHOD USING THE SAME}

The present invention relates to an optical flow estimation method and an object detection method using the same, and more specifically to a method of estimating optical flow using an artificial neural network and detecting an object in a video using the estimated optical flow.

Optical flow is the pattern of object motion between two adjacent images and can be computed as a per-pixel motion vector between corresponding pixels of the two images. Unlike a single image, a video consists of consecutive images, and the optical flow between those images, that is, the motion from frame to frame, can degrade object detection performance in video.

Optical flow is therefore used when detecting objects in video. Because optical flow carries information about how an object moves between two images, methods are being developed that estimate the optical flow between two images in order to improve object detection performance in video.

Related prior art includes the patent documents Republic of Korea Patent Publication No. 2020-0010971 and Republic of Korea Patent Registration No. 10-2186764, and the non-patent documents "Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. 2015. Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE international conference on computer vision." and "Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. 2017. Flow-guided feature aggregation for video object detection. In Proceedings of the IEEE International Conference on Computer Vision. 408-417."

An object of the present invention is to provide an optical flow estimation method that estimates, together with the optical flow, an uncertainty value for that optical flow.

A further object of the present invention is to provide an object detection method that can reduce false detections caused by inaccuracy in the estimated optical flow.

According to one embodiment of the present invention for achieving the above objects, there is provided an optical flow estimation method including: generating feature values for first and second images using an encoder that includes a plurality of convolutional layers; estimating at least one optical flow for the first and second images, and an uncertainty value for the optical flow, using the feature values and a decoder that includes a plurality of deconvolution layers; and training the encoder and decoder by calculating a loss value for the optical flow and the uncertainty value.

According to another embodiment of the present invention for achieving the above objects, there is provided an optical flow estimation method including: generating feature values for first and second images using an encoder that includes a plurality of convolutional layers; estimating at least one optical flow for the first and second images, and an uncertainty value for the optical flow, using the feature values and a decoder that includes a plurality of deconvolution layers; and training the encoder and decoder by calculating a loss value for the optical flow and the uncertainty value, wherein the encoder and decoder are trained so that the optical flow becomes the mean of a Gaussian distribution and the uncertainty value becomes the variance of that Gaussian distribution.

According to yet another embodiment of the present invention for achieving the above objects, there is provided a method of detecting an object in a video, including: generating feature values for first and second images included in the video using a pre-trained encoder that includes a convolutional layer; estimating at least one optical flow for the first and second images, and an uncertainty value for the optical flow, using the feature values and a pre-trained decoder that includes a deconvolution layer; and detecting a target object in the second image using the optical flow and the uncertainty value.

According to one embodiment of the present invention, estimating an uncertainty value for the optical flow together with the optical flow itself makes it easy to predict how reliable the estimated optical flow is.

In addition, according to one embodiment of the present invention, objects are detected while the degree to which the optical flow is applied is adjusted according to the uncertainty of the estimated optical flow, which prevents inaccurate optical flow from degrading object recognition performance.

FIG. 1 is a diagram for explaining a method of detecting an object in a video using optical flow.
FIG. 2 is a flowchart showing an optical flow estimation method according to an embodiment of the present invention.
FIGS. 3 and 4 are diagrams for explaining the optical flow estimation method of a decoder according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a method of detecting an object in a video according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail below. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to cover all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the drawings, similar reference numerals are used for similar components.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining a method of detecting an object in a video using optical flow.

As shown in FIG. 1, an artificial neural network for optical flow estimation generally includes an encoder 110 and a decoder 120. The encoder 110 includes a plurality of convolutional layers, and the decoder 120 includes a plurality of deconvolution layers.

First and second images 111 and 112 are input to the encoder 110. The first image 111 may be the image corresponding to the previous frame of the video and the second image 112 the image corresponding to the current frame, and the encoder 110 outputs feature values for the first and second images 111 and 112. The feature values are input to the decoder 120, which estimates the optical flow for the first and second images 111 and 112. The optical flow may take the form of a per-pixel motion vector map between the first and second images 111 and 112. A loss value between the ground truth optical flow for the first and second images 111 and 112 and the estimated optical flow is calculated with a loss function, and the encoder 110 and decoder 120 are trained by backpropagation so that this loss value decreases.
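The passage above does not name the loss function used in this conventional setup. As a minimal sketch, assuming the average endpoint error that is commonly used for optical flow networks (an assumption, not something the source specifies), the loss between an estimated and a ground-truth motion vector map could be computed as follows. The function name `endpoint_error` and the flat list-of-tuples layout are illustrative only.

```python
import math

def endpoint_error(pred, gt):
    """Average Euclidean distance between predicted and true (dx, dy) vectors.

    `pred` and `gt` are equal-length lists of per-pixel motion vectors.
    This loss is an assumed example; the source only says "a loss function".
    """
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists)

pred = [(3.0, 4.0), (1.0, 1.0)]   # estimated motion vectors for two pixels
gt = [(0.0, 0.0), (1.0, 1.0)]     # ground-truth motion vectors
epe = endpoint_error(pred, gt)    # first pixel is off by 5, second is exact
```

A loss of this kind compares only the flow with its ground truth and says nothing about how much the estimate can be trusted.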

The object detector 130 detects a target object in the second image 112 using the feature values of the first and second images 111 and 112 and the estimated optical flow. The object detector 130 uses the optical flow between the first and second images 111 and 112 to update the feature values of the first image, and warps the updated feature values of the first image onto the feature values of the second image 112 to detect the target object. When the target object moves quickly between the first image 111 and the second image 112, it may be difficult to detect in the second image 112. Because the optical flow contains the motion information of the target object, the object detector 130 can detect the target object more accurately by using the optical flow to correct the feature values of the second image 112.
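The warping step can be illustrated with a small sketch. It assumes backward warping with bilinear interpolation on a single-channel feature map, which is a common way to apply a flow field to features; the source does not state the interpolation scheme, and `warp_features` is a hypothetical name.

```python
import math

def warp_features(feat, flow_x, flow_y):
    """Warp a single-channel feature map (H x W nested lists) with a flow field.

    Each output pixel (y, x) samples `feat` at (y + flow_y, x + flow_x) using
    bilinear interpolation; samples outside the map count as 0.
    """
    h, w = len(feat), len(feat[0])

    def at(yy, xx):
        return feat[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x + flow_x[y][x], y + flow_y[y][x]
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * at(y0, x0) + ax * at(y0, x0 + 1)
            bottom = (1 - ax) * at(y0 + 1, x0) + ax * at(y0 + 1, x0 + 1)
            out[y][x] = (1 - ay) * top + ay * bottom
    return out

feat = [[0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
# A flow of -1 in x everywhere: each output pixel reads its left neighbour.
flow_x = [[-1.0] * 3 for _ in range(2)]
flow_y = [[0.0] * 3 for _ in range(2)]
warped = warp_features(feat, flow_x, flow_y)
```

In this toy case the activation at column 1 moves to column 2, which is how features from the previous frame are aligned with the current frame before detection.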

Using optical flow for object detection in video can thus improve detection performance, but only on the premise that the estimated optical flow is accurate. If objects are detected using optical flow even though the estimate is inaccurate, detection performance may actually get worse.

The present invention was conceived with this point in mind: when optical flow is estimated with an artificial neural network, an uncertainty value for the estimated optical flow is estimated along with the flow itself. The uncertainty value corresponds to the reliability or accuracy of the optical flow. The higher the uncertainty value, the more likely the estimated optical flow is inaccurate; the lower the uncertainty value, the less likely it is inaccurate.

The present invention then detects objects in video using the estimated optical flow and the uncertainty value together. The higher the uncertainty value, the less the optical flow is applied when detecting the object, which prevents inaccurate optical flow from degrading object detection performance.

The optical flow estimation method and the method of detecting objects in video according to an embodiment of the present invention may be performed on a computing device that includes a processor and memory, and may be implemented on the basis of an artificial neural network.

FIG. 2 is a flowchart showing an optical flow estimation method according to an embodiment of the present invention.

A computing device according to an embodiment of the present invention estimates optical flow and an uncertainty value for the optical flow using an artificial neural network that includes an encoder and a decoder. The encoder receives the first and second images and outputs feature values for them, and the decoder receives those feature values and outputs the optical flow and the uncertainty value for the optical flow.

Referring to FIG. 2, the computing device according to an embodiment of the present invention generates feature values for the first and second images using an encoder that includes a plurality of convolutional layers (S210). As described above, the first and second images may be frames included in a video, and the first image may be a frame that is referenced in order to detect the target object in the second image.

Using the feature values generated in step S210 and a decoder that includes a plurality of deconvolution layers, the computing device estimates at least one optical flow for the first and second images and an uncertainty value for the optical flow (S220). It then calculates a loss value for the estimated optical flow and uncertainty value and trains the encoder and decoder (S230).

As described above, the optical flow is a per-pixel motion vector map between the first and second images, and may include a per-pixel motion vector map in the x-axis direction and a per-pixel motion vector map in the y-axis direction. To match this structure, the uncertainty value may include an uncertainty map for the x-axis motion vector map and an uncertainty map for the y-axis motion vector map.

One embodiment of the present invention assumes that the optical flow estimated by the artificial neural network follows a Gaussian distribution, and trains the encoder and decoder so that the optical flow output by the decoder is the mean of the Gaussian distribution and the uncertainty value is its variance. A large variance means that the data are widely spread, and in terms of the probability density function, the larger the variance, the lower the density at the mean. The variance of the Gaussian distribution can therefore serve as the uncertainty value.
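The relationship between variance and the density at the mean can be checked numerically. The sketch below evaluates the standard Gaussian probability density function at its mean for two variances; the function name and the chosen variances are illustrative only.

```python
import math

def gaussian_pdf(x, mean, var):
    """Probability density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

peak_narrow = gaussian_pdf(0.0, 0.0, 0.25)  # small variance: tall, narrow peak
peak_wide = gaussian_pdf(0.0, 0.0, 4.0)     # large variance: low, wide peak
```

The density at the mean scales with the inverse square root of the variance, so a 16-fold larger variance gives a peak that is 4 times lower.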

To this end, one embodiment of the present invention trains the encoder and decoder with a loss function that makes the decoder output follow the likelihood function of a Gaussian distribution given in [Equation 1]. For the optical flow, the loss is calculated by comparison with the ground truth (GT), whereas for the uncertainty value the loss is calculated without any ground truth to compare against.

Here, u^GT denotes the ground truth for the estimated optical flow. μ is the estimated optical flow and comprises the optical flow in the x-axis direction (μ_x) and the optical flow in the y-axis direction (μ_y). σ denotes the uncertainty value and comprises the uncertainty value for the x-axis optical flow (σ_x) and the uncertainty value for the y-axis optical flow (σ_y).
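The image of [Equation 1] did not survive in this text. As a hedged reconstruction from the surrounding definitions only, and not the patent's verbatim formula, a per-pixel Gaussian likelihood with independent x and y components, with the estimated flow as the mean and the uncertainty value (written here as \sigma_k) as the variance, would read:

```latex
p\left(u^{GT} \mid \mu, \sigma\right)
  = \prod_{k \in \{x,\, y\}}
    \frac{1}{\sqrt{2\pi\sigma_k}}
    \exp\!\left(-\frac{\left(u^{GT}_k - \mu_k\right)^2}{2\sigma_k}\right)
```

The independence of the two axes and the use of \sigma_k directly as a variance are assumptions made for this sketch.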

Training the encoder and decoder means training them so that the loss value calculated by the loss function is minimized, which corresponds to training them so that the likelihood function of [Equation 1] outputs its maximum value. Maximizing the likelihood of [Equation 1] is equivalent to minimizing the function obtained by taking -log of [Equation 1]. The computing device according to an embodiment of the present invention therefore uses the function of [Equation 2], obtained by taking -log of [Equation 1], as the loss function (Loss), calculates the loss value, and trains the encoder and decoder so that the loss value is minimized.
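The image of [Equation 2] is also missing from this text. A minimal sketch of a negative log-likelihood loss of this kind, assuming a per-element Gaussian with the estimated flow as the mean and the uncertainty value as the variance, is shown below. The function name, the averaging over elements, and the `eps` floor on the variance are assumptions and do not come from the source.

```python
import math

def gaussian_nll(flow_gt, flow_mean, flow_var, eps=1e-6):
    """Mean Gaussian negative log-likelihood over flat lists of per-pixel values.

    Per element: 0.5 * log(2 * pi * var) + (gt - mean)^2 / (2 * var).
    `eps` (an assumption) keeps the variance strictly positive.
    """
    total = 0.0
    for gt, mu, var in zip(flow_gt, flow_mean, flow_var):
        var = max(var, eps)
        total += 0.5 * math.log(2.0 * math.pi * var) + (gt - mu) ** 2 / (2.0 * var)
    return total / len(flow_gt)

# Accurate estimate: claiming low uncertainty is rewarded.
right_confident = gaussian_nll([1.0], [1.0], [0.1])
right_unsure = gaussian_nll([1.0], [1.0], [1.0])
# Inaccurate estimate (error of 2): claiming high uncertainty is rewarded.
wrong_confident = gaussian_nll([1.0], [3.0], [0.1])
wrong_unsure = gaussian_nll([1.0], [3.0], [4.0])
```

The comparison shows why no ground truth is needed for the uncertainty itself: the log term penalizes large variances, while the squared-error term penalizes small variances wherever the flow is wrong, so the variance is pushed toward the size of the actual error.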

With this loss function, the network can be trained so that the optical flow becomes the mean of the Gaussian distribution and the uncertainty value becomes its variance.

To improve training by computing a multiscale loss, the computing device may estimate a plurality of optical flows and uncertainty values in step S220 instead of a single optical flow and uncertainty value. The computing device may calculate a loss value for each of the optical flows and uncertainty values and then train the encoder and decoder so that the sum of those loss values is minimized.

FIGS. 3 and 4 are diagrams for explaining the optical flow estimation method of a decoder according to an embodiment of the present invention. FIG. 3 is a block diagram of the decoder, and FIG. 4 explains the decoder's optical flow estimation method in terms of the output data of each layer.

Referring to FIG. 3, a decoder according to an embodiment of the present invention includes first and second estimation layers 311 and 312 and a first deconvolution layer 321. Depending on the embodiment, it may include further estimation layers and deconvolution layers; the number of each may be adjusted according to the embodiment.

The feature values 300 of the first and second images output by the encoder are input to the first estimation layer 311 and the first deconvolution layer 321.

The first estimation layer 311 estimates a first optical flow and a first uncertainty value from the feature values 300. Like the encoder, an estimation layer may use convolution to estimate the optical flow and uncertainty value, and may use four output channels so as to output the optical flow in the x-axis and y-axis directions and the two corresponding uncertainty values.

The first deconvolution layer 321 deconvolves the input feature values 300 and outputs the result.

The second estimation layer 312 estimates a second optical flow and a second uncertainty value from the output of the first deconvolution layer 321, the first optical flow, and the first uncertainty value. Here, the output of the first deconvolution layer 321, the first optical flow, and the first uncertainty value may be concatenated and input to the second estimation layer 312.

Using the loss function described above, the computing device may calculate a loss value for the first optical flow and first uncertainty value and a loss value for the second optical flow and second uncertainty value.

Depending on the embodiment, the decoder may include further estimation layers and deconvolution layers so that additional optical flows and uncertainty values can be estimated.

Like the second estimation layer 312, the second deconvolution layer 322 receives the output of the first deconvolution layer 321, the first optical flow, and the first uncertainty value, and performs deconvolution on them.

The third estimation layer 313 then estimates a third optical flow and a third uncertainty value from the output of the second deconvolution layer 322, the second optical flow, and the second uncertainty value.

FIG. 4 explains the optical flow estimation method in terms of the data generated in a decoder that includes five estimation layers and four deconvolution layers. In FIG. 4, blue boxes represent deconvolved feature values, green boxes represent feature values output from the convolutional layers of the encoder, and red boxes represent the concatenated optical flow and uncertainty values.

Referring to FIG. 4, a first optical flow 4111, a first uncertainty value 4112, and deconvolved feature values 4211 are first generated from the feature values 300 output by the encoder. The first optical flow 4111, the first uncertainty value 4112, and the deconvolved feature values 4211 are concatenated and input to the next estimation layer 312 and the next deconvolution layer 322. At this point, feature values output from one of the encoder's convolutional layers may additionally be concatenated and input to the next estimation layer 312 and the next deconvolution layer 322. As deconvolution proceeds, the size of the map of deconvolved feature values increases, and the convolutional-layer feature values whose size matches that map may be the ones input to the next estimation layer 312 and the next deconvolution layer 322.

The second optical flow 4121, the second uncertainty value 4122, and the deconvolved feature values 4221 generated by the next estimation layer 312 and the next deconvolution layer 322 are in turn concatenated and input to the following estimation layer 313 and the following deconvolution layer. In this way the output of each layer is input to the next layer, and the last estimation layer estimates an optical flow 4151 and an uncertainty value 4152 from the deconvolved feature values 4241 output by the last deconvolution layer and from the optical flow 4141 and uncertainty value 4142 output by the estimation layer that precedes it.
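The coarse-to-fine structure described for FIG. 4 can be summarized by tracking the spatial size of the estimate produced at each estimation layer. The sketch below assumes that every deconvolution layer doubles the map size; the factor of 2 and the starting size are illustrative assumptions, since the source states only that the map grows as deconvolution proceeds.

```python
def decoder_scales(feature_size, num_deconv=4, upscale=2):
    """Spatial size of the estimate at each estimation layer.

    The first estimation layer works on the encoder output; each later one
    works on the output of one more deconvolution layer, so `num_deconv`
    deconvolution layers give `num_deconv + 1` estimates.
    """
    sizes = [feature_size]
    for _ in range(num_deconv):
        sizes.append(sizes[-1] * upscale)
    return sizes

sizes = decoder_scales(6)  # five estimates, from coarsest to finest
```

With four deconvolution layers this yields five scales, matching the five estimation layers of FIG. 4, and each of those estimates receives its own loss term.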

For the optical flow and uncertainty value generated by each estimation layer, the computing device calculates a loss value using the loss function described above and the ground truth 400. The encoder and decoder may then be trained so that the sum of the calculated loss values is minimized.

As deconvolution proceeds, the size of the map of deconvolved feature values increases, and each estimation layer generates an optical flow and uncertainty value of the same size as that map. The size of the optical flow generated by each estimation layer, that is, of its motion vector map, may therefore differ from the size of the motion vector map 400 that represents the ground truth. In that case, the computing device resizes the ground-truth motion vector map 400 before calculating the loss value for the optical flow and uncertainty value.
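A minimal sketch of this multiscale loss follows, for a single flow axis. It assumes the ground truth is resized by average pooling with an integer factor and that each scale uses the Gaussian negative log-likelihood; the source says only that the ground-truth map is resized, so the pooling method and all function names here are assumptions. Whether flow magnitudes should also be rescaled with the resolution is a design choice the source does not address and the sketch leaves out.

```python
import math

def downsample(gt, factor):
    """Average-pool an H x W map by an integer factor (H, W divisible by it)."""
    h, w = len(gt) // factor, len(gt[0]) // factor
    return [[sum(gt[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]

def nll_map(gt, mean, var):
    """Mean Gaussian negative log-likelihood over one H x W map."""
    terms = [0.5 * math.log(2.0 * math.pi * v) + (g - m) ** 2 / (2.0 * v)
             for g_row, m_row, v_row in zip(gt, mean, var)
             for g, m, v in zip(g_row, m_row, v_row)]
    return sum(terms) / len(terms)

def multiscale_loss(gt_full, predictions):
    """Sum the per-scale losses; `predictions` is a list of (mean, var) maps."""
    total = 0.0
    for mean, var in predictions:
        factor = len(gt_full) // len(mean)
        gt = downsample(gt_full, factor) if factor > 1 else gt_full
        total += nll_map(gt, mean, var)
    return total

gt_full = [[1.0, 1.0], [1.0, 1.0]]
coarse = ([[1.0]], [[1.0]])                                   # 1 x 1 estimate
fine = ([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])   # 2 x 2 estimate
loss = multiscale_loss(gt_full, [coarse, fine])
```

Both estimates here match the ground truth with unit variance, so each scale contributes only the constant term 0.5 * log(2 * pi).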

If the decoder includes the first and second estimation layers 311 and 312 and the first deconvolution layer 321, the motion vector map for the first optical flow may be smaller than the motion vector map that represents the ground truth. The computing device may resize the ground-truth motion vector map and then calculate the loss value for the first optical flow and first uncertainty value.

FIG. 5 is a diagram for explaining a method of detecting an object in a video according to an embodiment of the present invention.

Referring to FIG. 5, the computing device according to an embodiment of the present invention uses the artificial neural network as described above to estimate the optical flow for first and second images included in a video and the uncertainty value for that optical flow (S510). That is, it generates feature values for the first and second images using a pre-trained encoder that includes a convolutional layer, and estimates at least one optical flow for the first and second images and an uncertainty value for the optical flow using those feature values and a pre-trained decoder that includes a deconvolution layer.

The computing device then detects the target object in the second image using the estimated optical flow and uncertainty value (S520).

In step S520, the computing device may detect the target object by reflecting the optical flow in the feature values of the first image, as described above, and it adjusts the degree to which the optical flow is reflected according to the uncertainty value. The computing device applies a weight determined according to the uncertainty value to the estimated optical flow and updates the feature values of the first image using the weighted optical flow. That is, the optical flow value of each pixel is multiplied by the weight, and the feature values of the first image are corrected by the weighted optical flow. Because a large uncertainty value means that the estimated optical flow is likely to be inaccurate, the weight may be set to decrease as the uncertainty value increases.

In one embodiment, the computing device may use, as the weight, the result of applying the sigmoid function to the uncertainty value multiplied by -1.
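
The weight described in this embodiment, sigmoid(-uncertainty), can be written in a few lines. The sketch below is illustrative only; the function name `uncertainty_to_weight` is hypothetical, and the use of `np.logaddexp` is simply a numerically stable way to evaluate 1 / (1 + exp(u)).

```python
import numpy as np

def uncertainty_to_weight(uncertainty):
    """Return sigmoid(-uncertainty) elementwise.

    sigmoid(-u) = 1 / (1 + exp(u)) = exp(-log(1 + exp(u))),
    and np.logaddexp(0, u) computes log(1 + exp(u)) without overflow.
    """
    u = np.asarray(uncertainty, dtype=np.float64)
    return np.exp(-np.logaddexp(0.0, u))

# Larger uncertainty gives a smaller weight.
weights = uncertainty_to_weight(np.array([0.0, 2.0, 10.0]))
```

The weight lies strictly between 0 and 1 and decreases monotonically as the uncertainty value grows, which matches the behavior required of the weight in step S520. Multiplying a per-pixel flow map by this weight shrinks the motion vectors of pixels whose flow estimate is considered unreliable.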

The computing device then warps the updated feature values of the first image to the feature values of the second image and detects the target object in the second image.
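
One possible reading of the weighting and warping steps is sketched below. The text does not specify the warping operator, so several points here are assumptions: the sketch uses backward bilinear sampling with border clamping, treats `flow[..., 0]` as horizontal and `flow[..., 1]` as vertical displacement in pixels, and uses the hypothetical function names `warp_features` and `warp_with_uncertainty`. It is not presented as the embodiment's actual procedure.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly sample feat (H, W, C) at positions displaced by flow (H, W, 2).

    Sample coordinates are clamped to the feature map border.
    """
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]                 # horizontal interpolation factor
    wy = (y - y0)[..., None]                 # vertical interpolation factor
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def warp_with_uncertainty(feat1, flow, uncertainty):
    """Scale the flow by sigmoid(-uncertainty) before warping the features."""
    weight = np.exp(-np.logaddexp(0.0, uncertainty))   # sigmoid(-uncertainty)
    return warp_features(feat1, weight * flow)

# Example: a 3x4 single-channel feature map shifted by one pixel in x.
feat = np.arange(12, dtype=float).reshape(3, 4, 1)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
shifted = warp_features(feat, flow)
damped = warp_with_uncertainty(feat, flow, np.full((3, 4, 1), 50.0))
```

In this sketch, a pixel with a very large uncertainty value has its motion vector scaled almost to zero, so its features are left essentially where they were instead of being moved by an unreliable flow estimate.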

As such, according to an embodiment of the present invention, the feature values of the first image are updated by reflecting the uncertainty value rather than by using the estimated optical flow as is, so that degradation of object recognition performance caused by inaccurate optical flow can be prevented.

In addition, according to an embodiment of the present invention, compared with detecting objects without using the uncertainty value for the optical flow, the mean average precision (mAP) and the false positive (FP) characteristic on the ImageNet VID dataset improved by 1.27% and 10.59%, respectively.

The technical details described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

As described above, the present invention has been described with reference to specific details, such as specific components, and to limited embodiments and drawings. These are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set forth below but also everything equivalent to, or an equivalent modification of, those claims falls within the scope of the spirit of the present invention.

Claims

1. An optical flow estimation method comprising:
generating feature values for first and second images using an encoder including a plurality of convolutional layers;
estimating at least one optical flow for the first and second images and an uncertainty value for the optical flow using a decoder including a plurality of deconvolution layers and the feature values; and
training the encoder and the decoder by calculating loss values for the optical flow and the uncertainty value,
wherein the decoder includes:
a first estimation layer that estimates a first optical flow and a first uncertainty value from the feature values;
a first deconvolution layer that receives the feature values; and
a second estimation layer that estimates a second optical flow and a second uncertainty value from the output value of the first deconvolution layer, the first optical flow, and the first uncertainty value, and
wherein the training of the encoder and the decoder trains the encoder and the decoder so that the first and second optical flows are the mean of a Gaussian distribution and the first and second uncertainty values are the variance of the Gaussian distribution.

2. (Deleted)

3. The optical flow estimation method of claim 1, wherein the decoder further includes:
a second deconvolution layer that receives the output value of the first deconvolution layer, the first optical flow, and the first uncertainty value; and
a third estimation layer that estimates a third optical flow and a third uncertainty value from the output value of the second deconvolution layer, the second optical flow, and the second uncertainty value.

4. The optical flow estimation method of claim 1, wherein the first and second estimation layers estimate the first and second optical flows and the first and second uncertainty values using convolution.

5. (Deleted)

6. The optical flow estimation method of claim 1, wherein the training of the encoder and the decoder calculates the loss value for the first optical flow and the first uncertainty value, and the loss value for the second optical flow and the second uncertainty value, using the loss function expressed by the following equation:
[Equation]

Here, μ denotes the optical flow, u^GT denotes the ground truth for the optical flow, and the remaining symbol of the equation denotes the uncertainty value.

7. The optical flow estimation method of claim 6, wherein the first and second optical flows are per-pixel motion vector maps between the first and second images, and
wherein the training of the encoder and the decoder calculates the loss value for the first optical flow and the first uncertainty value by adjusting the size of a motion vector map representing the ground truth.

8. (Deleted)

9. A method for detecting an object in a video, the method comprising:
generating feature values for first and second images included in the video using a pre-trained encoder including a convolutional layer;
estimating at least one optical flow for the first and second images and an uncertainty value for the optical flow using a pre-trained decoder including a deconvolution layer and the feature values; and
applying a weight determined according to the uncertainty value to the optical flow and detecting a target object in the second image using the weighted optical flow.

10. The method of claim 9, wherein the detecting of the target object includes:
updating the feature values of the first image using the weighted optical flow; and
warping the updated feature values to the feature values of the second image to detect the target object.

11. The method of claim 10, wherein the weight is set to decrease as the uncertainty value increases.