KR20190080388A

KR20190080388A - Photo Horizon Correction Method based on convolutional neural network and residual network structure

Info

Publication number: KR20190080388A
Application number: KR1020170182797A
Authority: KR
Inventors: 홍은빈; 전준호; 조성현; 이승용
Original assignee: 포항공과대학교 산학협력단
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2019-07-08
Also published as: KR102028705B1

Abstract

The present invention relates to a method that, given a tilted image as input, measures the tilt of the image and uses the measured angle to level it. The method includes: step (a) of generating a training dataset for training an angle measurement network that follows a residual network structure including first and second pooling layers and an angle prediction part; step (b) of learning the optimal parameters of the angle measurement network; and step (c) of rotating the image and cropping the empty pixel area.

Description

[0001] The present invention relates to an image horizon correction method using a CNN and to a residual network structure used in the method.

More specifically, when the given input is an image that is not level (an image containing roll rotation, that is, rotation about the camera's viewing-direction axis), the present invention measures the degree of tilt of the image and then uses it to straighten the image so that it is level.

Whether an image is level plays an important role in the aesthetic evaluation of a photograph. For example, a photograph taken with a tilted camera, such as the left image of FIG. 1 (FIG. 1(a)), looks aesthetically poor no matter how good its sharpness and color are. In contrast, the right image of FIG. 1 (FIG. 1(b)) can be considered aesthetically good because it is well leveled.

Image horizon correction is one of the techniques users need most when taking photographs with a digital camera or smartphone. Existing horizon correction techniques straighten a tilted image based on line information present in the input image, but such methods do not work properly for portrait photographs that lack distinct line information.

By correcting the horizon of an image with deep neural network technology, the present invention can operate robustly on photographs of the various scenes that users capture in daily life. Such robust horizon correction can be usefully applied in a variety of applications.

Much research has been conducted on image horizon correction.

(Non-Patent Document 1) proposed a method that analyzes the line information in an image and uses it to straighten the image. However, because it relies on line information present in the image, it does not work well for photographs that do not contain many man-made structures or that have no distinct lines (e.g., people or natural objects).

(Non-Patent Document 2) used deep neural network technology to measure tilt angles at three difficulty levels (angles within ±30°, ±40°, and ±360°). This method is significant in that it applied deep neural networks, which have recently shown outstanding performance, to image horizon correction. However, because it uses a training network with a simple structure, the corrected images show large errors of about 3° to 20°, which limits its practical usefulness.

(Non-Patent Document 1) Hyunjoon Lee, Eli Shechtman, Jue Wang, and Seungyong Lee. Automatic upright adjustment of photographs. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 877-884. IEEE, 2012.
(Non-Patent Document 2) Philipp Fischer, Alexey Dosovitskiy, and Thomas Brox. Image orientation estimation with convolutional networks. In German Conference on Pattern Recognition, pages 368-378. Springer, 2015.
(Non-Patent Document 3) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016.
(Non-Patent Document 4) Giorgos Tolias and Yannis Avrithis. Speeded-up, relaxed spatial matching. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1653-1660. IEEE, 2011.

An object of the present invention is to provide an image horizon correction method using a CNN that, given an input image captured at a tilt, measures the degree of tilt based on a CNN and rotates the image by that angle to obtain a horizon-corrected result image, and to provide a residual network structure used in the method.

To achieve the above object, a residual network structure according to the present invention includes, at the end of the network structure: a first pooling layer that outputs a feature map of size 1x1; a second pooling layer that outputs a feature map of size 2x2; and an angle prediction part that combines the feature vector obtained from the first pooling layer with the feature vector obtained from the second pooling layer and uses the result for the final prediction.

In the residual network structure, the angle prediction part unifies the sizes of the two feature vectors, computes the mean of the two size-unified feature vectors, and uses it for the final prediction.

To achieve the above object, an image horizon correction method using a CNN according to the present invention includes: (a) generating a training dataset for training an angle measurement network that follows the residual network structure; (b) learning the optimal parameters of the angle measurement network while varying the training settings; and (c) measuring the tilt angle of an input image using the angle measurement network, rotating the image in the opposite direction by the measured tilt angle, and cropping the empty pixel area.
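Step (c) does not say how the crop rectangle is chosen. One common choice, assumed here purely for illustration, is the largest axis-aligned rectangle that still fits inside the rotated image. The function name `largest_inscribed_rect` is hypothetical and does not come from the patent.

```python
import math

def largest_inscribed_rect(w, h, angle_deg):
    """Width and height of the largest axis-aligned rectangle that fits
    inside a w x h rectangle rotated by angle_deg about its centre."""
    if w <= 0 or h <= 0:
        return 0.0, 0.0
    a = math.radians(angle_deg)
    sin_a, cos_a = abs(math.sin(a)), abs(math.cos(a))
    width_is_longer = w >= h
    side_long, side_short = (w, h) if width_is_longer else (h, w)
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two crop corners touch the longer sides.
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four sides.
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (w * cos_a - h * sin_a) / cos_2a
        hr = (h * cos_a - w * sin_a) / cos_2a
    return wr, hr
```

After rotating the input by the negative of the measured angle, a centred crop of this size contains no empty pixels. An implementation could equally crop to a fixed aspect ratio; the patent leaves that choice open.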

In the image horizon correction method using a CNN, step (a) generates the training dataset by taking a large number of upright general images, rotating them, and assigning each rotated image its rotation angle as a label.

In the image horizon correction method using a CNN, before an image is rotated for training dataset generation, symmetric padding is applied by 1/2 of the image's width and height.

In the image horizon correction method using a CNN, before step (a), the angle measurement network is trained on the ImageNet image classification dataset, and the weight parameters of a predetermined number of layers located in the latter part of the network are then re-initialized.

To achieve the above object, a computer-readable recording medium according to the present invention records a program for causing a computer to execute the image horizon correction method using a CNN.

To achieve the above object, a computer program according to the present invention is stored in a medium in order to cause a computer to execute the image horizon correction method using a CNN.

According to the present invention, when measuring the degree of tilt of an image for horizon correction, accuracy can be greatly improved even for images that contain no predefined feature for judging tilt, such as line segments, and for images in which the information indicating tilt exists only in a specific region.

FIG. 1 shows examples of an image that is not level and an image that is well leveled.
FIG. 2 shows the overall structure of the image horizon correction method using a CNN.
FIG. 3 shows the process by which the image horizon correction method is performed.
FIG. 4 shows a general CNN structure with two layers (a) and a residual network structure in which a skip connection is added across the two layers (b).
FIG. 5 shows a general CNN structure (a) and a residual network structure (b), each composed of 34 layers.
FIG. 6 shows the residual network structure for tilt angle measurement according to the present invention.
FIG. 7 shows the process of creating a rotated image using symmetric padding.
FIG. 8 shows the result of rotating one image by seven arbitrary angles.
FIG. 9 shows the process of determining whether a dominant line exists in an image.
FIG. 10 shows image horizon correction results using a CNN according to the present invention.
FIG. 11 compares the results of Adobe Lightroom, an existing commercial technology, with those of the method according to the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

<Algorithm Overview>

The overall process of the image horizon correction technique proposed in the present invention is shown in FIG. 2. Given an angle measurement network that takes a tilted image as input, training the network consists of three main stages.

First, the structure of the network to be trained is defined. The network structure is newly constructed on the basis of the residual network proposed in Non-Patent Document 3, with a pooling stage split into two branches added to it.

The second stage is to generate a training dataset for the network. A large collection of images judged to be upright is rotated by arbitrary angles, and the training dataset is built from the rotated images, each labeled with the angle by which it was rotated.

Finally, once the network and the training dataset are ready, the optimal parameters of the network are learned while varying the training settings. When an input image is fed to the network, it goes through the process shown in FIG. 3.

<Network Structure>

The network structure used in the present invention is based on the residual network, which is highly effective for object recognition.

Conventional CNN (Convolutional Neural Network) structures are trained with backpropagation. The deeper the CNN, the more the gradient can shrink exponentially as it is propagated from one layer to the next, depending on the magnitude of the preceding layer's gradient. As a result, gradients near the output layer retain meaningful values, but near the input layer they approach zero and training stops making progress (the vanishing gradient problem).

The residual network is a structure that solves this vanishing gradient problem and enables effective training of deep networks. It uses a structure called a skip connection: the result is obtained by adding the output of passing the input through several layers (F(x) in FIG. 4) to the input itself (the identity path in FIG. 4).

FIG. 4(a) shows a conventional plain network structure, in which the result after two layers is a single mapping of the input x. FIG. 4(b) shows the residual network structure, which is characterized by a skip connection that bypasses the two layers, so that the block outputs F(x) + x. This configuration keeps the gradient from becoming too small, which solves the vanishing gradient problem seen in conventional CNNs. Consequently, models with more layers can be trained well, and high performance can be obtained even with a model that has fewer layers.
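The effect of the identity path on the gradient can be illustrated with a toy scalar model. This is only a sketch under a strong simplifying assumption: each layer is treated as a scalar function with a fixed local derivative, so a plain stack multiplies the derivatives F'(x), while a stack of residual blocks y = F(x) + x multiplies F'(x) + 1. The function names are hypothetical.

```python
def plain_gradient(layer_derivs):
    """Gradient through stacked plain layers: the product of the
    per-layer derivatives F'(x)."""
    g = 1.0
    for d in layer_derivs:
        g *= d
    return g

def residual_gradient(layer_derivs):
    """Gradient through stacked residual blocks y = F(x) + x: each block
    contributes F'(x) + 1 because of the identity path."""
    g = 1.0
    for d in layer_derivs:
        g *= d + 1.0
    return g

derivs = [0.1] * 20                 # 20 layers with small local derivatives
print(plain_gradient(derivs))       # vanishingly small
print(residual_gradient(derivs))    # does not shrink toward zero
```

Real networks have matrix-valued Jacobians rather than scalars, so this only conveys the intuition behind the skip connection.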

The present invention uses a residual network composed of 34 layers, as shown in FIG. 5(b). Because a residual network is generally used to recognize objects in an input image, it uses information from the entire input image at once; the structure responsible for this is the single average pooling layer at the end of the network in FIG. 5.

FIG. 6 shows the structure of the network that measures the tilt angle of an input image in the present invention. It is based on the existing residual network structure.

In the existing residual network structure, the last pooling layer reduces the final feature map to 1x1. In the present invention, this layer is replaced with two pooling layers that produce 1x1 and 2x2 feature maps, respectively, and the vector obtained by combining them is used for the final prediction. This allows multi-scale features of the input image to be taken into account.

The two pooling layers output feature maps of size 1x1 and 2x2, respectively. The 1x1 feature map captures features of the whole image, while each cell of the 2x2 feature map captures features of one of the four equal-sized regions of the image. Because the feature vectors obtained from the two pooling layers differ in size, each is projected to a size of 256 by a linear layer, and the mean of the two vectors is then taken. Finally, a linear layer predicts the tilt angle of the image. The L1 loss (absolute difference) is used as the loss function that measures prediction accuracy.
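The two-branch head described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: it assumes average pooling for both branches, omits any activation functions (the text does not mention them), and uses nested lists instead of a deep learning framework. All function names are hypothetical, and the common size (256 in the text) is whatever the supplied weight matrices produce.

```python
def avg_pool_grid(fmap, grid):
    """Average-pool a C x H x W feature map (nested lists) into a
    C x grid x grid map and return it flattened to a vector."""
    out = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        for gy in range(grid):
            for gx in range(grid):
                y0, y1 = gy * h // grid, (gy + 1) * h // grid
                x0, x1 = gx * w // grid, (gx + 1) * w // grid
                cell = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
                out.append(sum(cell) / len(cell))
    return out

def linear(vec, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def predict_angle(fmap, proj1, proj2, out_layer):
    """proj1, proj2, out_layer are (weights, bias) pairs."""
    v1 = linear(avg_pool_grid(fmap, 1), *proj1)   # whole-image branch
    v2 = linear(avg_pool_grid(fmap, 2), *proj2)   # four-region branch
    merged = [(a + b) / 2.0 for a, b in zip(v1, v2)]  # mean of the two vectors
    return linear(merged, *out_layer)[0]          # single tilt angle

def l1_loss(pred, target):
    """Absolute difference between predicted and true angle."""
    return abs(pred - target)
```

With C channels, the 1x1 branch yields a vector of length C and the 2x2 branch a vector of length 4C, which is why the two projections are needed before averaging.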

When multi-scale features extracted through the two pooling layers are used in this way, the information from the 1x1 feature map represents the structure of the overall image tilt, and each cell of the 2x2 feature map represents features of a different region of the image. The subsequent fully connected layers can then analyze the positional relationships between regions, which helps in measuring image tilt.

For example, in a face image, the tilt of the overall face outline can indicate the tilt of the image, but so can the relationship between the two eyes or the relative position of the nose and mouth. When multi-scale features are considered in this way, accurate tilt can be measured even for diverse images that contain no predefined feature for judging tilt, such as line segments, because the positional relationships between image regions are learned from a large amount of data.

In addition, because the 2x2 feature map evaluates the information in each of the top, bottom, left, and right regions of the image independently, information indicating tilt that exists only in a specific region is not disturbed by regions that lack it. And because features of various sizes in the image can be learned simultaneously, the final tilt angle measurement becomes more robust.

<Network Training>

To train the image tilt measurement network, a training dataset is generated first. The dataset is created by rotating images that are already upright by arbitrary angles and using those angles as labels.

<Dataset Generation>

The process of building the image dataset for training the tilt measurement network is shown in FIG. 7. For each upright image, seven angles between -20° and +20° are selected at random and the image is rotated by each of them. Rotating an image produces empty pixel regions; to fill them, symmetric padding of 1/2 of the width and height is applied before the rotation. The image is then rotated by the chosen angle and cropped back to its original size, which yields a rectangular rotated image with no empty pixels.

The final training dataset consists of the rotated images obtained with symmetric padding, each labeled with its rotation angle. FIG. 8 shows example data obtained by rotating one image by seven arbitrary angles. The unrotated original image is also included in the training dataset with a label of 0.
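The pad, rotate, and crop procedure can be sketched in plain Python on a 2D list of pixel values. This is an illustrative sketch under several assumptions not fixed by the text: the padding of half the height and width is applied on every side, rotation uses nearest-neighbour sampling, and the sign convention of the angle is arbitrary. All function names are hypothetical.

```python
import math
import random

def symmetric_pad(img, pad_y, pad_x):
    """Mirror-pad a 2D image (list of rows); edge pixels are repeated."""
    h, w = len(img), len(img[0])

    def reflect(i, n):
        if i < 0:
            return -i - 1
        if i >= n:
            return 2 * n - i - 1
        return i

    return [[img[reflect(y - pad_y, h)][reflect(x - pad_x, w)]
             for x in range(w + 2 * pad_x)]
            for y in range(h + 2 * pad_y)]

def rotate_nearest(img, angle_deg, fill=None):
    """Rotate about the image centre with nearest-neighbour sampling.
    Output pixels whose source lies outside the image get `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: locate the source pixel for this output pixel.
            ix = int(round(cos_a * dx + sin_a * dy + cx))
            iy = int(round(-sin_a * dx + cos_a * dy + cy))
            row.append(img[iy][ix] if 0 <= ix < w and 0 <= iy < h else fill)
        out.append(row)
    return out

def center_crop(img, out_h, out_w):
    """Cut an out_h x out_w window from the middle of the image."""
    top, left = (len(img) - out_h) // 2, (len(img[0]) - out_w) // 2
    return [row[left:left + out_w] for row in img[top:top + out_h]]

def make_rotated_sample(img, angle_deg):
    """Return (rotated image at the original size, angle label)."""
    h, w = len(img), len(img[0])
    padded = symmetric_pad(img, h // 2, w // 2)
    return center_crop(rotate_nearest(padded, angle_deg), h, w), angle_deg

def make_training_samples(img, n_angles=7, max_angle=20.0, rng=random):
    """The unrotated original (label 0) plus n_angles random rotations."""
    samples = [(img, 0.0)]
    for _ in range(n_angles):
        samples.append(make_rotated_sample(img, rng.uniform(-max_angle, max_angle)))
    return samples
```

For example, `make_training_samples(img, rng=random.Random(0))` returns eight (image, angle) pairs for one source image. A production pipeline would more likely use an image library with bilinear interpolation, which this sketch avoids to stay dependency-free.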

The source images used to generate the tilted image dataset come from the World Cities Dataset introduced in Non-Patent Document 4. It consists of 2.2 million images collected from the Flickr website using geographic queries for 40 major cities; the present invention used 21,994 of them as the training dataset and 1,000 as the validation dataset. The World Cities Dataset contains a small number of tilted images, but since most are upright, it was assumed that they would not significantly affect the overall trend during training, and the experimental results confirmed that the method works well. For the validation dataset, tilted images were excluded by visual inspection (leaving 865 images).

Because the present invention aims at tilt correction that works robustly even for images lacking cues such as line segments, the proportion of each group in the training dataset was adjusted so that, in addition to images with dominant lines (buildings, trees, etc.), images without dominant lines (people, natural scenery, etc.) are sufficiently represented. To this end, the line segments found in an image are clustered, and the center of the cluster with the largest total segment length is taken as the dominant line. If the detected dominant line is shorter than 1/3 of the shorter of the image's height and width, the image is assumed to have no dominant line (see FIG. 9). In this way, the final training dataset was composed of images with and without dominant lines in a 2:1 ratio.
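The dominant-line test can be sketched as follows. The text does not specify the clustering method or how the length of the cluster center is obtained, so this sketch makes two loudly labeled assumptions: segments are clustered by orientation into fixed angular bins, and the longest segment of the winning cluster stands in for the cluster center. The function name and the `n_bins` parameter are hypothetical; line segment detection itself is assumed to have been done elsewhere.

```python
import math

def has_dominant_line(segments, img_w, img_h, n_bins=18):
    """segments: list of (x1, y1, x2, y2) line segments found in the image.
    Returns True if the image is judged to contain a dominant line."""
    clusters = {}
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue
        # Orientation folded into [0, pi); direction of travel is ignored.
        theta = math.atan2(y2 - y1, x2 - x1) % math.pi
        # ASSUMPTION: cluster by orientation bin (10 degrees wide by default).
        b = min(int(theta / math.pi * n_bins), n_bins - 1)
        clusters.setdefault(b, []).append(length)
    if not clusters:
        return False
    # Cluster with the largest total segment length.
    best = max(clusters.values(), key=sum)
    # ASSUMPTION: the longest member stands in for the cluster centre.
    representative = max(best)
    # Threshold from the text: 1/3 of the shorter image side.
    return representative >= min(img_w, img_h) / 3.0
```

Images for which this returns True and False could then be sampled in a 2:1 ratio to form the training set.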

<Training the Image Tilt Measurement Network>

The network was initialized from a residual network model pre-trained on the ImageNet image classification dataset. For ImageNet classification, a network is trained to classify objects robustly regardless of their tilt, which runs counter to the goal of this algorithm: distinguishing fine tilt information. To compensate, the weight parameters of the later layers of the network (nine residual blocks) were re-initialized, and the network was then retrained on the training dataset described above. As a result, the early layers extract the low-level image features learned from ImageNet, while the later layers are guided to newly learn the structural information and relative positional relationships between regions needed to measure tilt.
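As a rough illustration of which parts of the network this affects, the sketch below enumerates the residual blocks of a standard 34-layer residual network (stages of 3, 4, 6, and 3 blocks) and splits them into blocks that keep their pre-trained weights and blocks that are re-initialized. Interpreting "the later layers (nine residual blocks)" as the last nine blocks in network order is an assumption, and the block names are hypothetical labels rather than identifiers from any library.

```python
RESNET34_BLOCKS_PER_STAGE = [3, 4, 6, 3]  # standard 34-layer layout

def block_names(blocks_per_stage):
    """List block labels in network order, e.g. 'stage1.block1'."""
    return [f"stage{s + 1}.block{b + 1}"
            for s, n in enumerate(blocks_per_stage)
            for b in range(n)]

def split_for_finetuning(blocks_per_stage, n_reinit):
    """Return (blocks that keep pre-trained weights, blocks to re-initialize).
    ASSUMPTION: the re-initialized blocks are the last n_reinit in order."""
    names = block_names(blocks_per_stage)
    cut = len(names) - n_reinit
    return names[:cut], names[cut:]

keep, reinit = split_for_finetuning(RESNET34_BLOCKS_PER_STAGE, 9)
print(keep)    # first seven blocks: stage 1 and stage 2
print(reinit)  # last nine blocks: stage 3 and stage 4
```

Under this reading, the re-initialized part coincides exactly with the last two stages of the network.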

<Result Images and Quantitative Evaluation>

The present invention proposes an image horizon correction method using a CNN. The network produced by the training process is used to measure the tilt angle of an input image. Rotating the input image in the direction opposite to the measured angle produces a tilt-corrected result image. FIG. 10 shows the results of applying the framework of the present invention to various input images. The method works well not only when the image contains many straight lines but also for complex scenes with no dominant line. It also operates robustly on images with dark backgrounds.

To evaluate the accuracy of the tilt measurement network quantitatively, and to assess how measurement accuracy changes with the degree of tilt, the validation dataset of 865 diverse images was rotated in both the positive and negative directions by five angles (3°, 5°, 10°, 15°, and 20°), and accuracy was measured for each tilt angle. The results are shown in Table 1.

The error in Table 1 is the mean of the differences between the angle measured by this method and the ground truth (GT) angle. The experiment confirms high accuracy, with an error of around 1° in all five cases.
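The error measure can be written as a short function. This sketch assumes the difference is taken as an absolute value before averaging, which is consistent with the L1 loss used for training but is not stated explicitly for the table. The function name is hypothetical.

```python
def mean_abs_angle_error(predicted, ground_truth):
    """Mean absolute difference, in degrees, between predicted and
    ground-truth tilt angles."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)
```

Evaluating one tilt level then amounts to calling this function on the predictions for all validation images rotated by that angle in both directions.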

<Performance Comparison by Network Structure>

Because the tilt measurement network proposed in the present invention considers multi-scale features of the image simultaneously, it can achieve higher performance than a network structure that considers only single-scale features. To verify this, a network that considers only single-scale features was trained under the same conditions and compared with the multi-scale network. The single-scale network was built from the multi-scale structure shown in FIG. 6 by replacing the two pooling layers with one pooling layer that produces a 512x1x1 feature map, followed by a linear layer producing a feature of size 512, from which the final tilt angle is measured. It was trained on the same dataset for the same number of epochs as the proposed algorithm, and the results were compared. For the entire validation dataset, the tilt angle was measured with each of the single-scale and multi-scale networks, the mean error against the ground truth (GT) angle was computed, and the change in performance of the two networks was compared as training epochs progressed.

The comparison results are shown in Table 2.

In the comparison, the single-scale network showed higher accuracy in the early epochs. This is because the multi-scale network has more kinds of features and more parameters to learn than the single-scale network. As training continues, however, the parameters of the multi-scale network are learned sufficiently, and it ultimately achieves higher accuracy.

<Comparison of Results by Dataset Composition>

For deep-learning-based image analysis algorithms, training results can vary depending on how the dataset used for training is composed. This section compares how tilt measurement accuracy on the validation dataset differs when the network is trained on three differently composed training datasets. The network structure used in this experiment is the multi-scale network described above.

The first case is a training dataset composed only of images with dominant lines (images containing buildings, trees, etc.). The dominant-line detection algorithm described with reference to FIG. 9 was used to build a dataset consisting solely of images with dominant lines, and the network was trained on it. A model trained this way can be expected to work well on images with structural features such as line segments.

The second case is a dataset composed only of images in which no main lines exist (person-centered images or images with natural-landscape backgrounds). Here too, the main-line existence determination algorithm was used to generate a training dataset consisting only of images without main lines, and the network was trained on it. This model can be expected not to work well on images in which lines exist.

Finally, the network was trained on a dataset composed of images with main lines and images without main lines in a 2:1 ratio.
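The three dataset configurations described above can be sketched as follows. The predicate `has_main_lines` is a hypothetical stand-in for the main-line existence determination algorithm of FIG. 9, and subsampling to enforce the 2:1 ratio is an assumption made for illustration; the source states only the ratio, not how it is obtained.

```python
import random

def build_training_sets(images, has_main_lines, seed=0):
    """Split images by main-line presence and build the three training sets.

    has_main_lines is a caller-supplied predicate (hypothetical placeholder
    for the main-line existence determination algorithm).
    """
    with_lines = [im for im in images if has_main_lines(im)]
    without_lines = [im for im in images if not has_main_lines(im)]

    # Assumption: subsample so that the mixed set has exactly a 2:1 ratio
    # of images with main lines to images without main lines.
    n_without = min(len(without_lines), len(with_lines) // 2)
    n_with = 2 * n_without

    rng = random.Random(seed)
    mixed = rng.sample(with_lines, n_with) + rng.sample(without_lines, n_without)
    rng.shuffle(mixed)

    return {
        "lines_only": with_lines,
        "no_lines_only": without_lines,
        "mixed_2_to_1": mixed,
    }
```

Each of the three returned lists would then be used to train a separate copy of the same multi-scale network, so that only the dataset composition differs between runs.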

As a result of the experiment, as shown in Table 3, the original dataset, which includes both images with main lines and images without them, yielded the highest accuracy. This means that accuracy on the entire validation dataset increases when tilt is measured using not only the line-segment information obtainable from an image but also the additional information obtained from the features of the various subjects in general images without line segments. In addition, both of the networks trained on datasets with only, or without any, main line segments showed higher accuracy on the validation data containing line segments than on the validation data without line segments. This indicates that, when used as a single source of information, tilted line-segment information is more useful for measuring image tilt than the tilt information of general subjects.

<Comparison with existing commercial technology>

Finally, the results of the method proposed in the present invention were compared with those of an existing commercial photo tilt correction technique. The commercial technique used for comparison is the function built into Adobe Lightroom CC 2015.

The comparison results are shown in FIG. 11. In the first example image, line detection is easy because many buildings are located in the image. As a result, both the method of the present invention and the existing technique correct the tilt well. In the second example image, on the other hand, it is ambiguous whether the image should be rotated with respect to the horizon line of the background or aligned to the main object. The present method corrects with respect to the horizon line of the background and obtains a result similar to the GT image, whereas the existing technique produces a poor result. The remaining example images either contain no main lines or contain lines that are difficult to detect. The existing technique does not work properly on them, whereas the present method performs tilt correction well, so that the person or animal that is the subject of the photograph stands upright.

Meanwhile, the above-described embodiments of the present invention can be recorded on a medium used in a general-purpose computer, including a personal computer. The medium includes recording media such as magnetic recording media (e.g., ROM, floppy disk, hard disk), optically readable media (e.g., CD-ROM, DVD), and electrical recording media (e.g., flash memory, memory stick).

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

Claims

1. A residual network structure comprising, at the end of the network structure:
a first pooling layer having a 1x1 feature map as an output;
a second pooling layer having a 2x2 feature map as an output; and
an angle prediction unit that combines the feature vector obtained from the first pooling layer with the feature vector obtained from the second pooling layer and uses the combined feature vector for the final result prediction.

2. The residual network structure of claim 1, wherein the angle prediction unit
calculates the average of the two feature vectors having the same size and uses the average as the final result prediction.

3. An image horizontal correction method using a CNN, comprising:
(a) generating a training dataset for training an angle measurement network according to the residual network structure of claim 1;
(b) learning optimal parameters of the angle measurement network; and
(c) for an input image, rotating the image in the opposite direction by the measured tilt angle and cropping the blank pixel region.

4. The method of claim 3, wherein in step (a)
the training dataset is generated by rotating a plurality of general images that stand upright and pairing each rotated image with the angle by which it was rotated.

5. The method of claim 4, wherein symmetric padding by 1/2 of the horizontal and vertical lengths is applied to the image before it is rotated to generate the training dataset.

6. The method of claim 3, wherein before step (a)
the angle measurement network is pre-trained on an ImageNet image classification dataset, and
the weight parameters of a predetermined number of layers located in the latter part of the network are initialized.

7. A computer-readable recording medium having recorded thereon a program for causing a computer to execute the image horizontal correction method using a CNN according to any one of claims 3 to 6.

8. A computer program stored in a medium for executing, on a computer, the image horizontal correction method using a CNN according to any one of claims 3 to 6.