KR20220118010A

KR20220118010A - Learning apparatus and learning method for shadow area detection

Info

Publication number: KR20220118010A
Application number: KR1020210021585A
Authority: KR
Inventors: 김주완; 장인성
Original assignee: 한국전자통신연구원
Priority date: 2021-02-18
Filing date: 2021-02-18
Publication date: 2022-08-25

Abstract

A learning apparatus for shadow region detection infers a shadow region from an input training image using the weights and bias values set in a neural network learning model and outputs an inferred shadow mask image; calculates a pixel-by-pixel error between the inferred shadow mask image and the ground-truth shadow mask image for the training image; calculates, for each pixel of the inferred shadow mask image, a gradient value representing the degree of change relative to its neighboring pixels; calculates a pixel-by-pixel gradient error between the gradient values calculated from the inferred shadow mask image and the gradient values of the ground-truth shadow mask image; and adjusts the weights and bias values set in the neural network learning model using the pixel-by-pixel error and the pixel-by-pixel gradient error. Accordingly, shadow regions can be detected more finely and accurately.

Description

Learning device and learning method for shadow area detection

The present invention relates to a learning apparatus and a learning method for shadow region detection and, more particularly, to a learning apparatus and a learning method that use machine learning to detect shadow regions more finely and accurately in complex images such as urban models.

When virtual city models are built from imagery for applications such as smart cities and digital twins, images acquired with cameras, such as aerial and drone images, are used to increase realism. Shadows cast by the sun in the acquired images make the resulting virtual city model look more realistic, but they also become an obstacle when applying variations such as time of day and weather.

Images for virtual city models are sometimes captured around noon, when shadows are smallest, but it is difficult to eliminate shadows completely.

With recent advances in deep learning based on convolutional neural networks (CNNs), learning-based methods for detecting and removing shadows are being actively studied.

Current methods build a CNN-based learning network (neural network) and train it using, as inputs, an image containing shadows and a ground-truth image that contains only the shadow mask.

The loss function, which computes the difference between the value inferred by the model and the ground-truth value the user wants, is a key indicator of learning performance. Because training sets the weights and bias values so that the value of the loss function is minimized, the loss function strongly affects performance.

The loss function currently used for shadow region detection decides, pixel by pixel, whether each pixel is shadow in both the inferred shadow mask and the ground-truth shadow mask, and computes a per-pixel error from those decisions. Removing shadows from an image first requires detecting the shadow region accurately. However, because shadows form regions, simply comparing per-pixel values is limited in its ability to determine the boundary of a shadow region.

An object of the present invention is to provide a learning apparatus and method for shadow region detection that allow shadow regions to be inferred more finely and accurately.

According to an embodiment of the present invention, a method is provided for training, in a learning apparatus for shadow region detection, a neural network learning model that detects shadow regions. The learning method includes: outputting an inferred shadow mask image by inferring a shadow region from an input training image using the weights and bias values set in the neural network learning model; calculating a pixel-by-pixel error between the inferred shadow mask image and the ground-truth shadow mask image for the training image; calculating, for each pixel of the inferred shadow mask image, a gradient value representing the degree of change relative to its neighboring pixels; calculating a pixel-by-pixel gradient error between the gradient values calculated from the inferred shadow mask image and the gradient values of the ground-truth shadow mask image; and adjusting the weights and bias values set in the neural network learning model using the pixel-by-pixel error and the pixel-by-pixel gradient error.

According to an embodiment of the present invention, a gradient component for the shadow boundary is included in the loss function calculation so that machine learning can locate shadow boundaries accurately in complex images such as city models, and shadow regions can therefore be detected more finely and accurately.

FIG. 1 is a diagram illustrating a learning apparatus for shadow region detection according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a learning method for shadow region detection according to an embodiment of the present invention.
FIG. 3 is a diagram explaining multi-scale gradient error calculation according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a learning apparatus for shadow region detection according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification and claims, when a part is said to "include" a certain element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.

A learning apparatus and a learning method for shadow region detection according to embodiments of the present invention are now described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating a learning apparatus for shadow region detection according to an embodiment of the present invention.

Referring to FIG. 1, the learning apparatus 100 for shadow region detection includes a neural network learning model 110 and a loss calculator 120. The learning apparatus 100 may further include a training data database (DB) 130.

The training data DB 130 stores training data. The training data consists of an input image (x) containing shadows, a ground-truth shadow mask image (GT_pixel) that is the ground truth for the input image (x), and gradient shadow mask data (GT_gard) containing the per-pixel gradient values of the ground-truth shadow mask image (GT_pixel).

To provide information about the shadow boundary, a gradient value representing the degree of change relative to the neighboring pixels is calculated for each pixel of the ground-truth shadow mask image (GT_pixel), and the gradient shadow mask data (GT_gard) contains these per-pixel gradient values.

To train the neural network learning model 110, the input image (x) is input to the neural network learning model 110, and the ground-truth shadow mask image (GT_pixel) and the gradient shadow mask data (GT_gard) are input to the loss calculator 120.

The neural network learning model 110 infers the shadow region from the input image (x) using its weights and bias values, generates an inferred shadow mask image (Y_pixel), and outputs it.

The loss calculator 120 includes a first error calculator 122, a gradient calculator 124, a second error calculator 126, and an integrated error calculator 128.

The first error calculator 122 uses a first loss function to calculate the per-pixel error (E) between the inferred shadow mask image (Y_pixel) output from the neural network learning model 110 and the ground-truth shadow mask image (GT_pixel). The first loss function yields the per-pixel error (E), which may be calculated as in Equation 1.

The first error calculator 122 calculates the root mean square error (RMSE) of the per-pixel error (E) and passes the resulting RMSE value to the integrated error calculator 128.
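The per-pixel error and its RMSE can be sketched as follows. Equation 1 is not reproduced in this text, so the sketch assumes that E is the plain difference between Y_pixel and GT_pixel; the function names `pixel_error` and `rmse` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pixel_error(y_pixel, gt_pixel):
    """Per-pixel error E (assumed here to be a simple difference)."""
    return np.asarray(y_pixel, dtype=float) - np.asarray(gt_pixel, dtype=float)

def rmse(err):
    """Root mean square of an error map, reduced to a single scalar."""
    return float(np.sqrt(np.mean(np.square(err))))

# Toy 2x4 masks: the ground truth has a sharp edge, the inference blurs it.
gt_pixel = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]], dtype=float)
y_pixel = np.array([[0.0, 0.5, 1.0, 1.0],
                    [0.0, 0.5, 1.0, 1.0]])

E = pixel_error(y_pixel, gt_pixel)
E_rmse = rmse(E)
print(E_rmse)
```

Only two of the eight pixels differ (by 0.5 each), so the mean squared error is 0.0625 and the RMSE is 0.25.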

The gradient calculator 124 calculates per-pixel gradient values (Y_gard) from the inferred shadow mask image (Y_pixel) output from the neural network learning model 110 and passes them to the second error calculator 126.

The second error calculator 126 uses a second loss function to calculate the per-pixel gradient error (E_g) between the per-pixel gradient values (Y_gard) calculated from the inferred shadow mask image and the per-pixel gradient values (GT_gard) of the ground-truth shadow mask image (GT_pixel). The second loss function yields the per-pixel gradient error (E_g), which may be calculated as in Equation 2.

The second error calculator 126 calculates the RMSE of the per-pixel gradient error (E_g) and passes the resulting RMSE value to the integrated error calculator 128.

The integrated error calculator 128 calculates the final loss value (Loss) by summing the RMSE value of the per-pixel error (E) and the RMSE value of the per-pixel gradient error (E_g). As in Equation 3, the final loss value (Loss) may be calculated by applying a set ratio to each of the two RMSE values and then summing them.

Here, E' denotes the RMSE value of the per-pixel error (E), and E_g' denotes the RMSE value of the per-pixel gradient error (E_g). α denotes the set ratio and may be set to, for example, 0.6.
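A minimal sketch of the weighted sum follows. Equation 3 is not reproduced in this text; the form Loss = α·E' + (1 − α)·E_g' is an assumption consistent with "applying a set ratio to each value and summing", not a confirmed formula. The function name `combined_loss` is hypothetical.

```python
def combined_loss(e_rmse, eg_rmse, alpha=0.6):
    """Weighted sum of the two RMSE terms.

    Assumed form: alpha * E' + (1 - alpha) * E_g'.
    alpha = 0.6 is the example value given in the description.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * e_rmse + (1.0 - alpha) * eg_rmse

# Example: pixel RMSE of 0.25 and gradient RMSE of 0.5.
loss = combined_loss(0.25, 0.5)
print(loss)
```

With α = 0.6 the pixel term contributes 0.15 and the gradient term 0.20, giving a loss of about 0.35. A larger α puts more weight on per-pixel agreement and less on boundary agreement.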

The integrated error calculator 128 adjusts the weights and bias values of the neural network learning model 110 using the calculated final loss value (Loss).

The per-pixel gradient values (Y_gard, GT_gard) represent the amount of pixel change in the horizontal and vertical directions and can be expressed as in Equation 4.

Here, the two components of Equation 4 denote the amounts of pixel change in the horizontal and vertical directions at the corresponding pixel.

The magnitude of the per-pixel gradient can be expressed as in Equation 5, and the direction of the per-pixel gradient can be expressed as in Equation 6.
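Equations 4 to 6 are not reproduced in this text. For orientation only, the standard image-gradient definitions that match the description are given below; the symbols f, g_x, g_y, and θ are assumptions and may differ from the notation of the original equations.

```latex
% Assumed standard forms; f is the mask image (Y_pixel or GT_pixel).
\nabla f = \begin{bmatrix} g_x \\ g_y \end{bmatrix}
         = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}
\qquad \text{(cf. Equation 4)}

\lVert \nabla f \rVert = \sqrt{g_x^{2} + g_y^{2}}
\qquad \text{(cf. Equation 5)}

\theta = \tan^{-1}\!\left( \frac{g_y}{g_x} \right)
\qquad \text{(cf. Equation 6)}
```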

The per-pixel gradient error (E_g) may include a per-pixel gradient magnitude error and a per-pixel gradient direction error.
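The gradient computation and the two error components can be sketched as follows. The description does not specify the difference operator, so this sketch assumes NumPy's `np.gradient` (central differences in the interior, one-sided at the borders); the function names `pixel_gradients` and `gradient_error` are hypothetical.

```python
import numpy as np

def pixel_gradients(mask):
    """Return (gx, gy, magnitude, direction) for a 2-D mask.

    gx is the horizontal change (along columns), gy the vertical change
    (along rows). The finite-difference scheme is an assumption.
    """
    mask = np.asarray(mask, dtype=float)
    gy, gx = np.gradient(mask)          # axis 0 = rows, axis 1 = columns
    magnitude = np.hypot(gx, gy)        # sqrt(gx^2 + gy^2)
    direction = np.arctan2(gy, gx)      # angle in radians
    return gx, gy, magnitude, direction

def gradient_error(y_pixel, gt_pixel):
    """Per-pixel magnitude and direction differences (angles not wrapped)."""
    _, _, y_mag, y_dir = pixel_gradients(y_pixel)
    _, _, gt_mag, gt_dir = pixel_gradients(gt_pixel)
    return y_mag - gt_mag, y_dir - gt_dir

# A mask with a single vertical edge between columns 1 and 2.
mask = np.array([[0, 0, 1, 1]] * 3, dtype=float)
gx, gy, magnitude, direction = pixel_gradients(mask)
print(magnitude[0])
```

For this mask the gradient is nonzero only at the two columns adjacent to the edge, which is why a gradient-based error term concentrates on the shadow boundary.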

FIG. 2 is a flowchart illustrating a learning method for shadow region detection according to an embodiment of the present invention.

Referring to FIG. 2, an input image (x) and a ground-truth shadow mask image (GT_pixel) are input to the neural network learning model 110 of the learning apparatus 100 for shadow region detection.

Upon receiving the input image (x) (S200), the neural network learning model 110 infers the shadow region from the input image (x) using the set weights and bias values and outputs an inferred shadow mask image (Y_pixel) (S210).

The loss calculator 120 calculates the per-pixel error (E) between the inferred shadow mask image (Y_pixel) output from the neural network learning model 110 and the ground-truth shadow mask image (GT_pixel) as in Equation 1 (S220), and calculates the RMSE value of the per-pixel error (E).

The loss calculator 120 calculates the per-pixel gradient values (Y_gard) from the inferred shadow mask image (Y_pixel) output from the neural network learning model 110 using Equations 4 to 6 (S230).

The loss calculator 120 calculates the gradient error (E_g) between the per-pixel gradient values (Y_gard) calculated from the inferred shadow mask image and the per-pixel gradient values (GT_gard) of the ground-truth shadow mask image (GT_pixel) as in Equation 2 (S240), and calculates the RMSE value of the gradient error (E_g).

Next, the loss calculator 120 calculates the final loss value (Loss) as in Equation 3 using the RMSE value of the per-pixel error (E) and the RMSE value of the per-pixel gradient error (E_g) (S250).

The loss calculator 120 performs backpropagation, adjusting the weights and bias values of the neural network learning model 110 using the calculated final loss value (Loss) (S260). The weights and bias values are adjusted in the direction that minimizes the final loss value (Loss).

When training data, that is, an input image (x), is input to the neural network learning model 110 again, the model infers the shadow region from the input image (x) using the adjusted weights and bias values, and steps S210 to S260 are repeated.

The learning apparatus 100 repeats the steps described above on the training data, continuing to train the neural network learning model 110 until the target condition is met.
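The loop S200 to S260 can be sketched end to end with a deliberately tiny stand-in model. Everything below is an illustrative assumption: the "network" is a two-parameter per-pixel logistic function rather than a CNN, backpropagation is replaced by central finite differences, a step is accepted only when it lowers the loss, and the loss uses the assumed form α·E' + (1 − α)·E_g' with gradient magnitudes only.

```python
import numpy as np

def rmse(err):
    return float(np.sqrt(np.mean(np.square(err))))

def grad_mag(mask):
    gy, gx = np.gradient(mask)                     # vertical, horizontal change
    return np.hypot(gx, gy)

def model(x, params):
    """Toy stand-in for model 110: per-pixel sigmoid(w * x + b)."""
    w, b = params
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def loss_fn(params, x, gt_pixel, gt_grad, alpha=0.6):
    y_pixel = model(x, params)                     # S210: inferred mask
    e_rmse = rmse(y_pixel - gt_pixel)              # S220: pixel error RMSE
    eg_rmse = rmse(grad_mag(y_pixel) - gt_grad)    # S230-S240: gradient error RMSE
    return alpha * e_rmse + (1.0 - alpha) * eg_rmse  # S250: final loss

def numeric_grad(f, params, eps=1e-5):
    """Finite-difference gradient, used here in place of backpropagation."""
    g = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        g[i] = (f(params + step) - f(params - step)) / (2.0 * eps)
    return g

# Toy data: the dark left half of the image is the shadow region.
x = np.full((8, 8), 0.9)
x[:, :4] = 0.2
gt_pixel = (x < 0.5).astype(float)
gt_grad = grad_mag(gt_pixel)                       # plays the role of GT_gard

params = np.array([0.0, 0.0])                      # initial weight and bias
f = lambda p: loss_fn(p, x, gt_pixel, gt_grad)
initial_loss = f(params)

lr = 1.0
for _ in range(200):                               # repeat S210-S260
    candidate = params - lr * numeric_grad(f, params)   # S260: adjust parameters
    if f(candidate) < f(params):
        params = candidate                         # keep only improving steps
    else:
        lr *= 0.5                                  # otherwise shrink the step
final_loss = f(params)
print(initial_loss, final_loss)
```

Because a step is kept only when it reduces the loss, the final loss cannot exceed the initial one; a real implementation would instead rely on a framework's automatic differentiation and optimizer.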

In the existing learning method, once the per-pixel error (E) is calculated as in Equation 1, backpropagation adjusts the weights and bias values of the neural network learning model 110 using the per-pixel error (E) alone.

In the embodiment of the present invention, so that shadow regions can be detected more finely, the error between the inferred result and the ground truth reflects not only the per-pixel shadow error but also the error of the gradient component, which represents how much the shadow mask changes relative to its surrounding values. As a result, shadow regions with better-defined boundaries can be detected than with the existing learning method.

FIG. 3 is a diagram explaining multi-scale gradient error calculation according to an embodiment of the present invention.

Referring to FIG. 3, the per-pixel gradient error (E_g) may be calculated on a multi-scale basis.

Specifically, multi-scale input images may be generated by scaling the input image to various scales. Multi-scale gradient shadow mask data (GT_gard) may be generated from the ground-truth shadow mask images (GT_pixel) for the multi-scale input images. The sizes and number of scaled images may be chosen according to the intended use.

In addition, per-pixel gradient values (Y_gard) are calculated from each of the multi-scale inferred shadow mask images (Y_pixel) that the neural network learning model 110 outputs for the multi-scale input images, and a per-pixel gradient error (E_g₁, E_g₂, …, E_g_n) may be calculated for each scale.

For example, when the input image is 256x256, the per-pixel gradient error is calculated for the 256x256 image, and the RMSE value of that gradient error is calculated.

The per-pixel gradient error and its RMSE value are likewise calculated for the image reduced to 128x128. In this way an RMSE value of the gradient error is calculated for each scale, and the final gradient error value can be obtained by summing the RMSE values of all scales.
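The summation over scales can be sketched as follows. This is a simplification under stated assumptions: the description scales the input image and runs the model at each scale, whereas this sketch downscales the inferred and ground-truth masks directly with 2x2 average pooling, uses gradient magnitude only, and halves the size at each level. The function names are hypothetical.

```python
import numpy as np

def rmse(err):
    return float(np.sqrt(np.mean(np.square(err))))

def gradient_magnitude(mask):
    gy, gx = np.gradient(np.asarray(mask, dtype=float))
    return np.hypot(gx, gy)

def downscale_by_2(img):
    """2x2 average pooling; assumes even height and width."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_gradient_rmse(y_pixel, gt_pixel, num_scales=2):
    """Sum of gradient-error RMSE values over num_scales scales."""
    y = np.asarray(y_pixel, dtype=float)
    gt = np.asarray(gt_pixel, dtype=float)
    total = 0.0
    for scale in range(num_scales):
        total += rmse(gradient_magnitude(y) - gradient_magnitude(gt))
        if scale < num_scales - 1:
            y, gt = downscale_by_2(y), downscale_by_2(gt)
    return total

# 256x256 example: the inferred edge is shifted 8 pixels from the true edge.
gt_pixel = np.zeros((256, 256))
gt_pixel[:, 128:] = 1.0
y_pixel = np.zeros((256, 256))
y_pixel[:, 120:] = 1.0

total_eg = multiscale_gradient_rmse(y_pixel, gt_pixel, num_scales=2)  # 256 and 128
print(total_eg)
```

With `num_scales=2` the sum covers the 256x256 and 128x128 levels mentioned in the example above; identical masks give a total of zero.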

When the learning apparatus 100 for shadow region detection trains the neural network learning model 110 as shown in FIGS. 1 and 2 using the multi-scale input images, the multi-scale ground-truth shadow mask images (GT_pixel), and the multi-scale gradient shadow mask data (GT_gard), shadow regions can be detected robustly even when the image size changes.

FIG. 4 is a diagram illustrating a learning apparatus for shadow region detection according to another embodiment of the present invention.

Referring to FIG. 4, the learning apparatus 400 for shadow region detection may be a computing device in which the learning method for shadow region detection according to the embodiment described above is implemented.

The learning apparatus 400 for shadow region detection may include at least one of a processor 410, a memory 420, an input interface device 430, an output interface device 440, and a storage device 450. The components may be connected by a common bus 460 and communicate with one another. Alternatively, the components may be connected through individual interfaces or individual buses centered on the processor 410 rather than through the common bus 460.

The processor 410 may be implemented as any of various types, such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU), and may be any semiconductor device that executes instructions stored in the memory 420 or the storage device 450. The processor 410 may execute program commands stored in at least one of the memory 420 and the storage device 450. The processor 410 may be configured to implement the learning functions and methods for shadow region detection described above with reference to FIGS. 1 to 3. For example, the processor 410 may be configured to perform at least some functions of the neural network learning model 110 and the loss calculator 120 described with reference to FIG. 1.

The memory 420 and the storage device 450 may include various forms of volatile or non-volatile storage media. For example, the memory 420 may include a read-only memory (ROM) 421 and a random access memory (RAM) 422. In an embodiment of the present invention, the memory 420 may be located inside or outside the processor 410 and may be connected to the processor 410 through various known means.

The input interface device 430 is configured to provide data, for example an input image and a ground-truth shadow mask image, to the processor 410.

The output interface device 440 is configured to output data from the processor 410, for example an inferred shadow mask image.

At least part of the learning method for shadow region detection according to an embodiment of the present invention may be implemented as a program or software executed on a computing device, and the program or software may be stored in a computer-readable medium.

At least part of the learning method for shadow region detection according to an embodiment of the present invention may also be implemented as hardware that can be electrically connected to a computing device.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

A method for training a neural network learning model that detects a shadow region, performed in a learning apparatus for shadow region detection, the method comprising:
outputting an inferred shadow mask image by inferring a shadow region from an input training image using the weights and bias values set in the neural network learning model;
calculating a pixel-by-pixel error between the inferred shadow mask image and a ground-truth shadow mask image for the training image;
calculating, for each pixel of the inferred shadow mask image, a gradient value representing the degree of change relative to its neighboring pixels;
calculating a pixel-by-pixel gradient error between the per-pixel gradient values calculated from the inferred shadow mask image and the per-pixel gradient values of the ground-truth shadow mask image; and
adjusting the weights and bias values set in the neural network learning model using the pixel-by-pixel error and the pixel-by-pixel gradient error.