KR102211842B1

KR102211842B1 - Apparatus and method for recognizing a place based on Artificial Neural Network

Info

Publication number: KR102211842B1
Application number: KR1020190041544A
Authority: KR
Inventors: 김은태; 성홍제; 현준혁; 이수현; 우수한; 장현배
Original assignee: 연세대학교 산학협력단
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2021-02-02
Also published as: KR20200120987A; WO2020209487A1

Abstract

According to the present invention, an artificial-neural-network-based place recognition apparatus and method using a correlation score matrix generation algorithm are disclosed. The apparatus includes an object information extraction unit that extracts, as object feature values, information on a plurality of objects contained in an analysis target image that includes the place to be recognized, and a learning unit that generates an artificial-neural-network-based transformation model representing, as transformation parameters, the object-place correlation used to recognize the place associated with those objects, and that updates the transformation model by adjusting the transformation parameters based on the object feature values, thereby improving image-level place recognition performance.

Description

Apparatus and method for recognizing a place based on artificial neural network using correlation score matrix generation algorithm

The present invention relates to a place recognition apparatus and method, and more particularly to a place recognition apparatus and method based on an artificial neural network.

The conventional statistical approach of deriving a correlation coefficient assumes that every image carries ground-truth labels for the objects and the place it shows. All of these labels are analyzed statistically, and the relationship between a specific object and a specific place is expressed as a number.

This approach needs both object and place information for each image, but existing large-scale image datasets such as ImageNet and Places 2 provide only one of the two.

Applying it to an existing dataset therefore requires entering the missing object or place information by hand. In addition, the approach is difficult to use with the Convolutional Neural Network (CNN) architectures most commonly used for object or place classification in images. When it is combined with a CNN, the values computed statistically over the entire image set must be used as fixed constants, so for a new image they can end up supplying information that hinders object or place classification.

An object of the present invention is to improve image-level place recognition performance with an artificial-neural-network-based place recognition apparatus and method that use a correlation score matrix generation algorithm. The apparatus includes an object information extraction unit that extracts, as object feature values, information on a plurality of objects contained in an analysis target image that includes the place to be recognized, and a learning unit that generates an artificial-neural-network-based transformation model representing, as transformation parameters, the object-place correlation used to recognize the place associated with those objects, and that updates the transformation model by adjusting the transformation parameters based on the object feature values.

Another object is to make it possible to generate the correlation score matrix with a weakly supervised deep learning algorithm even when ground truth exists for only one of objects or places.

Other objects of the present invention that are not stated explicitly may additionally be considered within the scope that can be readily inferred from the following detailed description and its effects.

To solve the above problem, an artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention includes: an object information extraction unit that extracts, as object feature values, information on a plurality of objects contained in an analysis target image that includes the place to be recognized; and a learning unit that generates an artificial-neural-network-based transformation model representing, as transformation parameters, the object-place correlation used to recognize the place associated with the plurality of objects contained in the analysis target image, and that updates the transformation model by adjusting the transformation parameters based on the object feature values.

Here, the object information extraction unit includes an input unit that receives the analysis target image including the place to be recognized, an object tracking unit that tracks the plurality of objects contained in the analysis target image using a Convolutional Neural Network (CNN) pre-trained on an object recognition dataset, and an object feature value extraction unit that extracts information on the tracked objects as object feature values.

The apparatus further includes a place information extraction unit that extracts place information of the analysis target image as a place feature value, using an image-level place recognition dataset on the analysis target image that includes the place to be recognized.

Here, the learning unit includes an object score vector calculation unit that calculates an object score vector based on the object feature values extracted by the object information extraction unit, a correlation score matrix calculation unit that calculates a correlation score matrix from the transformation parameters representing the object-place correlation, and a place score vector calculation unit that calculates a place score vector by multiplying the correlation score matrix by the object score vector.

Here, the learning unit further includes a training unit that calculates a loss value for the correlation score matrix using the place score vector calculated by the place score vector calculation unit, and trains the artificial neural network by adjusting the transformation parameters based on the loss value.

Here, the object feature values are the elements of an n-dimensional array (tensor), where n is a natural number, and the object score vector calculation unit converts each of the elements into n normalized vector data.

Here, the training unit includes a loss value calculation unit that outputs a place prediction value from the place score vector calculated by the place score vector calculation unit and calculates the loss value of the correlation score matrix by comparing the place prediction value with the place feature value, and an adjustment unit that adjusts the transformation parameters based on the loss value using the backpropagation algorithm.

Here, the training unit updates the transformation model by adjusting the transformation parameters until the loss value, calculated by comparing the place prediction value with the place feature value, falls below a threshold.

Here, the object recognition dataset is a set of regions of the analysis target image in which objects related to the place to be recognized are likely to be detected.

An apparatus for generating a correlation score matrix generation algorithm according to an embodiment of the present invention includes: an object score vector calculation unit that calculates an object score vector based on object feature values extracted by an object information extraction unit; a correlation score matrix calculation unit that calculates a correlation score matrix from transformation parameters representing the object-place correlation; a place score vector calculation unit that calculates a place score vector by multiplying the correlation score matrix by the object score vector; and a training unit that calculates a loss value for the correlation score matrix using the place score vector calculated by the place score vector calculation unit, and trains an artificial neural network by adjusting the transformation parameters based on the loss value.

Here, the training unit updates the transformation model by adjusting the transformation parameters until the loss value, calculated by comparing the place prediction value with the place feature value, falls below a threshold.

An artificial-neural-network-based place recognition method according to an embodiment of the present invention includes: a step in which a place information extraction unit extracts place information of an analysis target image, which includes the place to be recognized, as a place feature value using an image-level place recognition dataset; a step in which an object information extraction unit extracts, as object feature values, information on a plurality of objects contained in the analysis target image using a Convolutional Neural Network (CNN) pre-trained on an object recognition dataset; and a step in which a learning unit generates an artificial-neural-network-based transformation model representing, as transformation parameters, the object-place correlation used to recognize the place associated with the plurality of objects contained in the analysis target image, and updates the transformation model by adjusting the transformation parameters based on the object feature values.

Here, the step of updating the transformation model by adjusting the transformation parameters includes: a step in which an object score vector calculation unit calculates an object score vector based on the object feature values extracted by the object information extraction unit; a step in which a correlation score matrix calculation unit calculates a correlation score matrix from the transformation parameters representing the object-place correlation; a step in which a place score vector calculation unit calculates a place score vector by multiplying the correlation score matrix by the object score vector; and a step in which a training unit calculates a loss value for the correlation score matrix using the place score vector and trains the artificial neural network by adjusting the transformation parameters based on the loss value.

Here, the step of training the artificial neural network includes: a step in which a loss value calculation unit outputs a place prediction value from the place score vector calculated by the place score vector calculation unit and calculates the loss value of the correlation score matrix by comparing the place prediction value with the place feature value; and a step in which an adjustment unit adjusts the transformation parameters based on the loss value using the backpropagation algorithm.

Here, in the step of training the artificial neural network, the transformation model is updated by adjusting the transformation parameters until the loss value, calculated by comparing the place prediction value with the place feature value, falls below a threshold.

According to another aspect of the present embodiment, a computer-readable recording medium is provided on which a program for executing the artificial-neural-network-based place recognition method on a computer is recorded.

As described above, according to embodiments of the present invention, image-level place recognition performance can be improved by including an object information extraction unit that extracts, as object feature values, information on a plurality of objects contained in an analysis target image that includes the place to be recognized, and a learning unit that generates an artificial-neural-network-based transformation model representing, as transformation parameters, the object-place correlation used to recognize the place associated with those objects, and that updates the transformation model by adjusting the transformation parameters based on the object feature values.

In addition, the correlation score matrix can be generated with a weakly supervised deep learning algorithm even when ground truth exists for only one of objects or places.

Even effects not explicitly mentioned here are treated as if described in this specification, provided that they are effects, or provisional effects, described in the following specification and expected from the technical features of the present invention.

FIG. 1 is a block diagram of an artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram of the learning unit of the artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram of an exemplary structure, including a convolutional neural network, of the artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram visualizing the place score vector calculation process of the artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.
FIG. 5 is a graph of the object correlation weights and place correlation weights obtained from analysis by the artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram of an exemplary structure, including a convolutional neural network, of an artificial-neural-network-based place recognition apparatus according to another embodiment of the present invention.
FIGS. 7 and 8 are flowcharts of an artificial-neural-network-based place recognition method according to an embodiment of the present invention.

Hereinafter, the artificial-neural-network-based place recognition apparatus and method using a correlation score matrix generation algorithm according to the present invention are described in more detail with reference to the drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described. To describe the invention clearly, parts unrelated to the description are omitted, and identical reference numerals in the drawings denote identical members.

The suffixes "module" and "unit" attached to components in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not by themselves carry distinct meanings or roles.

The present invention relates to an artificial-neural-network-based place recognition apparatus and method using a correlation score matrix generation algorithm.

FIG. 1 is a block diagram of an artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the artificial-neural-network-based place recognition apparatus 10 according to an embodiment of the present invention includes a place information extraction unit 100, an object information extraction unit 200, and a learning unit 300.

The artificial-neural-network-based place recognition apparatus 10 according to an embodiment of the present invention recognizes place information from object information. Specifically, it uses a deep learning algorithm that learns, in a weakly supervised manner, a correlation score matrix between objects and places from the object information extracted by an object classifier and the place information extracted by a place classifier, and it recognizes places using that matrix.

A Convolutional Neural Network (CNN) is a type of Deep Neural Network (DNN) composed of one or more convolutional layers, pooling layers, and fully connected layers.
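As a rough illustration of the three layer types named above, the following pure-Python sketch passes a tiny single-channel image through one convolution, one 2x2 max-pooling step, and one fully connected layer. The image, kernel, weights, and function names are made-up values for illustration and are not taken from the patent. Practical CNNs use many channels, nonlinear activations, and learned weights.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the 'convolution' used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling over a feature map."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def fully_connected(vector, weights, bias):
    """Dense layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


# Hypothetical 5x5 input whose pixel value is row index + column index.
image = [[i + j for j in range(5)] for i in range(5)]
kernel = [[1, 0], [0, -1]]                    # made-up 2x2 filter

feature_map = conv2d(image, kernel)           # 4x4 feature map
pooled = max_pool2x2(feature_map)             # 2x2 after pooling
flat = [v for row in pooled for v in row]     # flatten to length 4
logits = fully_connected(flat, [[1, 1, 1, 1], [0.5, 0, 0, 0]], [0, 1])
```

The shapes shrink from 5x5 to 4x4 to 2x2 before the dense layer maps the flattened features to two output scores.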

A CNN has a structure well suited to learning from two-dimensional data and can be trained with the backpropagation algorithm. It is one of the representative DNN models and is widely used in applications such as object classification and object detection in images.

The place information extraction unit 100 extracts the place information of the analysis target image, which includes the place to be recognized, as a place feature value using an image-level place recognition dataset.

The object information extraction unit 200 extracts, as object feature values, information on a plurality of objects contained in the analysis target image that includes the place to be recognized.

The object information extraction unit 200 includes an input unit 210, an object tracking unit 220, and an object feature value extraction unit 230.

The input unit 210 receives the analysis target image that includes the place to be recognized.

The object tracking unit 220 tracks the plurality of objects contained in the analysis target image using a Convolutional Neural Network (CNN) pre-trained on an object recognition dataset.

The object recognition dataset is a set of regions of the analysis target image in which objects related to the place to be recognized are likely to be detected.

The object feature value extraction unit 230 extracts information on the tracked objects as object feature values.

The learning unit 300 generates an artificial-neural-network-based transformation model representing, as transformation parameters, the object-place correlation used to recognize the place associated with the plurality of objects contained in the analysis target image, and updates the transformation model by adjusting the transformation parameters based on the object feature values.

The artificial-neural-network-based place recognition apparatus 10 according to an embodiment of the present invention can generate the object-place correlation score matrix using separate classifiers, each trained on image-level object recognition or place recognition training data.

Objects and places are closely related. For example, to recognize an image as a classroom, we expect a blackboard, a lectern, desks, and chairs to be present. Conversely, given an image of a classroom, we recognize a person sitting on a chair as a student and a person standing at the lectern as a teacher. As the number of object and place categories grows, it becomes difficult to write out the correlation between every object and every place.

A representative way of deriving a correlation between two different kinds of variables is the statistical correlation coefficient. This approach assumes that every image carries ground-truth labels for the objects and the place it shows; all of these labels are analyzed statistically, and the relationship between a specific object and a specific place is expressed as a number. It therefore needs both object and place information for each image. However, existing large-scale image datasets such as ImageNet and Places 2 provide only one of the two, so applying the approach to an existing dataset requires entering the missing object or place information by hand. In addition, the approach is difficult to use with the Convolutional Neural Network (CNN) architectures most commonly used for object or place classification in images. When it is combined with a CNN, the values computed statistically over the entire image set must be used as fixed constants, so for a new image they can end up supplying information that hinders object or place classification.
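To make the statistical baseline concrete, here is a minimal sketch that computes a correlation coefficient between one object and one place over a fully annotated image set. It uses the phi coefficient (the Pearson correlation of two binary indicators), which is one common choice and an assumption here, since the text does not name a specific coefficient. The annotations and the function name are hypothetical.

```python
import math


def phi_coefficient(object_present, place_present):
    """Pearson correlation between two equal-length lists of 0/1 indicators."""
    n = len(object_present)
    n_both = sum(1 for o, p in zip(object_present, place_present) if o and p)
    n_obj = sum(object_present)
    n_place = sum(place_present)
    denom = math.sqrt(n_obj * (n - n_obj) * n_place * (n - n_place))
    if denom == 0:
        return 0.0  # undefined when an indicator is constant; report 0
    return (n * n_both - n_obj * n_place) / denom


# Hypothetical annotations: every image needs BOTH its object set and its place.
annotations = [
    {"objects": {"blackboard", "desk"}, "place": "classroom"},
    {"objects": {"blackboard", "chair"}, "place": "classroom"},
    {"objects": {"blackboard"}, "place": "office"},
    {"objects": {"stove"}, "place": "kitchen"},
    {"objects": {"stove", "chair"}, "place": "kitchen"},
    {"objects": {"desk"}, "place": "office"},
]

has_blackboard = [int("blackboard" in a["objects"]) for a in annotations]
is_classroom = [int(a["place"] == "classroom") for a in annotations]

phi = phi_coefficient(has_blackboard, is_classroom)  # about 0.71 for this toy data
```

The sketch shows the limitation described above: dropping either the `objects` or the `place` field from the annotations makes the coefficient impossible to compute, and the resulting number is a fixed constant for the whole dataset.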

Accordingly, the artificial-neural-network-based place recognition apparatus 10 according to an embodiment of the present invention, together with the learning unit 300, which can be implemented as an apparatus for generating a correlation score matrix generation algorithm, uses an object-place correlation score matrix that can be modularized in the form of an artificial neural network. This has the advantage that it can be applied to any artificial neural network architecture for place recognition. To ease implementation, the module takes the form of an artificial neural network that can be trained end to end. Unlike the conventional statistical approach, which requires both an object label and a place label for each image, the correlation score matrix can be generated with a weakly supervised deep learning algorithm even when ground truth exists for only one of objects or places.

FIG. 2 is a block diagram of the learning unit of the artificial-neural-network-based place recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the learning unit 300 of the artificial-neural-network-based place recognition apparatus 10 according to an embodiment of the present invention includes an object score vector calculation unit 310, a correlation score matrix calculation unit 320, a place score vector calculation unit 330, and a training unit 340.

The object score vector calculation unit 310 calculates an object score vector based on the object feature values extracted by the object information extraction unit.

The object feature values are the elements of an n-dimensional array (tensor), where n is a natural number, and the object score vector calculation unit converts each of the elements into n normalized vector data.
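The text does not say which normalization is used. As one plausible reading, the sketch below assumes the object feature values are raw per-class scores from the object classifier and normalizes them with a softmax, so that the resulting object score vector is non-negative and sums to 1. The softmax choice, the class names, and the numbers are all assumptions for illustration.

```python
import math


def softmax(values):
    """Normalize raw scores into positive weights that sum to 1."""
    shift = max(values)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw object scores for the classes (blackboard, desk, stove).
object_features = [2.0, 1.0, 0.1]
object_scores = softmax(object_features)    # normalized object score vector
```

Softmax keeps the ranking of the raw scores, so the most strongly detected object still has the largest entry in the normalized vector.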

The correlation score matrix calculation unit 320 calculates a correlation score matrix from the transformation parameters representing the object-place correlation.

The place score vector calculation unit 330 calculates a place score vector by multiplying the correlation score matrix by the object score vector.
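The matrix-vector product described above can be sketched as follows. The matrix layout (one row per place, one column per object), the class names, and all numeric values are assumptions made for illustration, since the text does not fix an orientation or give example values.

```python
OBJECTS = ["blackboard", "desk", "stove"]   # hypothetical object classes
PLACES = ["classroom", "kitchen"]           # hypothetical place classes


def place_score_vector(correlation, object_scores):
    """Multiply an (n_places x n_objects) matrix by an object score vector."""
    return [sum(w * o for w, o in zip(row, object_scores))
            for row in correlation]


# correlation[k][j]: how strongly object j suggests place k (made-up values).
correlation = [
    [0.9, 0.6, 0.0],   # classroom
    [0.0, 0.2, 0.9],   # kitchen
]
object_scores = [0.7, 0.2, 0.1]             # e.g. output of the object classifier

place_scores = place_score_vector(correlation, object_scores)  # about [0.75, 0.13]
predicted_place = PLACES[place_scores.index(max(place_scores))]
```

Each place score is a weighted sum of the object scores, so an image dominated by a blackboard pushes the classroom score above the kitchen score.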

The training unit 340 calculates a loss value for the correlation score matrix using the place score vector calculated by the place score vector calculation unit, and trains the artificial neural network by adjusting the transformation parameters based on the loss value.

The training unit 340 includes a loss value calculation unit 341 and an adjustment unit 343.

The loss value calculation unit 341 outputs a place prediction value from the place score vector calculated by the place score vector calculation unit, and calculates the loss value of the correlation score matrix by comparing the place prediction value with the place feature value.

The adjustment unit 343 adjusts the transformation parameters based on the loss value using the backpropagation algorithm.

트레이닝부(340)는 상기 장소 예측값과 장소 특징값을 비교하여 계산한 상기 손실값이 임계치보다 작을 때까지 상기 변환 파라미터를 조정하여 상기 변환 모델을 갱신한다.The training unit 340 updates the transform model by adjusting the transform parameter until the loss value calculated by comparing the place predicted value and the place feature value is less than a threshold value.

장소 특징값은 본 발명의 일 실시예에 따른 인공 신경망 기반의 장소 인식 장치의 장소 정보 추출부(100)에 포함되어 구비된 저장부(미도시)로부터 입력될 수 있으며, 별도의 데이터베이스(미도시)에 저장되어 있을 수 있다. 또는 인터넷을 통해서 원격의 서버로부터 복수의 입력 데이터를 입력 받을 수도 있다.The place feature value may be input from a storage unit (not shown) included in the place information extraction unit 100 of the place recognition device based on an artificial neural network according to an embodiment of the present invention, and a separate database (not shown). ) May be stored. Alternatively, a plurality of input data may be received from a remote server through the Internet.

본 발명의 일 실시예에 따른 인공 신경망 기반의 장소 인식 장치의 학습 진행부는 영상 단위 장소 인식 데이터를 사용하여 객체와 장소의 상관관계 점수 행렬을 생성하며, 인공 신경망 구조로 모듈화 하여 약한 지도 학습 알고리즘으로 상관관계 점수 행렬을 생성한다. The learning progress unit of the artificial neural network-based place recognition apparatus according to an embodiment of the present invention generates a correlation score matrix between objects and places using image unit place recognition data, and modulates the artificial neural network structure into a weak supervised learning algorithm. Generate a correlation score matrix.

FIG. 3 is a diagram illustrating an exemplary structure, including a convolutional neural network, of the artificial neural network-based place recognition apparatus according to an embodiment of the present invention.

FIG. 3 shows the overall artificial neural network structure of the artificial neural network-based place recognition apparatus according to an embodiment of the present invention.

The artificial neural network-based place recognition apparatus according to an embodiment of the present invention generates the object-place correlation score matrix by a weakly supervised learning method, using image-level place recognition data and a CNN.

Conventionally, generating an object-place correlation score matrix requires a dataset that provides object information and place information together. However, existing public datasets provide information on only one of the two, objects or places. The present invention therefore proposes a weakly supervised learning scheme that can generate the object-place correlation score matrix from such a dataset.

Referring to FIG. 3, the input unit 210 receives an analysis target image 211 that includes the place 212 to be recognized.

Here, the artificial neural network, implemented as a convolutional neural network (CNN), includes a plurality of convolution modules 221, 222, and 223, and the first convolution module may include first convolution layers 221a, 221b, and 221c and a first mapping unit (ReLU).

The first convolution layer according to an embodiment of the present invention may extract a first convolutional feature image by filtering a first local image through a convolution operation.

The first mapping unit according to an embodiment of the present invention may be a ReLU (Rectified Linear Unit), which is an activation function. The first mapping unit (ReLU) maps the first convolutional feature image extracted through the first convolution layer according to a predetermined function, thereby producing a first mapped image in which the first convolutional feature image has been rectified and activated.
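For reference, the ReLU mapping sets negative values to zero and passes non-negative values through unchanged. A minimal sketch on a small feature map (the list-of-rows representation and the sample values are illustrative assumptions):

```python
def relu(feature_map):
    """Apply ReLU element-wise to a 2-D feature map given as a list of rows."""
    return [[max(0.0, v) for v in row] for row in feature_map]

feature_map = [[-1.5, 0.0, 2.0],
               [3.0, -0.2, 0.7]]  # hypothetical convolutional feature values
mapped = relu(feature_map)
```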

As shown in FIG. 3, constructing the artificial neural network structure requires a place recognition dataset and a deep CNN trained on an object recognition dataset. Public datasets such as Places2, SUN 397, MIT indoor67, and Scene 15 are preferably used as the place recognition dataset. As the deep CNN, it is preferable to use a widely used architecture such as AlexNet, ResNet, or DenseNet that has been pre-trained on an object recognition dataset such as ImageNet.

According to various embodiments of the present invention, datasets other than the public datasets listed above may also be used, provided they can serve as the place recognition dataset and the object recognition dataset for constructing the artificial neural network structure.

Place information for an image is obtained from the image-level place recognition dataset, and object information for the given image is extracted by the deep CNN. These two pieces of information are used to learn the object-place correlation score matrix. Let X be the object score vector extracted by the deep CNN, M the object-place correlation score matrix, and Y the place score vector obtained by passing X through this matrix. The operation using the correlation score matrix, which is the product of the matrix and the object score vector, then proceeds as in [Equation 1].

[Equation 1] $Y = MX$

Here, if the number of object types is n and the number of place types is m (where n and m are natural numbers), the dimensions of the variables used in [Equation 1] are $X \in \mathbb{R}^{n \times 1}$, $M \in \mathbb{R}^{m \times n}$, and $Y \in \mathbb{R}^{m \times 1}$, respectively. For example, for a classroom image containing a desk and a blackboard, the deep CNN of the object information extraction unit, which acts as an object classifier, first finds the objects. Because this classifier cannot detect objects perfectly, it may report objects other than the desk and the blackboard as present, but their scores will be very low. Recognizing the place from the object scores extracted in this way can be visualized as shown in FIG. 4.
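The classroom example can be made concrete with a small numerical sketch. It assumes that the place score vector is the plain matrix-vector product of the correlation score matrix and the object score vector; the class names, the scores in `X`, and the entries of `M` are made-up values for illustration only.

```python
def place_scores(M, X):
    """Compute Y = M X for an m x n matrix M (list of rows) and a length-n vector X."""
    if any(len(row) != len(X) for row in M):
        raise ValueError("each row of M must have one entry per object class")
    return [sum(w * x for w, x in zip(row, X)) for row in M]

objects = ["desk", "blackboard", "tree"]  # n = 3 object classes (hypothetical)
places = ["classroom", "forest"]          # m = 2 place classes (hypothetical)
X = [0.6, 0.35, 0.05]                     # object scores; "tree" is a low-score false detection
M = [[1.0, 1.5, -0.5],                    # classroom row (made-up values)
     [-0.5, -1.0, 2.0]]                   # forest row (made-up values)
Y = place_scores(M, X)
best = places[max(range(len(Y)), key=Y.__getitem__)]
```

Because the spurious "tree" detection has a low score, it contributes little to the "forest" row and the classroom score dominates.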

FIG. 4 is a diagram visualizing the process of calculating the place score vector in the artificial neural network-based place recognition apparatus according to an embodiment of the present invention.

FIG. 4 visualizes the recognition of a place from the extracted object scores. When the calculated place score vector is added to the structure of the artificial neural network, learning proceeds as in [Equation 2] below.

Here, L is the loss value of the correlation score matrix computed while the training unit 340 of FIG. 3 trains the artificial neural network. As learning progresses, the object-place correlation score matrix M is updated by the value ΔM. In general, whether ΔM can be calculated determines whether the artificial neural network can be trained. $\partial L / \partial Y$ is the ordinary backpropagated error of an artificial neural network, and [Equation 2] shows that $\partial Y / \partial M$ can be calculated, so $\partial L / \partial M$ can be calculated. That is, since ΔM can be calculated, the object-place correlation score matrix can be learned within an artificial neural network structure.
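Assuming [Equation 1] has the linear form $Y = MX$, which matches the product computed by the place score vector calculation unit 330, the chain-rule step described here can be written out as follows. The index notation and the outer-product form are a reconstruction under that assumption.

```latex
Y_i = \sum_{j=1}^{n} M_{ij} X_j
\quad\Longrightarrow\quad
\frac{\partial Y_i}{\partial M_{kj}} = \delta_{ik}\, X_j ,
\qquad
\frac{\partial L}{\partial M_{kj}}
  = \sum_{i=1}^{m} \frac{\partial L}{\partial Y_i}\,
    \frac{\partial Y_i}{\partial M_{kj}}
  = \frac{\partial L}{\partial Y_k}\, X_j ,
\qquad\text{i.e.}\qquad
\frac{\partial L}{\partial M} = \frac{\partial L}{\partial Y}\, X^{\top}.
```

Under this reading, ΔM would be taken proportional to this gradient, with a step size that the text does not specify.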

In addition, [Equation 3] shows that this correlation score matrix can backpropagate errors.

If the derivative with respect to X, the input to the object-place correlation score matrix, can be calculated, the error can be backpropagated. By showing that this derivative can be calculated, [Equation 3] indicates that the correlation score matrix poses no problem for backpropagating errors.
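If [Equation 1] is taken to be $Y = MX$ (an assumption), the gradient with respect to the input is $\partial L / \partial X = M^{\top}\, \partial L / \partial Y$. The sketch below checks this against a finite-difference estimate. The squared-error loss, the function names, and the numbers are chosen only for illustration and are not taken from the source.

```python
def matvec(M, X):
    """Matrix-vector product for M given as a list of rows."""
    return [sum(w * x for w, x in zip(row, X)) for row in M]

def loss(M, X, target):
    """Squared-error loss L = 0.5 * ||M X - target||^2 (an assumed loss)."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(matvec(M, X), target))

def grad_wrt_input(M, X, target):
    """Analytic gradient dL/dX = M^T (dL/dY), with dL/dY = M X - target."""
    delta = [y - t for y, t in zip(matvec(M, X), target)]
    return [sum(M[i][j] * delta[i] for i in range(len(M))) for j in range(len(X))]

def numeric_grad(M, X, target, eps=1e-6):
    """Central finite-difference estimate of dL/dX, used only as a check."""
    grads = []
    for j in range(len(X)):
        up = X[:j] + [X[j] + eps] + X[j + 1:]
        down = X[:j] + [X[j] - eps] + X[j + 1:]
        grads.append((loss(M, up, target) - loss(M, down, target)) / (2 * eps))
    return grads

M = [[1.0, 1.5, -0.5], [-0.5, -1.0, 2.0]]  # made-up correlation scores
X = [0.6, 0.35, 0.05]                      # made-up object scores
target = [1.0, 0.0]                        # made-up target place scores
analytic = grad_wrt_input(M, X, target)
numeric = numeric_grad(M, X, target)
```

Agreement between the two gradients illustrates that the error reaching Y can be passed on to the layers that produced X, which is the property end-to-end training needs.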

[Equation 2] and [Equation 3] show that the object-place correlation score matrix of the artificial neural network-based place recognition apparatus according to an embodiment of the present invention can be learned and can backpropagate errors. The proposed correlation score matrix can therefore be modularized as part of an artificial neural network structure, and the entire network structure that uses it can be trained end to end.

FIG. 5 is a graph showing the correlation weights of objects and the correlation weights of places obtained from the analysis by the artificial neural network-based place recognition apparatus according to an embodiment of the present invention.

To analyze the object-place correlation score matrix learned with the structure illustrated in FIG. 5 in the artificial neural network-based place recognition apparatus according to an embodiment of the present invention, one can first analyze how much objects and places influence each other.

FIG. 5(a) plots the correlation weight of each object, obtained by summing the absolute values of the object-place correlation score matrix M over all places. FIG. 5(b) plots the correlation weight of each place, obtained by summing the absolute values over all objects. In each graph the weights are sorted in ascending order.
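The two sets of correlation weights can be sketched as column-wise and row-wise sums of absolute values. The layout with one row per place and one column per object is an assumption, as are the function names and the example matrix.

```python
def object_weights(M):
    """Sum |M[i][j]| over all places i, giving one weight per object j."""
    return [sum(abs(row[j]) for row in M) for j in range(len(M[0]))]

def place_weights(M):
    """Sum |M[i][j]| over all objects j, giving one weight per place i."""
    return [sum(abs(v) for v in row) for row in M]

M = [[1.0, 1.5, -0.5],
     [-0.5, -1.0, 2.0]]                  # made-up 2 places x 3 objects matrix
obj_sorted = sorted(object_weights(M))    # ascending order, as in FIG. 5(a)
place_sorted = sorted(place_weights(M))   # ascending order, as in FIG. 5(b)
```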

Here, the dataset used to train the deep CNN is ImageNet, which has 1,000 object types, and the dataset used for the places to be classified is Places2, which has 365 place types. Overall, the objects that influence places are evenly distributed rather than concentrated on particular objects. For places, by contrast, certain places have a large influence on objects. To identify which places these are, the five places with the highest weights can be extracted in descending order and listed as in [Table 1] below.
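Extracting the highest-weighted places in descending order can be sketched with the standard library. The place names and weights below are placeholders and do not reproduce the values of [Table 1]; `top_places` is a hypothetical helper.

```python
import heapq

def top_places(weights_by_place, k=5):
    """Return the k places with the largest correlation weight, highest first."""
    return heapq.nlargest(k, weights_by_place.items(), key=lambda item: item[1])

# Hypothetical weights; the names are illustrative, not the entries of [Table 1].
weights = {"place_a": 3.2, "place_b": 7.9, "place_c": 5.1,
           "place_d": 1.4, "place_e": 6.3, "place_f": 4.0}
top5 = top_places(weights)
```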

Analysis of [Table 1] shows that mainly outdoor places were strongly influenced by objects, with field/wild being influenced the most.

In practice, a field/wild place contains many objects, and even for the same type of place, different kinds of objects appear in each image. This agrees with the result that many objects influence places evenly. The fact that the field/wild place is strongly influenced by objects also confirms that the matrix was learned well.

With the scheme proposed above, the object-place correlation score matrix can be built using place recognition data alone. The conventional statistical approach needs every image to carry information about both types in order to determine the correlation between two different types. The proposed scheme needs datasets that provide information on only one type, and such datasets are readily available as public datasets. Using this scheme therefore allows training by deep learning without additional labeling labor.

FIG. 6 is a diagram illustrating an exemplary structure, including a convolutional neural network, of an artificial neural network-based place recognition apparatus according to another embodiment of the present invention.

As shown in FIG. 6, an exemplary structure including a convolutional neural network is constructed for the artificial neural network-based place recognition apparatus according to another embodiment of the present invention. To show that the approach generalizes rather than being tied to a specific CNN, it is preferably applied to the widely used AlexNet, ResNet-18, ResNet-50, and DenseNet-161. The results are shown in [Table 2].

As shown in [Table 2], performance improves for every CNN. In addition, the structure of FIG. 6 can be trained end to end, which makes it very convenient for training an artificial neural network. If the scheme proposed in FIG. 6 were replaced with statistical correlation coefficients, end-to-end learning would not be possible and an additional neural network module suited to the correlation coefficients would have to be added. In other words, the proposed scheme has the advantage that it can be used without additional modules and can be applied to any CNN.

FIGS. 7 and 8 are flowcharts illustrating an artificial neural network-based place recognition method according to an embodiment of the present invention.

Referring to FIG. 7, the artificial neural network-based place recognition method according to an embodiment of the present invention begins with step S100, in which the place information extraction unit extracts place information of the analysis target image as a place feature value, using an image-level place recognition dataset, from the analysis target image that includes the place to be recognized.

In step S200, the object information extraction unit uses a convolutional neural network (CNN) pre-trained on an object recognition dataset to extract, as object feature values, information related to a plurality of objects included in the analysis target image that includes the place to be recognized.

In step S300, the learning progress unit generates an artificial neural network-based transformation model that represents, as a transformation parameter, the object-place correlation for recognizing the place related to the plurality of objects included in the analysis target image, and updates the transformation model by adjusting the transformation parameter based on the object feature values.

Referring to FIG. 8, the step (S300) of updating the transformation model by adjusting the transformation parameter proceeds as follows.

In step S310, the object score vector calculation unit calculates an object score vector based on the object feature values extracted by the object information extraction unit.

In step S320, the correlation score matrix calculation unit calculates a correlation score matrix as the transformation parameter representing the object-place correlation.

In step S330, the place score vector calculation unit calculates the place score vector by multiplying the correlation score matrix by the object score vector.

In step S340, the training unit calculates a loss value of the correlation score matrix using the place score vector calculated by the place score vector calculation unit, and trains the artificial neural network by adjusting the transformation parameter based on the loss value.

Specifically, the step (S340) of training the artificial neural network includes a step in which the loss value calculation unit outputs a place prediction value from the place score vector calculated by the place score vector calculation unit and calculates the loss value of the correlation score matrix by comparing the place prediction value with the place feature value, and a step in which the adjustment unit adjusts the transformation parameter using a backpropagation algorithm based on the loss value. The transformation model is updated by adjusting the transformation parameter until the loss value, calculated by comparing the place prediction value with the place feature value, is less than a threshold.
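Steps S330 and S340, together with the threshold-based stopping rule, can be sketched as a small training loop. This is a simplified illustration, not the patented implementation: it assumes a softmax cross-entropy loss, plain gradient descent on the matrix alone (no CNN), and hypothetical names and hyperparameters (`train`, `lr`, `threshold`, `max_epochs`).

```python
import math

def softmax(scores):
    """Convert place scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(samples, n_objects, n_places, lr=0.5, threshold=0.05, max_epochs=5000):
    """Adjust M by gradient descent until the mean loss falls below threshold."""
    M = [[0.0] * n_objects for _ in range(n_places)]
    mean_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for X, place in samples:  # X: object score vector, place: labeled place index
            Y = [sum(w * x for w, x in zip(row, X)) for row in M]  # S330: Y = M X
            probs = softmax(Y)
            total += -math.log(probs[place])  # S340: loss for this sample
            for i in range(n_places):  # gradient step using dL/dM = dL/dY X^T
                err = probs[i] - (1.0 if i == place else 0.0)
                for j in range(n_objects):
                    M[i][j] -= lr * err * X[j]
        mean_loss = total / len(samples)
        if mean_loss < threshold:  # stop once the loss is below the threshold
            break
    return M, mean_loss

# Hypothetical training pairs: (object score vector, place label).
samples = [([0.9, 0.1], 0), ([0.1, 0.9], 1)]
M, final_loss = train(samples, n_objects=2, n_places=2)
```

In a full system the same error signal would also flow back into the CNN that produces X. The cap on the number of epochs is a practical safeguard added here, since the text only describes the threshold condition.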

In addition, a computer-readable recording medium is provided on which a program for executing the artificial neural network-based place recognition method on a computer is recorded.

Such a computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The above description is merely one embodiment of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to implement it in modified forms without departing from its essential characteristics. The scope of the present invention should therefore not be limited to the embodiments described above, but should be construed to include various embodiments within a scope equivalent to what is described in the claims.

10: artificial neural network-based place recognition apparatus
100: place information extraction unit
200: object information extraction unit
300: learning progress unit
310: object score vector calculation unit
320: correlation score matrix calculation unit
330: place score vector calculation unit
340: training unit

Claims

An artificial neural network-based place recognition apparatus comprising:
an object information extraction unit that extracts, as object feature values, information related to a plurality of objects included in an analysis target image that includes a place to be recognized; and
a learning progress unit that generates an artificial neural network-based transformation model representing, as a transformation parameter, an object-place correlation for recognizing the place related to the plurality of objects included in the analysis target image, and updates the transformation model by adjusting the transformation parameter based on the object feature values,
wherein the learning progress unit comprises:
an object score vector calculation unit that calculates an object score vector based on the object feature values extracted by the object information extraction unit;
a correlation score matrix calculation unit that calculates a correlation score matrix as the transformation parameter representing the object-place correlation;
a place score vector calculation unit that calculates a place score vector by multiplying the correlation score matrix by the object score vector; and
a training unit that calculates a loss value of the correlation score matrix using a previously extracted place feature value and the place score vector calculated by the place score vector calculation unit, and trains the artificial neural network by adjusting the transformation parameter based on the loss value,
wherein the correlation score matrix expresses, as a matrix, a correlation index representing the correlation between an object and a place included in each combination variable among the analysis target images.

The apparatus of claim 1,
wherein the object information extraction unit comprises:
an input unit that receives the analysis target image including the place to be recognized;
an object tracking unit that tracks the plurality of objects included in the analysis target image using a convolutional neural network (CNN) pre-trained on an object recognition dataset; and
an object feature value extraction unit that extracts information related to the plurality of tracked objects as the object feature values.

The apparatus of claim 1,
further comprising a place information extraction unit that extracts place information of the analysis target image as a place feature value, using an image-level place recognition dataset, from the analysis target image including the place to be recognized.

delete

The apparatus of claim 3,
wherein the object feature values are elements of an n-dimensional array (tensor), where n is a natural number, and
the object score vector calculation unit converts each of the elements into n normalized vector data.

The apparatus of claim 3,
wherein the training unit comprises:
a loss value calculation unit that outputs a place prediction value from the place score vector calculated by the place score vector calculation unit, and calculates the loss value of the correlation score matrix by comparing the place prediction value with the place feature value; and
an adjustment unit that adjusts the transformation parameter using a backpropagation algorithm based on the loss value.

The apparatus of claim 7,
wherein the training unit updates the transformation model by adjusting the transformation parameter until the loss value, calculated by comparing the place prediction value with the place feature value, is less than a threshold.

The apparatus of claim 2,
wherein the object recognition dataset is a set of regions in the analysis target image from which objects related to the place to be recognized can be detected.

A correlation score matrix generation algorithm apparatus comprising:
an object score vector calculation unit that calculates an object score vector based on object feature values extracted by an object information extraction unit;
a correlation score matrix calculation unit that calculates a correlation score matrix as a transformation parameter representing an object-place correlation;
a place score vector calculation unit that calculates a place score vector by multiplying the correlation score matrix by the object score vector; and
a training unit that calculates a loss value of the correlation score matrix using a previously extracted place feature value and the place score vector calculated by the place score vector calculation unit, and trains an artificial neural network by adjusting the transformation parameter based on the loss value,
wherein the correlation score matrix expresses, as a matrix, a correlation index representing the correlation between an object and a place included in each combination variable among analysis target images.

The apparatus of claim 10,
wherein the training unit comprises:
a loss value calculation unit that outputs a place prediction value from the place score vector calculated by the place score vector calculation unit, and calculates the loss value of the correlation score matrix by comparing the place prediction value with a place feature value obtained by extracting place information of an analysis target image, which includes a place to be recognized, using an image-level place recognition dataset; and
an adjustment unit that adjusts the transformation parameter using a backpropagation algorithm based on the loss value.

The apparatus of claim 11,
wherein the training unit updates the transformation model by adjusting the transformation parameter until the loss value, calculated by comparing the place prediction value with the place feature value, is less than a threshold.

An artificial neural network-based place recognition method comprising:
extracting, by a place information extraction unit, place information of an analysis target image as a place feature value, using an image-level place recognition dataset, from the analysis target image that includes a place to be recognized;
extracting, by an object information extraction unit, information related to a plurality of objects included in the analysis target image as object feature values, using a convolutional neural network (CNN) pre-trained on an object recognition dataset; and
generating, by a learning progress unit, an artificial neural network-based transformation model representing, as a transformation parameter, an object-place correlation for recognizing the place related to the plurality of objects included in the analysis target image, and updating the transformation model by adjusting the transformation parameter based on the object feature values,
wherein updating the transformation model by adjusting the transformation parameter comprises:
calculating, by an object score vector calculation unit, an object score vector based on the object feature values extracted by the object information extraction unit;
calculating, by a correlation score matrix calculation unit, a correlation score matrix as the transformation parameter representing the object-place correlation;
calculating, by a place score vector calculation unit, a place score vector by multiplying the correlation score matrix by the object score vector; and
calculating, by a training unit, a loss value of the correlation score matrix using the extracted place feature value and the place score vector calculated by the place score vector calculation unit, and training the artificial neural network by adjusting the transformation parameter based on the loss value,
wherein the correlation score matrix expresses, as a matrix, a correlation index representing the correlation between an object and a place included in each combination variable among the analysis target images.

delete

The method of claim 13,
wherein training the artificial neural network comprises:
outputting, by a loss value calculation unit, a place prediction value from the place score vector calculated by the place score vector calculation unit, and calculating the loss value of the correlation score matrix by comparing the place prediction value with the place feature value; and
adjusting, by an adjustment unit, the transformation parameter using a backpropagation algorithm based on the loss value.

The method of claim 15,
wherein training the artificial neural network comprises updating the transformation model by adjusting the transformation parameter until the loss value, calculated by comparing the place prediction value with the place feature value, is less than a threshold.

A computer-readable recording medium storing a program for executing the method of claim 13, 15, or 16 on a computer.