KR20170064985A - Method and device for processing graphics data in graphics processing unit

Info

Publication number: KR20170064985A
Application number: KR1020160128568A
Authority: KR
Inventors: 안쿠르 데시왈; 솜야 싱할; 케샤반 바라다라잔; 소마 콜리
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2015-12-02
Filing date: 2016-10-05
Publication date: 2017-06-12
Also published as: KR102636101B1

Abstract

A method and apparatus for processing graphics data in a GPU are provided. A candidate DOG layer of an image is received. The candidate DOG layer is obtained as an intermediate layer of the image. Extremum data representing extreme points is then obtained by comparing the values of the candidate DOG layer with the values of the lower DOG layer and the upper DOG layer. The obtained extremum data is stored.

Description

Method and apparatus for processing graphics data in a GPU

The present disclosure relates to an image processing method and apparatus and, more particularly, to a method and apparatus for obtaining extrema in a tile-based graphics processing unit (GPU).

In general, object acquisition methods identify local features or interest points (keypoints) in an image, which can be used in computer vision applications such as object detection, object recognition, and face detection. Object acquisition methods provide various approaches to computerized object recognition, object acquisition, image matching, and three-dimensional (3D) reconstruction. Various operations are performed on the GPU to identify the local features or keypoints in an image.

Keypoints in an image can be defined based on an image function, such as a series of filtering operations together with a search for extrema. Extrema are among the important features of an object. They are generally defined as the leftmost, rightmost, topmost, and bottommost points of the object with respect to the reference frame of the image. These extreme points define a rectangular bounding box that contains the object. The bounding box is used to limit the image region to be analyzed in order to identify detailed characteristics of the object.

The Scale Invariant Feature Transform (SIFT) is a method for obtaining and extracting local feature descriptors that are invariant to changes in illumination, image noise, rotation, scale, and viewpoint. SIFT can be applied to computer vision problems such as object recognition, face recognition, object acquisition, image matching, 3D structure construction, stereo correspondence, and motion tracking. SIFT can be time-consuming, and in some cases (for example, online object recognition) SIFT features must be extracted and matched in real time. SIFT feature extraction is executed or accelerated on the GPU. However, SIFT is a sequential algorithm designed to run on a single-processor system, and parallelizing it can cause load imbalance that degrades scaling efficiency.

Conventional systems have used various methods for object acquisition. However, conventional methods require high power and substantial computational resources. Because GPUs have limited memory and compute capability and are energy sensitive, there is a need for a method of obtaining extrema that uses energy efficiently.

The present disclosure provides a method of obtaining extrema in a tile-based GPU. The method includes receiving a candidate Difference of Gaussian (DOG) layer, where the candidate DOG layer is computed based on tiles of the image; obtaining the candidate DOG layer as an intermediate layer of the image; obtaining extremum data representing extreme points by comparing the values of the candidate DOG layer with the values of the lower and upper DOG layers; and storing the obtained extremum data in an extrema buffer unit.

As a technical means for achieving the above technical object, a first aspect of the present invention may provide a method of obtaining extrema in a tile-based GPU (Graphics Processing Unit), the method including: receiving a DOG (Difference of Gaussian) layer of an image; obtaining, from the received DOG layer, a candidate DOG layer of the image as an intermediate layer; obtaining extremum data representing one or more extreme points by comparing values of the candidate DOG layer with values of a lower (previous) DOG layer and an upper (next) DOG layer; and storing the extremum data in a buffer.

A second aspect of the present invention may provide an apparatus for obtaining extremum data, the apparatus including: a GPU that receives a candidate DOG layer of an image, obtains the candidate DOG layer of the image as an intermediate layer, and obtains one or more items of extremum data by comparing values of the candidate DOG layer with values of a lower DOG layer and an upper DOG layer; and a buffer that stores the one or more items of extremum data.

A third aspect of the present invention may provide a computer-readable recording medium on which a program for causing a computer to execute the method of the first aspect is recorded.

FIG. 1 is a diagram illustrating Gaussian images in a scale-space pyramid, according to one embodiment.
FIG. 2 is a diagram illustrating a method of determining extrema, according to one embodiment.
FIG. 3 is a block diagram of an extrema acquisition system including a GPU, according to one embodiment.
FIG. 4 is a diagram illustrating the process of computing the DOG layers of an image, according to one embodiment.
FIG. 5 is a diagram illustrating the process of obtaining extremum data, according to one embodiment.
FIG. 6 is a diagram illustrating the keypoint localization process, according to one embodiment.
FIG. 7 is a flowchart illustrating a method of obtaining extremum data in a GPU, according to one embodiment.
FIG. 8 is a block diagram illustrating a computing environment that executes the method of obtaining extremum data in a GPU, according to one embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. Parts unrelated to the description are omitted from the drawings for clarity, and similar parts are denoted by similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

To aid understanding of the embodiments of the present invention, background on obtaining extrema is described first.

The GPU can obtain keypoints by generating a DOG pyramid and then obtaining extrema.

The DOG pyramid is described first. DOG may refer to a band-pass filter operator obtained by computing the difference between two versions of the same grayscale image, each filtered (blurred) with a different low-pass filter. The filtered versions can be obtained by convolving the image with two two-dimensional Gaussian filters of different radii. For example, the relationship in [Equation 1] may hold.

In [Equation 1], $G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} e^{-(x^{2} + y^{2})/(2\sigma^{2})}$, $I(x, y)$ is the input image, $k \in \mathbb{R}$, and "$*$" is the convolution operator.
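The image for [Equation 1] did not survive extraction. Given the symbols defined here ($G$, $I$, $k$, $*$) and the image pairs $L(x, y, \sigma_{i+1})$ and $L(x, y, \sigma_i)$ that the description goes on to use, it is presumably the standard DOG definition. The form below is a reconstruction under that assumption, not the equation as filed:

```latex
\begin{aligned}
D(x, y, \sigma) &= \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y) \\
                &= L(x, y, k\sigma) - L(x, y, \sigma),
\qquad \text{where } L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)
\end{aligned}
```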

The DOG filter, with its characteristic Mexican-hat transfer curve, approximates the well-known scale-normalized Laplacian of Gaussian (LOG) and can be used for edge detection. The DOG operator serves as the initial stage of several image detection algorithms, and because Gaussian filters are separable, it requires less computation than the LOG and can therefore be more efficient.

A DOG image is obtained for each pair of images $L(x, y, \sigma_{i+1})$ and $L(x, y, \sigma_i)$. Multiple keypoint candidates can be obtained from the DOG images corresponding to the different scales of each input image.

Because digital images belong to the discrete domain, a 2D Gaussian filter can be expressed as a combination of two 1D convolutions. The discrete scale-space for every pixel (i, j) can therefore be computed by convolving with a 1D Gaussian kernel and then convolving the result with the complementary kernel of the equivalent 2D filter. Separating the operation in this way reduces the per-pixel cost for an n×n kernel from O(n²) to O(2n).
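The separability argument can be checked numerically. The following is a minimal NumPy sketch, not the patent's implementation: the function names (`gaussian_kernel_1d`, `blur_separable`, `blur_full_2d`) are hypothetical, and zero padding at the image border is an assumption made so that both paths are exactly comparable.

```python
import numpy as np

def gaussian_kernel_1d(sigma, size):
    """Normalized 1D Gaussian kernel with an odd number of taps."""
    r = size // 2
    x = np.arange(-r, r + 1, dtype=np.float64)
    g = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return g / g.sum()

def blur_separable(image, g):
    """Two 1D passes (rows, then columns): about 2n multiplies per pixel."""
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, image)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, rows)

def blur_full_2d(image, g):
    """Direct 2D filtering with the outer-product kernel: n*n multiplies per pixel."""
    k2d = np.outer(g, g)  # symmetric, so correlation equals convolution
    r = len(g) // 2
    padded = np.pad(image, r, mode="constant")  # assumption: zero padding
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]
            out[i, j] = np.sum(window * k2d)
    return out

rng = np.random.default_rng(0)
image = rng.random((16, 16))
g = gaussian_kernel_1d(sigma=1.4, size=9)
same_result = np.allclose(blur_separable(image, g), blur_full_2d(image, g))
```

Both paths give the same blurred image, but the separable one touches 2 × 9 samples per pixel instead of 9 × 9. Note that `np.convolve(..., mode="same")` only returns a row of the original length when the image side is at least as long as the kernel.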

Referring to FIG. 1, the scale-space pyramid is constructed as follows:
1. Blur the input image with Gaussian filters of increasing σ (that is, increasing scale).
2. Compute the DOGs from the blurred images with adjacent σ values, as in [Equation 1].
3. Repeat steps 1 and 2 for versions of the input image downsampled by a factor of 2 (that is, octaves).

According to established theory, the parameter k in the discrete domain is set to $2^{1/S}$, where S + 3 is the number of scales per octave. To keep the accuracy of the hardware (hereinafter HW) on par with a software implementation while limiting the size of the proposed processor, S is set to 2. As a result, each octave contains five scales, and the remaining octaves 12 contain images downsampled to 2, 4, and 8 times smaller than the original image used in the first octave 10. In terms of scalability of the proposed design, the dimensions of the scale-space pyramid can be varied to reduce the processor size as well as to improve power/speed efficiency.
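The pyramid construction can be sketched as follows. This is an illustration under stated assumptions rather than the described hardware: `blur` and `build_dog_pyramid` are hypothetical names, the kernel is truncated at about 3σ on each side, borders are edge-replicated, and each octave is produced by plain decimation of the input (taking every second pixel).

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    radius = int(np.ceil(3.0 * sigma))  # assumption: truncate at about 3 sigma
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return g / g.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge-replicated borders."""
    g = gaussian_kernel_1d(sigma)
    r = len(g) // 2

    def conv1d(v):
        # Padding by r and using mode="valid" keeps the output length equal to len(v).
        return np.convolve(np.pad(v, r, mode="edge"), g, mode="valid")

    rows = np.apply_along_axis(conv1d, 1, image)
    return np.apply_along_axis(conv1d, 0, rows)

def build_dog_pyramid(image, sigma0=1.4, S=2, num_octaves=4):
    k = 2.0 ** (1.0 / S)
    sigmas = [sigma0 * k ** i for i in range(S + 3)]  # S + 3 scales per octave
    pyramid = []
    octave_input = image.astype(np.float64)
    for _ in range(num_octaves):
        blurred = [blur(octave_input, s) for s in sigmas]
        # One DOG per pair of adjacent scales: 5 blurred images give 4 DOGs.
        dogs = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]
        pyramid.append(dogs)
        octave_input = octave_input[::2, ::2]  # assumption: 2x decimation
    return pyramid

pyramid = build_dog_pyramid(np.random.default_rng(0).random((32, 32)))
```

With S = 2 and four octaves, `pyramid` holds four lists of four DOG images whose sides are 32, 16, 8, and 4 pixels.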

Although Gaussian filters have infinite support, a reasonable approximation is obtained by limiting the filter size along each dimension to 6σ + 1, where σ is the standard deviation of the Gaussian kernel. With this limit, the ratio of the mean value of the Gaussian kernel to the values neglected by the approximation exceeds 1000, which is sufficient to maintain the accuracy of the filter.

With an initial standard deviation of σ₀ = 1.4, the five scales are σ = {1.4, 2, 2.8, 4, 5.6}, and the corresponding filter sizes are 9×9, 13×13, 17×17, 25×25, and 35×35, respectively.
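The listed scales and filter sizes follow from k = 2^(1/S) with S = 2 and the 6σ + 1 limit. The sketch below reproduces them; the rounding rule (nearest odd integer to 6σ + 1) is an assumption that happens to match the listed sizes and is not stated in the source, and the function names are hypothetical.

```python
def scale_sigmas(sigma0=1.4, S=2):
    """Sigma for each of the S + 3 scales in an octave, with k = 2**(1/S)."""
    k = 2.0 ** (1.0 / S)
    return [sigma0 * k ** i for i in range(S + 3)]

def kernel_size(sigma):
    """Nearest odd integer to 6*sigma + 1 (assumed rounding rule)."""
    return 2 * round(3.0 * sigma) + 1

sigmas = scale_sigmas()
sizes = [kernel_size(s) for s in sigmas]
```

Here `sigmas` is approximately 1.4, 1.98, 2.8, 3.96, 5.6, which the text rounds to {1.4, 2, 2.8, 4, 5.6}, and `sizes` comes out as 9, 13, 17, 25, 35.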

As described below, different standard deviation values may be chosen so that synchronization of all the scales is simplified when the filters are actually implemented.

Even though the separability of Gaussian filters reduces the computation to that of one-dimensional filters, a large number of multiply-accumulate (MAC) operators are still required to convolve with filters of suitable size. The entire DOG computation pipeline is constrained by floating-point arithmetic in the 32-bit single-precision IEEE-754 format (hereinafter "FP32"). FP32 units usually require additional logic to synchronize the data path to and from the CPUs. This logic is commonly implemented as tightly or loosely coupled coprocessors in SoCs (Systems on Chip). In terms of speed and code compactness, this approach is therefore less efficient than integer-only arithmetic. Moreover, when designing hardware for DOG, using FP32 leads to a design that is large and unsuitable for constrained platforms. In this context, a fixed-point approach can help obtain an efficient system by reducing the physical resources required. To this end, tests have been carried out to determine the minimum number of bits required in fixed-point arithmetic to implement a full 2D Gaussian kernel, and to demonstrate that replacing it with its 1D + 1D separable counterpart is effective. It has also been shown that, relative to the full (non-separated) 2D kernel, only a limited difference arises when the intermediate Gaussian results are coded with 10 bits and the 2D-filtered pixels of the pyramid are coded with 14 bits.

In addition, the DOG filtering process can be applied to the smoothed images corresponding to different scales to obtain a set of DOG images for each input image. For example, four DOG images 14 and 16 can be generated from the five smoothed images 10 and 12 corresponding to the respective scales for each input image.

Referring to FIG. 2, each pixel in the DOG image $D_i(x, y, \sigma)$ 20 can in turn be set as the marked pixel (marked with an X) and compared with its 26 neighboring comparison pixels. The comparison pixels are the 8 surrounding pixels in the 3×3 pixel region of $D_i$ itself, the 9 pixels in the 3×3 pixel region of the DOG image $D_{i-1}(x, y, \sigma)$ 22, and the 9 pixels in the 3×3 pixel region of the DOG image $D_{i+1}(x, y, \sigma)$ 24. It is then determined whether the data of the marked pixel is an extremum (a maximum or a minimum) among the data of the marked pixel and the comparison pixels. If the marked pixel is determined to be an extreme point, it can be set as a candidate object keypoint. Once the keypoint candidate is confirmed as an object keypoint, the scale value $\sigma_i$ of the DOG image $D_i$ containing the keypoint can be used to compute the image feature of the object keypoint.

Obtaining extrema is described next. Once the DOG pyramid has been generated, keypoints can be identified by comparing each pixel in a DOG image with its 8 neighboring pixels at the same scale and the 9 pixels in each of the two adjacent scales 20 and 30. If the pixel is an extremum (a maximum or a minimum), it can be regarded as a valid keypoint from then on. Because four DOGs can be computed from the five scales in each octave, extrema can be obtained for each octave by two parallel pipes that compare the first and second groups of three DOGs, respectively.
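The 26-neighbor test can be written compactly. The following is a minimal NumPy sketch, assuming strict comparison (ties do not count as extrema) and an interior pixel; the function name `is_extremum` and its argument names are hypothetical.

```python
import numpy as np

def is_extremum(prev_dog, cand_dog, next_dog, y, x):
    """True if cand_dog[y, x] is strictly above or below all 26 neighbors.

    The neighbors are the 8 surrounding pixels in cand_dog plus the 3x3
    regions at the same position in prev_dog and next_dog. The pixel must
    be an interior pixel (1 <= y < H - 1 and 1 <= x < W - 1).
    """
    cube = np.stack([layer[y - 1:y + 2, x - 1:x + 2]
                     for layer in (prev_dog, cand_dog, next_dog)])
    center = cube[1, 1, 1]
    # Index 13 is the center of the flattened 3x3x3 cube.
    neighbors = np.delete(cube.ravel(), 13)
    return bool(center > neighbors.max() or center < neighbors.min())

prev_dog = np.zeros((3, 3))
cand_dog = np.zeros((3, 3))
next_dog = np.zeros((3, 3))
cand_dog[1, 1] = 1.0
peak_found = is_extremum(prev_dog, cand_dog, next_dog, 1, 1)
```

Within one octave of four DOG layers, this test would be applied twice: once with layers (0, 1, 2) and once with layers (1, 2, 3), matching the two groups of three DOGs mentioned above.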

In this specification, obtaining extrema may include detecting extrema as well as acquiring, generating, or receiving data that represents extrema.

The present disclosure provides a method of obtaining extrema in a tile-based GPU. The method includes receiving a candidate DOG layer of an image. In one embodiment, the candidate DOG layer is computed based on tiles of the image. The method includes obtaining the candidate DOG layer as an intermediate layer of the image, and obtaining extreme points by comparing the values of the candidate DOG layer with the values of the lower and upper DOG layers. An extreme point is either a maximum or a minimum. The method also includes storing the obtained extremum data and sending the extremum data to a shader core for keypoint localization.

In one embodiment, the method includes obtaining extremum data for the candidate DOG layer and the lower DOG layer by comparing the values of the candidate DOG layer with the values of the lower DOG layer. The method further includes obtaining extremum data for the candidate, lower, and upper DOG layers by comparing the extremum data for the candidate and lower DOG layers with the values of the upper DOG layer.

In one embodiment, the values of the lower DOG layer can be stored in a tile buffer.

The present disclosure provides an energy-efficient method of obtaining extrema in a tile-based GPU. The methods described here can run on GPUs and, more specifically, on energy-sensitive mobile GPUs. The Z comparison circuit available in mobile GPUs can be used to obtain the extrema.

The disclosed graphics processing method can be executed as three different passes that can be pipelined across programmable GPUs and non-programmable hardware. In one embodiment, the fixed-function graphics hardware of the GPU is used to execute the methods of the present disclosure. Executing the method may also require additional components, such as a buffer (the extrema buffer) that stores the layer ID corresponding to each extreme point and a state machine that determines the final list of extreme points. In one embodiment, on-chip frame buffer memory is used to store intermediate data, which makes more memory available for intermediate data and frees the shared memory for the other passes. Because the GPU hardware is used efficiently, processing speed increases and energy is saved.

The method of the present disclosure is described in detail below with reference to FIGS. 3 to 6.

FIG. 3 is a block diagram of an extrema acquisition system including a GPU 102, according to one embodiment. As shown in FIG. 3, the system may include the GPU 102. In one embodiment, the GPU 102 may include a shader core 104, a tile buffer 106, a comparison unit 108, and an extrema buffer 110. In the following, the tile buffer 106 and the extrema buffer 110 may refer to the same buffer.

The shader core 104 can compute the candidate DOG layer. In one embodiment, the candidate DOG layer can be computed based on tiles of the image.

The shader core 104 can also receive a list of extreme points. Using the received list, the shader core 104 can localize the keypoints and generate descriptors for them.

The tile buffer 106 can store the values of a DOG layer. For example, if the candidate DOG layer is called "layer A", the values of the lower DOG layer, "layer A-1", can be stored in the tile buffer 106. The tile buffer 106 may include one or more computer-readable recording media. The tile buffer 106 may include non-volatile storage elements, such as magnetic hard disks, optical discs, floppy disks, flash memory, electrically programmable memories (EPROM), or electrically erasable programmable memories (EEPROM). The tile buffer 106 may also be a non-transitory storage medium. The term "non-transitory" may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, "non-transitory" should not be interpreted to mean that the tile buffer 106 is non-movable. In one embodiment, the tile buffer 106 can store a larger amount of information than the memory. In one embodiment, a non-transitory storage medium (for example, RAM (Random Access Memory) or a cache) can store data that changes over time.

In one embodiment, the comparison unit 108 is the Z comparison circuit present in mobile GPUs; that is, the Z comparison circuit in the GPU 102 is used to obtain extrema. In one embodiment, the comparison unit 108 can receive a candidate DOG layer of an image and obtain the candidate DOG layer as an intermediate layer.

The comparison unit 108 can also obtain extremum data by comparing the values of the candidate DOG layer with the values of the lower DOG layer and the upper DOG layer. For example, if the candidate DOG layer is called "layer A", the lower DOG layer "layer A-1", and the upper DOG layer "layer A+1", the comparison unit 108 can compare the pixel values of layer A with the pixel values of layer A-1 and layer A+1.

In one embodiment, the comparison unit 108 can obtain extremum data for the candidate DOG layer and the lower DOG layer by comparing each value of the candidate DOG layer with the corresponding value of the lower DOG layer. That is, to obtain the extremum data for the candidate and lower DOG layers, the comparison unit 108 can compare the pixel values of layer A with the corresponding pixel values of layer A-1.

The comparison unit 108 can then obtain extremum data for the candidate, lower, and upper DOG layers by comparing the extremum data for the candidate and lower DOG layers with the values of the upper DOG layer. That is, the comparison unit 108 can compare the extremum data obtained for layer A and layer A-1 with the pixel values of layer A+1.
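The two-stage, layer-by-layer comparison resembles a depth test that keeps a running per-pixel winner together with the ID of the layer it came from. The sketch below is a software analogy under that assumption, not the Z comparison circuit itself: `layered_extrema` and the ID encoding (0 = layer A-1, 1 = layer A, 2 = layer A+1) are hypothetical, ties keep the earlier layer, and only corresponding pixels are compared, so the 3×3 spatial neighborhood is not covered here.

```python
import numpy as np

def layered_extrema(prev_dog, cand_dog, next_dog):
    """Running per-pixel max/min across layers, tracking the winning layer ID."""
    layers = (prev_dog, cand_dog, next_dog)
    max_val = prev_dog.astype(np.float64)  # astype returns a copy
    min_val = prev_dog.astype(np.float64)
    max_id = np.zeros(prev_dog.shape, dtype=np.int8)
    min_id = np.zeros(prev_dog.shape, dtype=np.int8)
    # Stage 1 compares layer A with A-1; stage 2 compares that result with A+1.
    for layer_id in (1, 2):
        layer = layers[layer_id]
        greater = layer > max_val  # analogous to a "greater than" depth test
        less = layer < min_val     # analogous to a "less than" depth test
        max_val = np.where(greater, layer, max_val)
        min_val = np.where(less, layer, min_val)
        max_id = np.where(greater, layer_id, max_id)
        min_id = np.where(less, layer_id, min_id)
    # A pixel remains a candidate only if layer A holds the max or the min.
    candidate_mask = (max_id == 1) | (min_id == 1)
    return candidate_mask, max_id, min_id

prev_dog = np.array([[0.0, 5.0, 1.0]])
cand_dog = np.array([[3.0, 2.0, 2.0]])
next_dog = np.array([[1.0, 9.0, 3.0]])
mask, max_id, min_id = layered_extrema(prev_dog, cand_dog, next_dog)
```

In this example the first pixel of layer A is a per-pixel maximum, the second is a per-pixel minimum, and the third is neither, because layer A+1 overtakes it in the second stage.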

The obtained extremum data can be stored in the extrema buffer 110. The ID of the DOG layer corresponding to each obtained extremum can also be stored in the extrema buffer 110. The extrema buffer 110 may include one or more computer-readable recording media. The extrema buffer 110 may include non-volatile storage elements, such as magnetic hard disks, optical discs, floppy disks, flash memory, electrically programmable memories (EPROM), or electrically erasable programmable memories (EEPROM). The extrema buffer 110 may also be a non-transitory storage medium. The term "non-transitory" may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, "non-transitory" should not be interpreted to mean that the extrema buffer 110 is non-movable. In one embodiment, the extrema buffer 110 can store a larger amount of information than the memory. In one embodiment, a non-transitory storage medium (for example, RAM (Random Access Memory) or a cache) can store data that changes over time.

FIG. 4 is a diagram illustrating a process of computing the DOG layers of an image, according to an embodiment. FIG. 4 shows the first pass, which computes the DOG layers of the image. The first pass is executed in the shader core 104. The first pass may involve a control unit 202, a plurality of processing elements (PEs) 204a-204d, a shared memory 206, an L2/LL cache 208, a dynamic random access memory (DRAM) 210, and a raster operations pipeline (ROP) 212. The input of the first pass is a tile of the image from which the Gaussian pyramid and the DOG pyramid are built. As shown in FIG. 4, the input tile is fetched from the DRAM 210 through the L2/LL cache 208, and the fetched tile is passed to the PEs 204a-204d. The shared memory 206 is a memory system shared by the PEs 204a-204d. The PEs compute the DOG pyramid. The ROP 212 receives pixel data. The DOG layers of each octave are passed to the ROP 212 one at a time for extreme value acquisition. The ROP 212 may also merge data pixel by pixel using configurable functions. The output of the first pass (22) is a subset of the DOG layers generated from the input tile. In one embodiment, the values of the generated DOG layers may be stored in the tile buffer 106. Also, in one embodiment, the candidate DOG layer, the lower DOG layer, and the upper DOG layer may be obtained as the output of the first pass.
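As a rough illustration of what the first pass produces, the following minimal sketch computes DOG layers for a single tile in plain Python. It is not the shader-core implementation described above: the function names, the clamped border handling, and the sigma values are all assumptions chosen for illustration, and the patent does not specify them.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def clamp(value, size):
    """Clamp an index into the range [0, size - 1]."""
    return min(max(value, 0), size - 1)

def blur(tile, sigma):
    """Separable Gaussian blur of a 2-D list (assumed border policy: clamp)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(tile), len(tile[0])
    # Horizontal pass, then vertical pass.
    rows = [[sum(k * tile[y][clamp(x + i - radius, width)]
                 for i, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]
    return [[sum(k * rows[clamp(y + i - radius, height)][x]
                 for i, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]

def dog_layers(tile, sigmas):
    """DOG layer i = blur(tile, sigmas[i + 1]) - blur(tile, sigmas[i])."""
    blurred = [blur(tile, s) for s in sigmas]
    return [[[hi - lo for hi, lo in zip(row_hi, row_lo)]
             for row_hi, row_lo in zip(b_hi, b_lo)]
            for b_lo, b_hi in zip(blurred, blurred[1:])]

# Hypothetical 9x9 tile with a single bright pixel; three scales give two DOG layers.
tile = [[0.0] * 9 for _ in range(9)]
tile[4][4] = 1.0
layers = dog_layers(tile, [1.0, 1.6, 2.56])
```

With N scales this yields N - 1 DOG layers per tile, which is the kind of per-tile layer stack that the later passes consume one layer at a time.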

FIG. 5 is a diagram illustrating a method of obtaining extremum data, according to an embodiment. FIG. 5 shows the second pass, which acquires extreme values. The second pass is executed in the comparing unit 108. The input of the second pass is the subset of DOG layers, received one at a time from the PEs 204a-204d in the shader core 104.

As shown in FIG. 5, the comparing unit 108 may receive a DOG layer of the image from the shader core 104. The comparing unit 108 may obtain a candidate DOG layer as an intermediate layer. The comparing unit 108 may obtain the values of the DOG layers stored in the tile buffer 106.

The comparing unit 108 may compare the values of the candidate DOG layer with the values of the lower DOG layer and the upper DOG layer. For example, if the candidate DOG layer is called the "A layer", then when the "A layer" is received, the comparing unit 108 receives the values of the "A-1 layer" (the layer below the candidate layer) from the tile buffer 106. The comparing unit 108 may compare each value of the "A layer" with the corresponding value of the "A-1 layer". After the values are compared, the minimum and/or maximum values in the extreme value buffer 110, and the IDs of the minima layer and/or the maxima layer, may be updated. In one embodiment, if a value of the candidate DOG layer is greater than the values of the lower and upper DOG layers, the maximum value in the extreme value buffer 110 is updated with that value of the candidate DOG layer, and the ID of the maxima layer may become "A layer".
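The per-pixel update described in this paragraph can be sketched as a running maximum/minimum merge. The sketch below is one possible software reading of that behavior, not the hardware comparing unit itself; the buffer layout (a dictionary of 2-D lists) and the function names are hypothetical.

```python
def new_extreme_value_buffer(height, width):
    """Per-pixel running maximum/minimum and the ID of the layer that produced each."""
    return {
        "max_val": [[float("-inf")] * width for _ in range(height)],
        "max_id": [[None] * width for _ in range(height)],
        "min_val": [[float("inf")] * width for _ in range(height)],
        "min_id": [[None] * width for _ in range(height)],
    }

def merge_layer(buf, layer, layer_id):
    """Compare one incoming DOG layer against the buffer, pixel by pixel."""
    for y, row in enumerate(layer):
        for x, value in enumerate(row):
            if value > buf["max_val"][y][x]:
                buf["max_val"][y][x] = value
                buf["max_id"][y][x] = layer_id
            if value < buf["min_val"][y][x]:
                buf["min_val"][y][x] = value
                buf["min_id"][y][x] = layer_id

# Layers arrive one at a time: A-1 (lower), A (candidate), A+1 (upper).
buf = new_extreme_value_buffer(1, 2)
for layer_id, layer in [("A-1", [[1, 5]]), ("A", [[3, 2]]), ("A+1", [[2, 4]])]:
    merge_layer(buf, layer, layer_id)
```

After the three merges, the first pixel holds maximum 3 from the "A" layer and minimum 1 from the "A-1" layer, while the second pixel holds maximum 5 from "A-1" and minimum 2 from "A". Only one incoming layer and the buffer need to be resident at any time, which matches the one-layer-at-a-time delivery described for the second pass.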

In the example above, if the "A layer" is the last layer, the comparing unit 108 may obtain the final extreme values, and the layer ID corresponding to each extreme value, from the extreme value buffer 110. Based on the layer IDs corresponding to the obtained extreme values, the comparing unit 108 compares each extreme value with its 26 neighboring values, generates a list of the extreme values that are greater (maxima) or smaller (minima) than the neighboring values, and passes the generated list to the shader core 104. Although FIG. 5 shows the list of 26-neighbor extrema as consisting of maxima, the list may instead consist of minima.
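The 26-neighbor comparison can be illustrated with a brute-force check over the 3x3x3 neighborhood spanning the lower, candidate, and upper layers. This is a minimal sketch under stated assumptions: it scans every interior pixel rather than using the stored layer IDs to select candidates, and the function names and the strict-inequality test are illustrative choices, not details taken from the patent.

```python
def classify_extremum(layers, k, y, x):
    """Return "max", "min", or None for the value at layer k, row y, column x."""
    center = layers[k][y][x]
    # 8 neighbors in the same layer plus 9 each in the layers below and above.
    neighbors = [layers[k + dk][y + dy][x + dx]
                 for dk in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (dk, dy, dx) != (0, 0, 0)]
    if all(center > n for n in neighbors):
        return "max"
    if all(center < n for n in neighbors):
        return "min"
    return None

def list_extrema(layers):
    """Scan interior positions and collect (layer, row, col, kind) tuples."""
    found = []
    for k in range(1, len(layers) - 1):
        for y in range(1, len(layers[k]) - 1):
            for x in range(1, len(layers[k][y]) - 1):
                kind = classify_extremum(layers, k, y, x)
                if kind is not None:
                    found.append((k, y, x, kind))
    return found

# Three hypothetical 3x3 layers with one peak in the middle of the candidate layer.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
extrema = list_extrema(stack)
```

In this toy stack the only interior position is the center of the middle layer, so `extrema` contains a single "max" entry; a list of this form is what would be handed on for feature point localization.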

FIG. 6 is a diagram illustrating a feature point localization process, according to an embodiment. FIG. 6 shows the third pass, which performs feature point localization. The third pass is executed in the shader core 104. The third pass may involve the control unit 202, the PEs 204a-204d, the shared memory 206, the L2/LL cache 208, and the dynamic random access memory (DRAM) 210. The input of the third pass is the list of extrema (maxima and/or minima) obtained in the second pass, and the output is a list of feature point descriptors generated by the PEs 204a-204d. In the third pass, the control unit 202 sends the selected extremum data to the processing elements 204 to localize the feature points and generate descriptors for them. The detailed procedures of feature point localization and feature point descriptor generation are omitted here.

FIG. 7 is a flowchart illustrating a method of obtaining extremum data in the GPU 102, according to an embodiment. In step 502, the comparing unit 108 may receive one or more DOG layers from the PEs 204a-204d in the shader core 104. In step 504, the comparing unit 108 may obtain a candidate DOG layer as an intermediate DOG layer of the image.

In step 506, the comparing unit 108 may obtain extremum data by comparing the values of the candidate DOG layer with the values of the lower DOG layer and the upper DOG layer.

In step 508, the extreme value buffer 110 may store the obtained extremum data. In step 510, the comparing unit 108 may pass the extremum data to the shader core 104 for feature point localization. The shader core 104 may then perform feature point localization using the received extremum data.

The steps above may be performed in a different order or simultaneously. In other examples, some of the steps may be modified or omitted, and new steps may be added.

FIG. 8 is a block diagram illustrating a computing environment that executes the method of obtaining extremum data in a GPU, according to an embodiment. As shown in FIG. 8, the computing environment 602 may include at least one processor 608 having a control unit 604 and an arithmetic unit 606, a memory (or storage) 610, an input/output unit 612, and a plurality of network devices 614. The processor 608 may process the instructions of the algorithm. The processor 608 may receive commands from the control unit 604. The processor 608 may also use the arithmetic unit 606 to perform the logical and arithmetic operations involved in executing the instructions.

The overall computing environment 602 may be composed of a plurality of homogeneous and/or heterogeneous cores, a plurality of CPUs of different kinds, media, and accelerators. The processor 608 may process the instructions of the algorithm. In addition, a plurality of processors 608 may be located on a single chip or across multiple chips.

The algorithm, comprising the instructions and code required for execution, may be stored in the memory 610. At execution time, the instructions may be fetched from the memory 610 in which they are stored and processed by the processor 608.

In a hardware implementation, the input/output unit 612 and various network devices 614 may be connected to the computing environment.

The embodiments above may be implemented through at least one software program that runs on at least one hardware device and performs network management functions to control the components. The blocks shown in FIGS. 3 and 8 may be at least one hardware device, or a combination of hardware and software modules.

The apparatus according to the embodiments described above may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with external devices, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Examples of computer-readable recording media include magnetic storage media (for example, read-only memory (ROM), random-access memory (RAM), floppy disks, and hard disks) and optically readable media (for example, CD-ROMs and digital versatile discs (DVDs)). The computer-readable recording medium may be distributed across networked computer systems so that computer-readable code is stored and executed in a distributed fashion. The medium can be read by a computer, stored in a memory, and executed by a processor.

This embodiment may be described in terms of functional block components and various processing steps. Such functional blocks may be implemented by any number of hardware and/or software components that perform particular functions. For example, an embodiment may employ integrated circuit components, such as memory, processing, logic, and look-up table elements, that can carry out various functions under the control of one or more microprocessors or other control devices. Just as the components may be implemented using software programming or software elements, this embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms implemented as any combination of data structures, processes, routines, or other programming elements. Functional aspects may be implemented as algorithms that run on one or more processors. This embodiment may also employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" are used broadly and are not limited to mechanical or physical components. These terms may include software routines in conjunction with a processor or the like.

The particular implementations described in this embodiment are examples and do not limit the technical scope in any way. For the sake of brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connectors between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be represented by alternative or additional functional connections, physical connections, or circuit connections.

In this specification (particularly in the claims), the term "the" and similar referring terms may cover both the singular and the plural. When a range is stated, it includes the individual values falling within the range (unless stated otherwise), as if each individual value constituting the range were set out in the detailed description. Finally, unless an order is explicitly stated for the steps of a method, or stated otherwise, the steps may be performed in any suitable order; they are not necessarily limited to the order in which they are described. The use of any examples or exemplary terms (for example, "and so on") is intended merely to describe the technical idea in detail, and the scope is not limited by those examples or exemplary terms unless limited by the claims. Those skilled in the art will also appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Claims

A method of processing graphics data, the method comprising:
receiving a Difference of Gaussian (DOG) layer of an image;
obtaining a candidate DOG layer of the image as an intermediate layer from the received DOG layer;
obtaining extremum data representing at least one extremum by comparing values of the candidate DOG layer with values of a lower DOG layer and an upper DOG layer; and
storing the extremum data representing the at least one extremum in a buffer.

The method according to claim 1, further comprising:
transmitting the stored extremum data to a shader core to perform feature point localization.

The method according to claim 1,
wherein the obtaining of the extremum data comprises:
obtaining extremum data of the candidate DOG layer and the lower DOG layer by comparing a value of the candidate DOG layer with a corresponding value of the lower DOG layer; and
obtaining extremum data of the candidate DOG layer, the lower DOG layer, and the upper DOG layer by comparing the extremum data of the candidate DOG layer and the lower DOG layer with values of the upper DOG layer.

The method according to claim 1,
wherein values of the DOG layer are stored in a buffer.

The method according to claim 1,
wherein the DOG layer is calculated in a shader core based on tiles of the image.

The method according to claim 1,
wherein the extremum data comprises at least one of a maximum value and a minimum value.

An extremum data processing apparatus comprising:
a GPU configured to receive a candidate DOG layer of an image, obtain the candidate DOG layer as an intermediate layer of the image, and obtain extremum data representing one or more extrema by comparing values of the candidate DOG layer with values of a lower DOG layer and an upper DOG layer; and
a buffer configured to store the extremum data.

The apparatus of claim 7,
further comprising a shader core,
wherein the shader core is configured to:
receive the stored extremum data, and
perform feature point localization on the received extremum data.

The apparatus of claim 7,
wherein the GPU is configured to:
obtain extremum data of the candidate DOG layer and the lower DOG layer by comparing a value of the candidate DOG layer with a corresponding value of the lower DOG layer, and
obtain extremum data of the candidate DOG layer, the lower DOG layer, and the upper DOG layer by comparing the extremum data of the candidate DOG layer and the lower DOG layer with values of the upper DOG layer.

The apparatus of claim 7,
wherein values of the DOG layer are stored in a buffer.

The apparatus of claim 10,
further comprising a shader core,
wherein the shader core computes the DOG layer based on tiles of the image.

The apparatus of claim 7,
wherein the extremum data comprises one of a maximum value and a minimum value.

A computer-readable recording medium storing a program for causing a computer to execute the method of any one of claims 1 to 6.