KR100598134B1

KR100598134B1 - Method and system for vector data compression using k-means clustering

Info

Publication number: KR100598134B1
Application number: KR1020050024399A
Authority: KR
Inventors: 박수홍; 이동헌; 전우제
Original assignee: 인하대학교 산학협력단
Priority date: 2005-03-24
Filing date: 2005-03-24
Publication date: 2006-07-10

Abstract

The present invention provides a method and system for compressing vector data using K-means clustering, which efficiently reduces the storage space occupied by spatial data by combining K-means clustering with dictionary-based compression.

The invention separates each geometric object into starting-point coordinates and a sequence of differential vectors, splits each differential vector into a length and an angle, performs K-means clustering separately on the lengths and on the angles to build dictionaries whose entries are the cluster centers, and then approximates each length and angle by a pointer to a dictionary entry. The result is a vector data compression method suited to mobile environments.

Mobile, differential vector, K-means clustering, dictionary-based compression, vector data

Description

METHOD AND SYSTEM FOR VECTOR DATA COMPRESSION USING K-MEANS CLUSTERING

FIG. 1 shows an example line string used to explain dictionary-based compression.

FIG. 2 shows the dictionary constructed for FIG. 1.

FIG. 3 is a flowchart of a general clustering process.

FIG. 4 is an overall flowchart of the vector data compression method using K-means clustering according to the present invention.

FIG. 5 illustrates the case in which adjacent starting points fall into the same cluster.

FIG. 6 is a hardware block diagram for implementing the present invention.

FIG. 7 shows the experimental data used in the present invention.

FIG. 8 shows the differential vectors of the experimental data of FIG. 7.

FIG. 9 is an enlarged view of the differential vectors, excluding the starting points.

FIG. 10 shows the vector dictionary obtainable by combination in the experiment of the present invention.

FIG. 11 shows the vector dictionary produced by the conventional compression method.

FIG. 12 is a table for analyzing the results of the present invention.

FIGS. 13 to 15 show the results of experiments in which the sizes of the length and angle dictionaries were each increased.

FIG. 16 is a table analyzing the results of the study on the conventional compression method.

FIG. 17 is a graph of the relationship between compression ratio and position error.

Description of reference numerals for the main parts of the drawings

100: input unit, 200: differential vector extraction unit

300: compression unit, 400: control unit

500: storage unit, 510: starting-point coordinate DB

520: length dictionary DB, 530: angle dictionary DB

540: length dictionary pointer DB, 550: angle dictionary pointer DB

The present invention relates to a method and system for compressing vector data used in mobile environments, and more particularly to a vector data compression method and system using K-means clustering that efficiently reduces the storage space occupied by spatial data through K-means clustering and dictionary-based compression.

The use of mobile devices such as mobile phones, PDAs, and car navigation terminals has been growing rapidly. These devices need spatial data for route finding, map services, and similar functions.

Compared with desktop environments, however, mobile devices still have limited computing power and storage space. Even vector data, which occupy relatively little storage compared with raster data, remain a significant burden.

Vector data used in mobile environments include data used for actual distance measurement or route finding and map data used as a background. For map data, a small position error is acceptable as long as it is too small to be noticed by eye.

To date, compression techniques for vector data have been studied far less extensively than those for raster data. A representative study of dictionary-based vector data compression is "Design Algorithms for Vector Map Compression (D. Salomon. 'Data Compression: the Complete Reference.' Springer-Verlag, 2nd edition, 2000.)", published in 2000. That study reduces the amount of data by breaking vector data into individual differential vectors and approximating them with entries of a pre-built dictionary using the FHM (Fibonacci, Huffman, and Markov) method. Because the dictionary is built in advance, the dictionary construction step can be skipped, but the dictionary cannot reflect the characteristics of the data, so the restored data contain large position errors.

Another study, "Vector Map Compression: A Clustering Approach (Shashi Shekhar, Yan Huang, Judy Djugash, Changqing Zhou, 'Vector Map Compression: A Clustering Approach', Proceedings of the tenth ACM international symposium on Advances in geographic information systems, 2002, 74-80)", builds the dictionary with K-means clustering, a data mining technique that can discover the characteristics of the data. Its results show better positional accuracy after restoration than the FHM method, but when applied to real data it still cannot achieve satisfactory positional accuracy.

The present invention has been made in view of these problems. Its object is to provide a vector data compression method and system using K-means clustering in which the differential vectors that express the relative positions within each geometric object are split into lengths and angles, and a dictionary is built for each by K-means clustering, so that an improved compression ratio is obtained with minimal computation while the loss of positional accuracy remains hard to perceive.

To achieve this object, the vector data compression method using K-means clustering according to the present invention comprises: separating each geometric object into starting-point coordinates and a sequence of differential vectors; separating each differential vector into a length and an angle; performing K-means clustering separately on the separated lengths and angles to build dictionaries whose entries are the cluster means; and approximating each length and angle by converting it into a pointer to a dictionary entry.

The vector data compression system using K-means clustering according to the present invention comprises: a differential vector extraction unit that separates each geometric object, received through an input unit, into starting-point coordinates and a sequence of differential vectors; a compression unit that separates the extracted differential vectors into lengths and angles, applies K-means clustering to build a dictionary for the lengths and a dictionary for the angles, and compresses the data by converting each value into a pointer to the representative value of the cluster to which it belongs; a storage unit having separate databases for the starting-point coordinates from the differential vector extraction unit and for the length dictionary, angle dictionary, length dictionary pointers, and angle dictionary pointers from the compression unit; and a control unit that controls these units.

The present invention is described below in more detail with reference to the accompanying drawings. The following embodiments merely illustrate the invention, and the scope of the invention is not limited to them.

First, the dictionary-based compression technique and the K-means clustering technique applied in the present invention are reviewed.

The dictionary-based approach (Keith C. Clarke, 1990) used in the present invention was not developed solely for spatial data. It is a technique applied in many other fields, and the present invention designs its vector data compression method on this approach.

In the dictionary-based approach, a dictionary is built whose entries are the data values to be represented, and the actual data are then expressed as a sequence of pointers to dictionary entries rather than as stored values.

Applied to vector data, this works as follows. When a line string (LineString) exists in two-dimensional space as in FIG. 1, there are several ways to store it.

The method given in the OGC standard (OGIS, 1999) stores the actual coordinates in arrays of 8-byte x and y values, together with type information, the total number of points, and so on, and represents the geometry as 'LINESTRING (60 120, 100 180, 200 100, 260 140, 160 160, 220 240)'. To store the same geometry with the dictionary-based method, a dictionary whose entries are the individual point coordinates of the LineString is first built, as in the table of FIG. 2.

The table of FIG. 2 is built from all the point coordinates of the LineString with duplicates removed. The LineString itself is then represented not by actual coordinates but by an array of pointers to the dictionary entries, so it is stored as '(1, 2, 3, 4, 5, 6)'. When a spatial object stored this way is used, the dictionary entries referenced by the pointer array are looked up and converted back into coordinates.
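The dictionary construction and lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the 1-based, first-seen entry numbering is an assumption chosen to match the '(1, 2, 3, 4, 5, 6)' example.

```python
def build_point_dictionary(points):
    """Return (entries, pointers): unique points in first-seen order,
    and 1-based indices into that list (numbering is an assumption)."""
    entries = []
    index_of = {}
    pointers = []
    for p in points:
        if p not in index_of:
            entries.append(p)
            index_of[p] = len(entries)  # 1-based entry number
        pointers.append(index_of[p])
    return entries, pointers


def decode(entries, pointers):
    """Look up each pointer in the dictionary to recover the coordinates."""
    return [entries[i - 1] for i in pointers]


# Coordinates taken from the LINESTRING example in the text.
line_string = [(60, 120), (100, 180), (200, 100),
               (260, 140), (160, 160), (220, 240)]
entries, pointers = build_point_dictionary(line_string)
print(pointers)  # [1, 2, 3, 4, 5, 6]
```

Because every point of this single LineString is distinct, the dictionary has as many entries as the object has points, which is why the pointer array adds to the total size rather than reducing it.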

This data structure is called the dictionary-based approach. As the example shows, applying the dictionary-based approach to a single object does not reduce the total data size: the pointers are stored in addition to the dictionary entries, so the total is larger than simply recording the coordinates by the size of the pointers. The present invention therefore reduces the amount of data by building the dictionary from values that recur in similar form across many spatial objects.

Next, K-means clustering is reviewed.

Cluster analysis is one of the data mining techniques used to extract and analyze meaningful information from large databases. A cluster is a set of similar data (Jiawei Han, Micheline Kamber, 2000). Clustering groups the data so that each cluster contains similar data and data in different clusters are less similar. Cluster analysis is thus a method that classifies a mixture of heterogeneous objects into several homogeneous clusters according to their similarity.

The clustering process consists of four steps, as shown in FIG. 3.

In the first step, variable measurement, the variables that characterize each object and can be used for clustering are obtained. That is, m variables are measured for each of n objects, where m is the number of the objects' variables that are actually used for clustering. The second step measures similarity: the m measured variables are used to compute the distance, or dissimilarity, between all pairs of objects. How the distance is computed depends on the clustering method. Computing the distances between all objects yields a distance matrix that expresses dissimilarity. The smaller the dissimilarity, the closer two objects are, and two objects being close means that the m variables measured for clustering have similar values. In the third step, the dissimilarity matrix is used with the chosen clustering method to group nearby objects into clusters. The final step, analysis, determines the nature of each cluster and the relationships among clusters, that is, what each cluster means and how the clusters relate to one another, based on the completed clustering.

Cluster analysis is broadly divided into hierarchical and non-hierarchical clustering techniques.

Non-hierarchical clustering partitions n given objects into k clusters using a chosen partitioning method. It requires several conditions. First, k must always be smaller than n, so that the final number of clusters is smaller than the number of objects. Second, each cluster must contain at least one object. Finally, each object must belong to exactly one cluster. K-means clustering is a representative non-hierarchical technique.

Hierarchical clustering decomposes a given set of objects hierarchically. It comes in two forms, bottom-up and top-down. The bottom-up method starts with each object in its own cluster, measures similarity, and merges the most similar clusters, repeating until everything is merged into a single cluster or a given condition is satisfied. The top-down method does the opposite: it starts with all objects in one cluster and repeatedly splits clusters until each cluster contains a single object or a given condition is satisfied. Representative hierarchical techniques include single linkage, complete linkage, average linkage, and Ward's method. The drawback of hierarchical methods is that an inappropriate early merge or split cannot be undone.

The K-means clustering technique applied in the present invention is a non-hierarchical clustering method developed by MacQueen (MacQueen, 1967). It partitions n objects into k clusters so that highly similar objects are grouped into the same cluster.

In the first step, the number of clusters K is received as an input parameter and K cluster centers are selected. The initial cluster centers are chosen arbitrarily. In the next step, each object is assigned to one of the selected clusters: similarity is computed and the object is assigned to the cluster with the highest similarity. Similarity can be computed in several ways; in K-means clustering, the new center of each cluster is then computed as the mean of the objects assigned to it. When clustering variables of two or more dimensions, the centroid is used in place of the scalar mean to recompute the cluster center. These steps are repeated until a given condition is satisfied or the cluster centers no longer move.
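The iteration described above can be sketched for one-dimensional data, which is the case the invention uses for lengths and angles. This is an illustrative sketch under stated assumptions: the function name, the seeded random initialization, the iteration cap, and the handling of empty clusters are choices made here, not details given in the text.

```python
import random


def kmeans_1d(values, k, max_iter=100, seed=0):
    """K-means on a list of scalars. Returns (centers, labels)."""
    rng = random.Random(seed)
    # Arbitrary initial centers, drawn here from the data (an assumption).
    centers = rng.sample(values, k)

    def assign(cs):
        # Each value goes to the nearest center (absolute difference).
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # The new center is the mean of the members; an empty cluster
            # keeps its previous center (an assumption for this sketch).
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers, assign(centers)


values = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centers, labels = kmeans_1d(values, k=2)
print(sorted(centers))
```

For this small input the two groups are well separated, so the centers settle near 1.0 and 10.0 whichever two values are drawn as initial centers.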

Several measures can be used in the second step of the clustering process, which assigns each object to the most similar cluster. The Minkowski distance, Euclidean distance, standardized distance, and Mahalanobis distance can all serve as measures of similarity between objects. The Euclidean distance is the measure most widely used with K-means clustering. Because the present invention operates on geometric coordinates in two-dimensional space, the Euclidean distance is used as the similarity measure. Extended to n-dimensional space, the Euclidean distance between two points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Using this distance, the similarity between each object and each cluster is measured, each object is assigned to the cluster to which it is most similar, and the clusters are recomputed. This is repeated until no reassignment occurs or a given condition is satisfied, which yields the final result.

K-means clustering has the advantage that meaningful information can be found without prior knowledge of the internal structure of the data (Anil K. Jain, M. Narasimha Murty, 1999). It can also be applied to most kinds of data, provided that the distance between an observation and a cluster center is defined appropriately for the data. Even if an object is initially placed in the wrong cluster, repeated iteration reassigns it to an appropriate one. Because K-means clustering requires no prior information other than the initial value K, it is easy to apply. If K is not chosen well, however, a satisfactory clustering result cannot be obtained. In addition, when the variables have different measurement scales (Jay Lee, David W. S. Wong, 2000), it is difficult to define a single dissimilarity distance. Finally, because no clustering objective is given in advance, the results can be difficult to interpret.

For map data used as a background in a mobile environment, the present invention applies lossy compression based on the K-means clustering technique described above together with the dictionary-based approach. Rather than maximizing the overall compression ratio, it compresses in a way that minimizes data loss so that the data remain usable in practice, favoring positional accuracy in the trade-off between compression ratio and the positional accuracy lost to compression. The invention built on this concept is described below with reference to the operational flowchart of FIG. 4.

First, to apply K-means clustering to the compression of vector data, each geometric object must be separated into starting-point coordinates and a sequence of differential vectors.

A differential vector expresses the position of the current point relatively, as its difference from the coordinates of the previous point or from the coordinates of the object's starting point. The present invention extracts each differential vector as the difference between a point's coordinates and those of the previous point (S100).
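A minimal sketch of step S100, assuming each geometric object is given as a list of (x, y) tuples; the function name is hypothetical and the sample coordinates reuse the LineString from the earlier example.

```python
def to_differential(points):
    """Split a point sequence into its starting point and the list of
    differences between each point and the previous one."""
    start = points[0]
    diffs = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return start, diffs


points = [(60, 120), (100, 180), (200, 100),
          (260, 140), (160, 160), (220, 240)]
start, diffs = to_differential(points)
print(start)  # (60, 120)
print(diffs)  # [(40, 60), (100, -80), (60, 40), (-100, 20), (60, 80)]
```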

The extracted differential vectors are then separated into lengths and angles, and K-means clustering is applied to each to build the dictionaries (S200, S300).
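The separation into length and angle amounts to a Cartesian-to-polar conversion. The sketch below assumes angles measured in radians counterclockwise from the positive x axis, as returned by `math.atan2`; the text does not state the angle unit or reference direction, and the function name is hypothetical.

```python
import math


def to_polar(diffs):
    """Split differential vectors (dx, dy) into parallel lists of
    lengths and angles (radians, an assumed convention)."""
    lengths = [math.hypot(dx, dy) for dx, dy in diffs]
    angles = [math.atan2(dy, dx) for dx, dy in diffs]
    return lengths, angles


lengths, angles = to_polar([(3, 4), (0, 2)])
print(lengths)  # [5.0, 2.0]
```

The two lists are then clustered independently, one dictionary for the lengths and one for the angles.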

Here, the lengths and the angles of the differential vectors are each clustered with K-means clustering, and the center of each cluster is computed from the objects assigned to it. Because the two-dimensional differential vectors have been separated into one-dimensional lengths and angles, each center is simply the mean of the values in the cluster. Dictionaries are built whose entries are these representative values. Then, following the dictionary-based compression technique, each value is replaced by a pointer to the representative value of the cluster to which it belongs (S400).
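Step S400 can be sketched as a nearest-entry lookup. In this illustration the dictionary is assumed to have been computed already (the values below are the means of the two obvious groups in the sample data), pointers are 0-based list indices, and the function name is hypothetical.

```python
def to_pointers(values, dictionary):
    """Replace each value by the index of its nearest dictionary entry."""
    return [min(range(len(dictionary)), key=lambda j: abs(v - dictionary[j]))
            for v in values]


lengths = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
length_dictionary = [1.0, 10.0]  # cluster means, assumed precomputed by K-means
length_pointers = to_pointers(lengths, length_dictionary)
print(length_pointers)  # [0, 0, 0, 1, 1, 1]

# The approximation error introduced by replacing values with entries.
approx = [length_dictionary[p] for p in length_pointers]
max_error = max(abs(a - v) for a, v in zip(approx, lengths))
print(max_error)  # 0.5
```

The same function applies unchanged to the angle values and the angle dictionary.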

The compressed data produced by this sequence of steps consist of the set of object starting points, which express the absolute positions of the spatial objects; the length and angle dictionaries of the differential vectors, which express positions relative to the starting points; and the two sets of pointer arrays that reference those dictionaries.

The starting point of each spatial object is stored as it is, without any processing. Whereas the differential vectors express the relative positions of the individual points, the starting point of an object holds its absolute position in the coordinate system. A spatial object can be reconstructed only when both the absolute reference position and the differential vectors giving the relative coordinates are available.

If clustering were applied to the starting points as well, then as soon as the number of clusters is smaller than the number of objects (that is, the number of starting points) by even one, two or more polygons would be merged into one, as shown in FIG. 5.

If the coordinates that express the absolute positions of objects are clustered together with the differential vectors that express relative positions, and the number of clusters is set at or below the total number of objects, the position error over the whole data set may decrease, but the effect shown in FIG. 5 distorts the data so severely that they become unusable. If instead clustering is applied to the absolute positions and the number of clusters is raised above the number of objects, the number of dictionary entries to be referenced grows. As the number of dictionary entries grows, the pointers that reference the dictionary must grow as well. For example, a dictionary with 256 entries can be referenced with pointers of at least 8 bits, whereas 512 entries require 9-bit pointers.
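The relationship between dictionary size and pointer width in the example above is the ceiling of the base-2 logarithm of the entry count. A small sketch, with a hypothetical function name and an assumed minimum width of one bit:

```python
def pointer_bits(num_entries):
    """Minimum number of bits needed to address num_entries entries."""
    if num_entries < 1:
        raise ValueError("dictionary must have at least one entry")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2;
    # a single-entry dictionary is given a 1-bit pointer here (an assumption).
    return max(1, (num_entries - 1).bit_length())


print(pointer_bits(256))  # 8
print(pointer_bits(512))  # 9
```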

The reasons for separating the differential vectors into lengths and angles before clustering are as follows.

First, the extracted differential vectors tend to be concentrated at angles similar to those of the roads. In that case, a clustering technique that groups similar values into a single value yields better positional accuracy. Second, separation increases the number of values the dictionaries can represent. If clustering in two-dimensional space produces a dictionary with 10 entries, only 10 distinct vectors can be represented. If the vectors are instead split into two components and two dictionaries of 5 entries each are built, 5 x 5 distinct vectors can be represented. More vectors can thus be expressed with a smaller dictionary. Finally, lossy compression of map data that require high positional accuracy needs a large number of clusters K, which raises the cost of the compression process; clustering lengths and angles separately keeps that cost down.
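The 10-versus-25 comparison above can be checked directly. The dictionary values below are purely illustrative, not taken from the patent's experiment.

```python
from itertools import product

# Two hypothetical 5-entry dictionaries (illustrative values only).
length_dict = [1.0, 2.0, 3.0, 4.0, 5.0]
angle_dict = [0.0, 0.5, 1.0, 1.5, 2.0]

stored_entries = len(length_dict) + len(angle_dict)
combos = set(product(length_dict, angle_dict))

print(stored_entries)  # 10 entries stored in total
print(len(combos))     # 25 representable (length, angle) pairs
```

A single two-dimensional dictionary with the same 10 stored entries could represent only 10 distinct vectors.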

도 6은 이러한 본 발명을 구현하기 위한 하드웨어 구성도를 도시한 것으로, 입력부(100)를 통해 기하 객체에 대한 데이터가 입력되면 디퍼런셜 벡터 추출부(200)에서 해당 기하 객체를 시작점 좌표와 디퍼런셜 벡터의 나열로 분리하게 되며, 압축부(300)에서 디퍼런셜 벡터를 길이와 각도로 각각 분리하여 K평균 군집화를 적용하여 길이와 각도 각각에 대한 사전을 제작하고, 제작된 사전을 이용하여 각 개체를 개체가 속하는 군집의 대표 값을 가리키는 포인터로 변환하여 압축을 행하게 되며, 이러한 과정에서 생성되는 시작점 좌표, 길이사전, 각도사전, 길이사전 포인터, 각도사전 포인터 등이 제어부(400)의 제어에 따라 저장부(500)의 시작점 좌표 DB(510), 길이사전 DB(520), 각도사전 DB(530), 길이사전 포인터 DB(540), 각도사전 포인터 DB(550) 등의 데이터 베이스에 각각 저장되게 된다.FIG. 6 is a diagram illustrating a hardware configuration for implementing the present invention. When data about a geometric object is input through the input unit 100, the differential vector extracting unit 200 converts the geometric object into a starting point coordinate and a differential vector. The compression unit 300 separates the differential vectors into lengths and angles, and then applies a K-average clustering to produce a dictionary for each of lengths and angles. Compression is performed by converting a pointer to a representative value of a cluster to which the cluster belongs, and a starting point coordinate, a length dictionary, an angle dictionary, a length dictionary pointer, an angle dictionary pointer, etc. generated in this process are stored under the control of the controller 400. The starting point coordinate DB (510), length dictionary DB (520), angle dictionary DB (530), length dictionary pointer DB (540), angle dictionary pointer DB (550), etc. Each will be stored.

Data compressed through the above process can be reconstructed into the original data by reversing the compression process. To this end, the mobile device is provided with a restoring unit that performs the reverse of the compression process. The mobile device receives the compressed data, stores it in its storage space, and restores it through the restoring unit.

That is, the restoring unit of the mobile device reads the pointers from the storage space of the mobile device and fetches the length and angle dictionary entries they point to, replacing each pointer with an actual length or angle. This yields arrays of actual lengths and angles instead of arrays of pointers. Computing with these lengths and angles reconstructs the set of differential vectors, which hold the relative position coordinates of the points. Using these differential vectors together with the starting point of each object, which holds the absolute position coordinates of the spatial object, the original spatial object can be reconstructed.
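The restoring steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `reconstruct`, the use of radians, and the list-based dictionaries are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def reconstruct(start: Point,
                length_ptrs: List[int],
                angle_ptrs: List[int],
                length_dict: List[float],
                angle_dict: List[float]) -> List[Point]:
    """Rebuild the points of one object from its start point and pointer lists."""
    x, y = start
    points = [start]
    for lp, ap in zip(length_ptrs, angle_ptrs):
        r = length_dict[lp]        # pointer -> approximate length
        theta = angle_dict[ap]     # pointer -> approximate angle (radians, assumed)
        # Length and angle give the differential vector; add it to the previous point.
        x += r * math.cos(theta)
        y += r * math.sin(theta)
        points.append((x, y))
    return points
```

Because each point is obtained by adding to the previous one, approximation errors in the dictionary entries accumulate along an object, while the unmodified starting point anchors it at the correct absolute position.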

An experimental example of the present invention is described below.

The study area is the Yangcheon-gu and Gangseo-gu districts of Seoul (8.8 km x 10.1 km). The experimental data consist of features with a building code extracted from the 1:1,000 digital map of the study area: 72016 polygons with 566607 points in total, as shown in FIG. 7.

First, differential vectors were extracted from the experimental data so that the clustering technique could be applied. Each differential vector was computed as the difference between the current point coordinate and the previous point coordinate; for the starting point of each object, the difference from the origin was used.
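As a rough sketch of this extraction step, the following splits one object into its starting point and differential vectors, then into lengths and angles. The function names and the choice of radians for angles are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_differential(points: List[Point]) -> Tuple[Point, List[Point]]:
    """Split a point sequence into its starting point and differential vectors."""
    if not points:
        raise ValueError("a geometric object needs at least one point")
    # The starting point is kept as-is (its difference from the origin is itself).
    start = points[0]
    # Each differential vector is (current point) - (previous point).
    diffs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return start, diffs


def to_length_angle(diffs: List[Point]) -> Tuple[List[float], List[float]]:
    """Split differential vectors into separate length and angle lists (radians)."""
    lengths = [math.hypot(dx, dy) for dx, dy in diffs]
    angles = [math.atan2(dy, dx) for dx, dy in diffs]
    return lengths, angles
```

For example, the points (10, 20), (13, 24), (13, 30) give the starting point (10, 20) and the differential vectors (3, 4) and (0, 6).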

The next step is to store the starting-point vector of each polygon separately from the differential vectors computed in this way. In FIG. 8, the differential vectors other than the starting points, whose number equals the total number of points minus the number of polygons, are densely concentrated in the central part of the plot. When the data covers an area that is large compared with the average length of the differential vectors other than the starting points, clustering on vector length and angle did not produce good results.

FIG. 9 is an enlarged view of the differential vectors excluding the starting points. Compression consists of separating these vectors into lengths and angles and applying the K-means clustering technique and the dictionary-based compression technique to obtain two dictionaries and two sets of pointers.

In the experiment, K-means clustering was performed with the statistical package SPSS v10. Clustering was repeated while increasing the number of clusters K to 256, 512, and 1024 for both length and angle. FIG. 10 shows the combinations that can be formed from the length and angle dictionaries generated in this way.
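The experiment relied on SPSS v10, whose exact procedure is not described here. As a stand-in, the following sketch builds a dictionary from a list of scalar values with a plain Lloyd-style one-dimensional K-means and returns a pointer for each value. The function names, the random initialization, and the iteration limit are assumptions, and the sketch treats angles as ordinary numbers without handling wrap-around.

```python
import random
from typing import List, Tuple


def nearest(value: float, centers: List[float]) -> int:
    """Index of the dictionary entry closest to value."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))


def kmeans_1d(values: List[float], k: int, iters: int = 50,
              seed: int = 0) -> Tuple[List[float], List[int]]:
    """Return (dictionary of k cluster means, pointer of each value)."""
    if not 0 < k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # assumed initialization: k random samples
    for _ in range(iters):
        # Assignment step: each value points to its nearest center.
        pointers = [nearest(v, centers) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, p in zip(values, pointers) if p == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Final pointers are computed against the final dictionary.
    pointers = [nearest(v, centers) for v in values]
    return centers, pointers
```

Running the function once on the lengths and once on the angles would give the two dictionaries and the two pointer lists that make up the compressed form.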

In FIG. 10, from left to right, the length and angle dictionaries each have 256, 512, and 1024 entries, so the total numbers of representable combinations are 65536, 262144, and 1048576, respectively. When the differential vectors shown in FIG. 9 are approximated to the closest values in the dictionaries of FIG. 10, a larger dictionary therefore yields a smaller error.
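The counts above follow from multiplying the two dictionary sizes. A small sketch of that arithmetic, with a hypothetical helper name, is:

```python
def representable_vectors(length_entries: int, angle_entries: int) -> int:
    """Number of distinct (length, angle) pairs two separate dictionaries can express."""
    return length_entries * angle_entries


for k in (256, 512, 1024):
    # Two dictionaries of k scalars each store 2 * k numbers in total.
    print(k, 2 * k, representable_vectors(k, k))
```

A single dictionary of k two-dimensional vectors also stores 2 * k numbers but can express only k distinct vectors, which is the comparison the text draws between the two ways of building a dictionary.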

FIG. 11 shows the distribution of the dictionaries obtained when the existing method is applied to the experimental data; from left to right, the dictionaries contain 256, 512, and 1024 vectors. This illustrates the difference in the number of vectors that can be represented when dictionaries holding the same amount of data are built in the two different ways.

The data compressed by approximating each differential vector to the dictionaries was decompressed and compared with the original in order to calculate the positional error caused by data loss, the final data size, and the compression ratio. The experiments were run while increasing the combined dictionary size to 65536, 262144, and 1048576. The size of the compressed data is the sum of five parts: the part storing the starting points of the polygons, the two dictionaries representing the lengths and angles of the differential vectors, and the two sets of pointers into those dictionaries. The compression ratio was calculated as

Compression ratio (%) = ((original data size - compressed data size) / original data size) x 100

The size of the original data is 9065712 bytes.
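The formula can be written as a short function. The compressed size used in the usage line is a hypothetical value chosen to be exactly one quarter of the original size; it is not a measured result from the experiment.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Compression ratio in percent: share of the original size that was removed."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (original_bytes - compressed_bytes) / original_bytes * 100


ORIGINAL_BYTES = 9065712                      # original data size stated in the text
hypothetical_compressed = ORIGINAL_BYTES // 4  # assumed size, for illustration only
print(compression_ratio(ORIGINAL_BYTES, hypothetical_compressed))
```

Under this definition a higher percentage means a smaller compressed file, so data reduced to a quarter of its original size corresponds to a compression ratio of 75%.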

FIG. 12 is a table summarizing the results. In all three experiments the positional error is quite small.

FIGS. 13, 14, and 15 show the results obtained while increasing the sizes of the length and angle dictionaries: FIG. 13 uses a dictionary size of 256 x 256, FIG. 14 uses 512 x 512, and FIG. 15 uses 1024 x 1024. In the case of FIG. 15, accuracy was maintained while the data size was reduced to about 25% of the original.

For comparison, the dictionary size of the existing method was adjusted so that it gives results similar to those of the method proposed in the present invention. In addition, the starting points were included in the dictionary so that two adjacent objects would not merge. The results are shown in the table of FIG. 16. Three experiments were run with dictionary sizes of 72272, 72528, and 73040 entries; each dictionary contains the 72016 starting points plus the result of K-means clustering with 256, 512, and 1024 initial clusters, respectively. As the tables of FIG. 12 and FIG. 16 show, the present invention achieves a higher compression ratio.

FIG. 17 shows the relationship between compression ratio and positional error. Compared with the existing method, the present invention gives better positional accuracy at a similar compression ratio.

As described above, the present invention clusters the differential vectors, which represent relative positions within a single object, separately by the two factors of length and angle in order to cover the wide variety of such vectors. To define the absolute position in the coordinate system, one point of each spatial object, the starting point in the present invention, is stored as-is without modification. This improves positional accuracy without lowering the compression ratio, and reduces the size of vector data to about 25% of the original while minimizing loss.

Although the present invention has been described above with reference to a preferred embodiment, those skilled in the art may modify or change it in various ways without departing from the spirit and scope of the invention as set forth in the following claims.

As described above, the vector data compression method and system using K-means clustering according to the present invention splits the differential vectors, which express the relative positions within each geometric object, into lengths and angles, clusters each with the K-means technique to build a length dictionary and an angle dictionary, and then applies dictionary-based compression, replacing each element with a pointer to the representative value of the cluster to which it belongs. With minimal computation, this achieves an improved compression ratio for vector data while the loss of positional accuracy remains barely distinguishable, and thus reduces the storage space occupied on a mobile device.

Claims

A method for compressing vector data in a mobile environment, the method comprising:

separating each geometric object into starting point coordinates and a sequence of differential vectors;

separating each differential vector into a length and an angle;

performing K-means clustering on the separated lengths and on the separated angles to produce dictionaries whose entries are the cluster means; and

approximating each length and each angle and converting it into a pointer to a dictionary entry.

The method of claim 1, wherein each differential vector is obtained as the difference between a coordinate of the geometric object and the preceding coordinate.

The method of claim 1, wherein the starting point coordinates representing the starting point of the geometric object are not modified.

A system for compressing vector data in a mobile environment, the system comprising:

a differential vector extracting unit that separates each geometric object, from data on the geometric object input through an input unit, into starting point coordinates and a sequence of differential vectors;

a compression unit that separates the differential vectors extracted by the differential vector extracting unit into lengths and angles, applies K-means clustering to each to produce a length dictionary and an angle dictionary, and compresses the data by using the dictionaries to convert each element into a pointer to the representative value of the cluster to which it belongs;

a storage unit having respective databases that store the starting point coordinates from the differential vector extracting unit and the length dictionary, angle dictionary, length dictionary pointers, and angle dictionary pointers from the compression unit; and

a control unit that controls the respective units.

The vector data compression system of claim 4, wherein each differential vector is obtained as the difference between a coordinate of the geometric object and the preceding coordinate.