KR101895895B1

KR101895895B1 - Data deduplication method and system

Info

Publication number: KR101895895B1
Application number: KR1020170027460A
Authority: KR
Inventors: 김원빈; 이임영
Original assignee: 순천향대학교 산학협력단
Priority date: 2017-03-03
Filing date: 2017-03-03
Publication date: 2018-09-07

Abstract

The present invention discloses a data deduplication method and a system therefor. A method for data deduplication in a data deduplication system including a user client, a metadata server, and a cloud storage server according to an aspect of the present invention comprises the steps of requesting, at a user client, a check for duplication of data to be uploaded to a metadata server; checking, at the metadata server, whether the requested data is duplicated, and transmitting the result to the user client; generating, at the user client, ownership verification data through an XOR operation according to the received result and transmitting the generated ownership verification data to the metadata server; performing, at the metadata server, ownership verification using the received ownership verification data, and transmitting the result to the user client; transmitting, at the user client, encrypted data to the cloud storage server when receiving the data upload permission as a result of the verification; and receiving, at the cloud storage server, the encrypted data from the user client and updating the data.

Description

{DATA DEDUPLICATION METHOD AND SYSTEM}

The present invention relates to a data deduplication method, and more particularly, to a data deduplication method and system that remove duplicate data by performing, through XOR operations, the ownership verification involved in deduplicating encrypted data in a cloud storage environment.

With the recent development of computer network technology, data storage is shifting from physical storage devices to cloud storage accessed over a network.

Cloud storage is partially replacing conventional physical storage media because any terminal connected to the network can access it anytime and anywhere to store and download data. Services based on cloud storage are generally offered by companies that provide content over a network, and are used by many users simultaneously. Although a cloud storage used by many users holds a wide variety of data, much of what is stored is data that already exists in the storage and is stored again, so the proportion of redundant data grows and available storage space is wasted. As a result, maintenance costs may rise to expand the cloud storage space, and they rise further as the amount of duplicate data increases.

Various deduplication techniques have been studied to prevent duplicate data from being stored in cloud storage. The most basic of these converts data, in units of the original file, with a hash algorithm and then compares the result one-to-one to determine whether the same data is already stored. Depending on the unit of comparison, such deduplication can be divided into file-level deduplication and block-level deduplication.

File-level deduplication defines the file as the unit of duplication and compares the hash values of the original data one-to-one. Comparison is very fast, but if even one bit of the data differs, the data is recognized as different and no deduplication takes place. Block-level deduplication was developed to address this shortcoming. It divides the original data into pieces of a fixed size and uses the hash value of each piece, so a single original yields multiple data blocks, and duplication is determined block by block. When part of the original data changes, the remaining unchanged blocks can still be deduplicated, which raises deduplication efficiency. However, because deduplication is performed per block, the number of operations in processing grows as the number of blocks increases.
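As a minimal sketch of the difference between the two granularities, the following Python compares file-level and block-level fingerprints for two inputs that differ in a single byte. The function names, the use of SHA-256, and the 4-byte block size are illustrative assumptions and do not come from the specification.

```python
import hashlib

BLOCK_SIZE = 4  # assumed toy block size; real systems use KB-scale blocks


def file_fingerprint(data: bytes) -> str:
    """File-level deduplication: one hash for the whole file."""
    return hashlib.sha256(data).hexdigest()


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Block-level deduplication: one hash per fixed-size block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count positions where both inputs have an identical block hash."""
    return sum(x == y for x, y in zip(block_fingerprints(a), block_fingerprints(b)))


original = b"AAAABBBBCCCCDDDD"
modified = b"AAAABBBBCCCCDDDX"  # only the last byte differs

same_file = file_fingerprint(original) == file_fingerprint(modified)
reusable = shared_blocks(original, modified)
```

Here the file-level fingerprints differ, so the whole file would be stored again, while three of the four block fingerprints still match and only one block would need to be uploaded. The price is four hash operations instead of one.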

Meanwhile, cloud storage is a kind of remote server, and data on a remote server may be leaked through internal or external factors. Data stored in cloud storage must therefore be encrypted so that, even if it is leaked, a user without ownership of the data cannot learn its original contents. However, data deduplication and data encryption have conflicting properties, so the two cannot be applied at the same time. Various security techniques have been applied to resolve this, but applying them incurs a large amount of computation and introduces additional security threats such as ownership forgery attacks.

Korean Registered Patent No. 10-1590270 (published February 2, 2016)

The present invention has been proposed to solve the above problems, and its object is to provide a data deduplication method that removes duplicates of encrypted data with a relatively small amount of computation by improving the ownership verification scheme included in the deduplication process.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

According to one aspect of the present invention for achieving the above object, a data deduplication method in a data deduplication system including a user client, a metadata server, and a cloud storage server comprises: requesting, by the user client, that the metadata server check for duplication of data to be uploaded; checking, by the metadata server, whether the requested data is duplicated and transmitting the result to the user client; generating, by the user client, ownership verification data through an XOR operation according to the received result and transmitting it to the metadata server; performing, by the metadata server, ownership verification using the received ownership verification data and transmitting the result to the user client; transmitting, by the user client, encrypted data to the cloud storage server when permission to upload the data is received as the verification result; and receiving, by the cloud storage server, the encrypted data from the user client and updating the data.

The step of requesting a duplication check may include: transmitting, by the user client, its identifier information to the metadata server; verifying, by the metadata server, the user client from the received identifier information, and issuing and transmitting a session key to the user client; and receiving, by the user client, the session key, dividing the data to be uploaded into blocks, generating encryption keys and encrypted data using Convergent Encryption (CE), hashing the generated encryption keys and encrypted data and XORing the results to generate data identifiers, and transmitting the data identifiers to the metadata server.

The step of checking whether the requested data is duplicated and transmitting the result to the user client may include: determining, by the metadata server, whether the data is stored by comparing the received data identifiers with previously stored data identifiers; and constructing, by the metadata server, a data list reconstructed from only the data identifiers that are not stored, and transmitting to the user client the reconstructed data list together with data for generating ownership verification data, which is linked to the identifiers not included in the data list.
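The duplication check can be sketched as a simple partition of the requested identifiers. This is only an illustration under stated assumptions: the function name `check_duplicates`, the dictionary layout of the server-side index, and the sample identifiers are all hypothetical and are not taken from the specification.

```python
def check_duplicates(requested_ids, stored):
    """Split requested identifiers into 'to upload' and 'needs ownership check'.

    `stored` maps identifier -> material for generating ownership
    verification data (a hypothetical layout, assumed for illustration).
    """
    # Identifiers the server has never seen: the client may upload these.
    upload_list = [i for i in requested_ids if i not in stored]
    # Identifiers already stored: return the linked verification material.
    challenges = {i: stored[i] for i in requested_ids if i in stored}
    return upload_list, challenges


stored_index = {"id-a": b"\x01\x02", "id-c": b"\x03\x04"}
upload_list, challenges = check_duplicates(
    ["id-a", "id-b", "id-c", "id-d"], stored_index
)
```

In this run the client would be told to upload `id-b` and `id-d`, and would receive verification material for `id-a` and `id-c`, for which it must still prove ownership.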

The step of generating ownership verification data through an XOR operation according to the received result and transmitting it to the metadata server may include: extracting, by the user client, a secret value by XORing the reconstructed data list and the data for generating ownership verification data, received from the metadata server, with the original data that the user client owns; and generating, by the user client, ownership verification data by XORing the extracted secret value with the data it holds, and transmitting the generated ownership verification data to the metadata server.
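The following Python sketch shows one possible way such an XOR-based exchange could work. It is a hypothetical reading for illustration only, not the construction defined in the specification, and it makes no security claim. The assumptions are: the first uploader chooses a random secret, the server keeps two values that are hashes of the block masked with that secret, and a later claimant unmasks the secret with its own copy of the block. All names and the domain-separation labels are invented for this sketch.

```python
import hashlib
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def h(label: bytes, data: bytes) -> bytes:
    """Domain-separated SHA-256 (the labels are an assumption of this sketch)."""
    return hashlib.sha256(label + data).digest()


def first_upload(block: bytes) -> tuple[bytes, bytes]:
    """First uploader picks a secret; the server keeps two XOR-masked values."""
    secret = secrets.token_bytes(32)
    mask = xor_bytes(h(b"mask", block), secret)       # sent to later claimants
    expected = xor_bytes(h(b"proof", block), secret)  # kept for comparison
    return mask, expected


def make_proof(block: bytes, mask: bytes) -> bytes:
    """Claimant recovers the secret from the mask, then masks a second hash."""
    secret = xor_bytes(mask, h(b"mask", block))
    return xor_bytes(secret, h(b"proof", block))


def verify(proof: bytes, expected: bytes) -> bool:
    """Server-side check: the XOR of proof and expected value must be all zero."""
    return not any(xor_bytes(proof, expected))


mask, expected = first_upload(b"block contents")
owner_ok = verify(make_proof(b"block contents", mask), expected)
forger_ok = verify(make_proof(b"some other data", mask), expected)
```

A claimant holding the real block recovers the original secret and produces a matching proof, while a claimant holding different data recovers a wrong secret. The server-side check itself uses only an XOR and a comparison.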

In the step of performing ownership verification using the received ownership verification data and transmitting the result to the user client, the metadata server may perform ownership verification by XORing the ownership verification data and, when ownership verification is complete, issue and transmit to the user client ownership of the data, which permits upload of the data that is not yet stored.

In the step of transmitting encrypted data to the cloud storage server when permission to upload is received as the verification result, the user client, upon receiving ownership of the data from the metadata server, may transmit to the cloud storage server the encrypted data and verification data generated using a secret value that the user client itself generated.

In the step of receiving encrypted data from the user client and updating the data, the cloud storage server may update its stored list with the encrypted data received from the user client and store it, generate data for generating ownership verification data by hashing and XORing the encrypted data, and transmit the generated data to the metadata server so that the updated data list is synchronized with the metadata server.

According to another aspect of the present invention for achieving the above object, a data deduplication system comprises: a user client that requests a metadata server to check for duplication of data to be uploaded, generates ownership verification data through an XOR operation according to the check result received from the metadata server and transmits it to the metadata server, receives the result of ownership verification from the metadata server, and transmits encrypted data to a cloud storage server when permission to upload the data is received as the verification result; a metadata server that receives from the user client a request to check whether data is duplicated and transmits the result to the user client, and that, upon receiving ownership verification data from the user client, performs ownership verification using it and transmits the result to the user client; and a cloud storage server that receives the encrypted data from the user client and updates the data in its storage list.

According to one aspect of the present invention, the metadata server and the cloud storage server are separated, so that the metadata server determines duplication and verifies ownership while only non-duplicated data is transmitted to the cloud storage server. Because the ownership verification that arises during deduplication is performed on an XOR basis, the number of hash operations can be reduced compared with existing methods that use a hash tree (Merkle tree), and the total amount of computation in the system can be reduced.

In addition, data confidentiality can be ensured by providing encrypted data, and storage efficiency can be improved by preventing duplicate data from being stored on the cloud storage server.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of the technical idea of the invention; the present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a schematic configuration diagram of a data deduplication system according to an embodiment of the present invention.
FIGS. 2 to 5 illustrate operation scenarios of a data deduplication system according to an embodiment of the present invention.
FIG. 6 illustrates an encryption method using the Convergent Encryption (CE) scheme according to an embodiment of the present invention.
FIG. 7 illustrates the flow of a data deduplication method according to an embodiment of the present invention.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known art are omitted where they would unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

Before describing the data deduplication system according to the present embodiment, the operation of a deduplication system in the prior art is reviewed.

The user client transmits the identifier of the data to be uploaded to the cloud storage server, and the cloud storage server uses the received identifier to determine whether the data is already stored and returns the result. If the data is found to be already stored on the cloud storage server, the server grants the user client ownership of that data. If the data is not stored, the user client uploads the encrypted form of the data. In this process, if an attacker who does not possess the original data can forge the identifier of data already stored on the cloud storage server, the attacker can acquire ownership of that data without the original. To solve this problem, the cloud storage server must determine whether the user client actually possesses the original of the data it asks to upload. Moreover, the server must be able to learn whether the user client possesses the original without itself obtaining that original. This series of steps is called the ownership verification process.

The ownership verification process is a way to determine ownership of data without exposing the original. It requires a method by which the verifier, on obtaining verification information that the data owner generated from part of the data, can use that information to judge the data's authenticity. Typical ownership verification uses a data structure called a Merkle tree (also known as a hash tree). A Merkle tree hashes each of the leaf nodes, which are the pieces into which the original data is divided, then concatenates the hashes in pairs and hashes them again, repeating until a single value remains. The final node is called the root node, and the data in the root node is treated as the identifier of the data. This identifier is used to determine whether the data is stored.

The cloud storage server uses the identifier to determine whether the data is stored and, if it is, carries out ownership verification with the user to verify possession of the data. The cloud storage server requests a random leaf node from the user. The user client transmits to the cloud storage server the requested leaf node and the sibling path, which is the set of sibling nodes on the path from that leaf node to the root node. The cloud storage server can then compute a root' node from the received leaf node and sibling path. If the root node and the root' node match, the cloud storage server concludes that the user possesses the original data, and ownership verification ends.

However, ownership verification with a Merkle tree incurs many operations. The method builds a Merkle tree for each piece of data to be uploaded. Building each Merkle tree requires multiple hash operations, and the number of hash operations grows rapidly as the amount of data to upload increases. For example, when 1 GB of data is deduplicated in 32 KB units and 8 leaf nodes are generated per block, 29,150 hash operations occur. As the number of hash operations increases, more processing time and resources are consumed. A data deduplication method in a data deduplication system that solves this problem is described below with reference to FIGS. 1 to 7.
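The Merkle-tree challenge described above can be sketched in Python as follows. This is a minimal illustration that assumes SHA-256, a power-of-two number of leaves, and helper names (`build_levels`, `sibling_path`, `root_from_path`) invented for this example; it is not code from the specification.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, from hashed leaves up to the single root.

    Assumes the number of leaves is a power of two, to keep the sketch short.
    """
    level = [sha(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the neighbour within the same pair
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def root_from_path(leaf, path):
    """Verifier side: recompute root' from one leaf and its sibling path."""
    node = sha(leaf)
    for sibling, sibling_is_left in path:
        node = sha(sibling + node) if sibling_is_left else sha(node + sibling)
    return node


leaves = [bytes([i]) * 4 for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]

challenge = 5  # leaf index chosen by the server
path = sibling_path(levels, challenge)
verified = root_from_path(leaves[challenge], path) == root
hash_count = sum(len(level) for level in levels)
```

For a tree with 8 leaves, building the tree takes 15 hash operations (8 leaf hashes plus 7 internal nodes), and the verifier needs 4 more to recompute root' from a leaf and its three siblings.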

FIG. 1 is a schematic configuration diagram of a data deduplication system according to an embodiment of the present invention.

Referring to FIG. 1, the data deduplication system according to the present embodiment includes a user client 110, a metadata server 130, and a cloud storage server 150.

The user client 110 is connected to the metadata server 130 and the cloud storage server 150 over a network and communicates with them. The user client 110 may communicate with the metadata server 130 to compare encrypted data and request a deduplication check. The user client 110 may communicate with the cloud storage server 150 to upload deduplicated data, and the uploaded data may be encrypted. The user client 110 may be a terminal used by a user, implemented as a data processing device such as a personal computer (PC), tablet PC, notebook, netbook, e-reader, personal digital assistant (PDA), portable multimedia player (PMP), MP3 player, or MP4 player, or as a handheld device such as a mobile phone or smartphone.

The metadata server 130 may receive a deduplication request for encrypted data from the user client 110 and transmit the result to the user client 110. Upon receiving ownership verification data from the user client 110, it may verify ownership using that data and transmit the result to the user client 110.

The cloud storage server 150 may receive encrypted data from the user client 110 and update the data in its storage list.

In the present embodiment, the identifiers (IDs) and public keys of the user client 110, the metadata server 130, and the cloud storage server 150 may be distributed in advance. The cloud storage server 150 may generate and store a data identifier for the encrypted data received from the user client 110 and may synchronize that data identifier with the metadata server 130.

The data deduplication method in the data deduplication system described above is explained below.

FIGS. 2 to 5 illustrate operation scenarios of a data deduplication system according to an embodiment of the present invention, and FIG. 6 illustrates an encryption method using the Convergent Encryption (CE) scheme according to an embodiment of the present invention.

According to the present embodiment, an encrypted-data deduplication system that performs lightweight ownership verification in a cloud storage environment broadly encrypts and deduplicates data, verifies ownership of the encrypted data, uploads the deduplicated encrypted data, and updates the list of stored data, thereby storing deduplicated data on the cloud storage server 150. The symbols used in describing the present embodiment are defined as follows.

<Symbol definition>

· [symbol]: participating entity ([symbol]: user, [symbol]: metadata server 130, [symbol]: storage server)
· [symbol]: original file
· [symbol]: number of non-duplicated [symbol]
· [symbol]: number of duplicated [symbol]
· [symbol]: number of all [symbol] generated from [symbol]
· [symbol]: encryption key of [symbol] generated by CE (Convergent Encryption)
· [symbol]: [symbol] encrypted with [symbol]
· [symbol]: set of [symbol]
· [symbol]: set of deduplicated [symbol]
· [symbol]: identifier of user [symbol]
· [symbol]: session key between the metadata server 130 and user [symbol]
· [symbol]: public key of *
· [symbol]: private key of *
· [symbol]: number of nodes in [symbol]
· [symbol]: hashing with a cryptographic hash function
· [symbol]: [symbol] generated from [symbol]
· [symbol]: [symbol] generated from [symbol]
· [symbol]: [symbol] generated by the user who first uploaded [symbol]
· [symbol]: secret value generated by the user who first uploaded [symbol]
· [symbol]: identifier of [symbol]
· [symbol]: secret value generated by the user who first uploaded

The encrypted-data deduplication method in the encrypted-data deduplication system according to the present embodiment is described in detail below.

First, the process by which the user client 110 encrypts data and requests deduplication is described with reference to FIG. 2.

FIG. 2 shows the process in which the user client 110 identifies itself to the metadata server 130 and is issued a session key, and the process of requesting a check on whether the data to be uploaded to the cloud storage server 150 is duplicated. Because an encrypted-data deduplication system requires data confidentiality, transmitted data needs to be encrypted. A session key therefore needs to be distributed between the metadata server 130 and the user client 110.

First, the user client 110 announces its identity by transmitting its identifier (UID) to the metadata server 130 (210). After confirming the identity of the user client 110, the metadata server 130 distributes to the user client 110 a session key encrypted with the public key of the user client 110 (220). The user client 110, now holding the distributed session key, hashes each data block ([…]) of the file to be uploaded to generate […], hashes […] again to generate […], encrypts […] to generate […], hashes […] to generate […], computes […], and then encrypts the result with the session key and transmits it to the metadata server 130 (230, 240). The metadata server 130 decrypts the received data to obtain […] and compares it with the list of data identifiers held by the metadata server 130 to determine whether each identifier is already stored. Meanwhile, […] may be generated using Convergent Encryption (CE). Referring to FIG. 6, CE may be a scheme that uses the value ([…]) obtained by hashing a data block ([…]) of the original file (f) with a hash algorithm as the symmetric encryption key, and […] may be generated through Equation 1 below.
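The client-side steps of this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, SHA-256 is an assumed choice of hash, the symmetric cipher is replaced by a toy SHA-256 counter-mode keystream so that the example needs only the standard library, and the identifier is assumed to be the XOR of the hashed key and the hashed ciphertext, following the description of step S710.

```python
import hashlib
from typing import Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode). Assumption: a stand-in for a
    # real symmetric cipher, used here only to keep the sketch dependency-free.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += sha256(key + counter.to_bytes(8, "big"))
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(block: bytes) -> Tuple[bytes, bytes]:
    # Convergent encryption: the key is the hash of the plaintext block,
    # so identical blocks always yield identical ciphertexts.
    key = sha256(block)
    ciphertext = xor_bytes(block, keystream(key, len(block)))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return xor_bytes(ciphertext, keystream(key, len(ciphertext)))


def data_identifier(key: bytes, ciphertext: bytes) -> bytes:
    # Assumed form of the identifier: H(key) XOR H(ciphertext).
    return xor_bytes(sha256(key), sha256(ciphertext))
```

Because the key depends only on the block contents, two clients holding the same block derive the same ciphertext and the same identifier, which is what allows the metadata server to detect duplicates without seeing the plaintext.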

Next, the process of performing ownership verification is described with reference to FIG. 3.

FIG. 3 shows the process in which the metadata server 130 transmits the duplication check result to the user client 110, and the user client 110 generates ownership verification data according to the received result and performs ownership verification with the metadata server 130. In the encrypted-data deduplication system according to this embodiment, duplication is checked using the identifiers of the data that the user client 110 wants to upload, and for duplicated data it is necessary to determine whether the user client 110 actually possesses the original of that data.

Accordingly, the metadata server 130 lists the identifiers, among those transmitted by the user, that are not yet stored, thereby generating […] (310). It also transmits to the user client 110 the […] linked to each […] that is not included in […] (320). The user client 110 uses the received […] and the original data it holds ([…]) to generate […], which is used to generate the ownership verification data (330). The generation of […] is given by Equation 2 below.

The user client 110 computes the […] and […] it holds together with the […] and […] it has generated to produce the ownership verification data ([…]). The user client 110 transmits the generated ownership verification data to the metadata server 130 (340). The metadata server 130 performs ownership verification by computing over the ownership verification data and the data stored in the cloud storage server 150 (350). The ownership verification is given by Equation 3 below.

If the equality in Equation 3 holds, the metadata server 130 determines that the user client 110 possesses the original data and may issue ownership of that data to the user client 110 (360). The metadata server 130 may then permit the upload of data that is not stored in the cloud storage server 150.
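The general idea of an XOR-based ownership check can be illustrated with the following Python sketch. It is not the scheme of Equations 2 and 3: all names (`make_record`, `make_proof`, `verify`, `token`, `reference`) and the exact way the secret is masked are assumptions made for illustration. The sketch only shows how a secret value masked with a hash of the original data can be recovered by a party that holds the same data and then checked by XOR.

```python
import hashlib
import secrets
from typing import Dict


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_record(block: bytes, ciphertext: bytes) -> Dict[str, bytes]:
    # First uploader (hypothetical layout): mask a fresh secret value with
    # hashes of the original block and of its ciphertext.
    secret = secrets.token_bytes(32)
    return {
        "token": xor_bytes(sha256(block), secret),           # handed to later claimants
        "reference": xor_bytes(sha256(ciphertext), secret),  # kept on the server side
    }


def make_proof(token: bytes, block: bytes, ciphertext: bytes) -> bytes:
    # Claimant: unmask the secret with the block it holds, then re-mask it
    # with the hash of the ciphertext it derived.
    secret = xor_bytes(token, sha256(block))
    return xor_bytes(secret, sha256(ciphertext))


def verify(reference: bytes, proof: bytes) -> bool:
    # The XOR of two equal values is all zeros, so a correct proof cancels out.
    if len(reference) != len(proof):
        return False
    return not any(xor_bytes(reference, proof))
```

A claimant that does not hold the original block recovers a wrong secret value, so its proof does not cancel against the stored reference. Each check costs a fixed number of hash calls and XORs, with no tree traversal.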

Next, the process by which the user client 110 uploads the deduplicated encrypted data and generates the data used for generating ownership verification data is described with reference to FIG. 4.

FIG. 4 shows the process in which the user client 110 generates the encrypted data blocks and the data for generating ownership verification data and uploads them to the cloud storage server 150, and the cloud storage server 150 stores them, generates ownership verification data, and transmits it to the metadata server 130 so that the record of the data stored in the cloud storage server 150 is updated.

After the ownership verification of FIG. 3 is complete, the user client 110 uploads the data that is not stored in the cloud storage server 150. Through this process the user client 110 becomes the first uploader of that data. It therefore needs to generate the data for generating ownership verification data, which is used to verify the ownership of user clients 110 that upload the same data later, and transmit it to the cloud storage server 150. Accordingly, the user client 110 generates the data to be uploaded and the data for generating ownership verification data and transmits them to the cloud storage server 150, and the cloud storage server 150 stores and transforms them.

First, the user client 110 generates a secret value ([…]) and uses […] and […] to generate the data for generating ownership verification data ([…]) (410). Here, […] may be generated through Equation 4 below.

The user client 110 computes the generated […] and […] to produce […], encrypts it together with […], […], and the UID, and transmits it to the cloud storage server 150 (420). The cloud storage server 150 stores […] and then hashes it again to generate the data for generating ownership verification data ([…]) (430). Here, […] may be generated through Equation 5 below.

Finally, the process of transmitting the data for generating ownership verification data and updating the stored data list is described with reference to FIG. 5.

FIG. 5 shows the process of updating the data list after the upload of data to the cloud storage server 150 is complete.

The cloud storage server 150 computes the […] it generated and the […] transmitted by the user client 110 to produce […], and transmits the […], […], and UID sent by the user client 110 to the metadata server 130, which stores the received data (510, 520). The data transmitted to the metadata server 130 at this point may be the identifier of the data whose upload has completed.

FIG. 7 is a flowchart of a data deduplication method according to an embodiment of the present invention.

Referring to FIG. 7, the user client 110 first requests the metadata server 130 to check for duplication of the data to be uploaded (S710). At this time, the user client 110 may transmit its identifier information to the metadata server 130, and the metadata server 130 may check the received identifier information to confirm the user client 110 and then issue and transmit a session key to it. The user client 110 may then receive the session key, divide the data to be uploaded into blocks, generate an encryption key and encrypted data using Convergent Encryption (CE), hash the generated encryption key and encrypted data, XOR the results to generate a data identifier, and transmit the data identifier to the metadata server 130.

The metadata server 130 checks whether the requested data is duplicated and transmits the result to the user client 110 (S720). At this time, the metadata server 130 may compare the received data identifiers with the previously stored data identifiers to determine whether each is stored, reconstruct a data list containing only the identifiers that are not stored, and transmit to the user client 110 the reconstructed data list together with the data for generating ownership verification data that is linked to the identifiers not included in that list.
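The duplication check of step S720 amounts to partitioning the received identifiers into those that are already stored and those that are not. A minimal Python sketch follows. The class name `MetadataIndex`, its methods, and the in-memory dictionary are hypothetical choices for illustration, and the stored `token` stands for the data for generating ownership verification data, whose exact form is not assumed here.

```python
from typing import Dict, List, Tuple


class MetadataIndex:
    """Hypothetical in-memory index kept by the metadata server."""

    def __init__(self) -> None:
        # Maps a data identifier to the data used for generating
        # ownership verification data (called "token" in this sketch).
        self._tokens: Dict[bytes, bytes] = {}

    def register(self, identifier: bytes, token: bytes) -> None:
        # Called after an upload completes, when the list is synchronized.
        self._tokens[identifier] = token

    def check(self, identifiers: List[bytes]) -> Tuple[List[bytes], Dict[bytes, bytes]]:
        # Identifiers not yet stored form the reconstructed data list
        # (blocks the client must upload).
        missing = [i for i in identifiers if i not in self._tokens]
        # Identifiers already stored are answered with their tokens so the
        # client can build ownership verification data for them.
        challenges = {i: self._tokens[i] for i in identifiers if i in self._tokens}
        return missing, challenges
```

For example, after `register(b"id-1", b"tok-1")`, a call to `check([b"id-1", b"id-2"])` returns `b"id-2"` in the list of missing identifiers and the token for `b"id-1"` in the challenge mapping.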

The user client 110 generates ownership verification data through an XOR operation according to the received result and transmits it to the metadata server 130 (S730). At this time, the user client 110 may XOR the reconstructed data list and the data for generating ownership verification data received from the metadata server 130 with the original data it owns to extract a secret value, XOR the extracted secret value with the data it holds to generate the ownership verification data, and transmit the generated ownership verification data to the metadata server 130.

The metadata server 130 performs ownership verification using the received ownership verification data and transmits the result to the user client 110 (S740). At this time, the metadata server 130 may perform the ownership verification by XORing the ownership verification data and, once the verification is complete, issue and transmit to the user client 110 ownership of the data, which permits the upload of data that is not yet stored.

When the user client 110 receives permission to upload data as the verification result, it transmits the encrypted data to the cloud storage server 150 (S750). At this time, when the user client 110 receives the ownership of the data from the metadata server 130, it may transmit to the cloud storage server 150 the verification data generated using the encrypted data and the secret value that it created.

The cloud storage server 150 receives the encrypted data from the user client 110 and updates its data (S760). At this time, the cloud storage server 150 may store the encrypted data received from the user client 110, generate the data for generating ownership verification data by hashing and XORing the encrypted data, and transmit that data to the metadata server 130 so that the updated data list is synchronized with the metadata server 130.

According to the present invention as described above, the metadata server 130 and the cloud storage server 150 are separated: the metadata server 130 determines whether data is duplicated and performs ownership verification, and only non-duplicated data is transmitted to the cloud storage server 150. Because the ownership verification that arises during deduplication is performed on an XOR basis, the number of hash operations is reduced compared with existing methods that use a hash tree (Merkle tree), and the total amount of computation in the system can be reduced.

In addition, providing encrypted data ensures data confidentiality, and preventing duplication of the data stored in the cloud storage server 150 improves the storage efficiency of the storage space.

The methods according to embodiments of the present invention may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

While this specification contains many features, those features should not be construed as limiting the scope of the invention or of the claims. Features described in separate embodiments in this specification may be combined and implemented in a single embodiment. Conversely, various features described in a single embodiment may be implemented separately in various embodiments, or in any suitable combination.

Although operations are described in a particular order in the drawings, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all described operations be performed, in order to obtain the desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The application components and systems described above can generally be integrated into a single software product or packaged into multiple software products.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, since a person of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from the technical idea of the invention.

110: User client (User)
130: Metadata server (Metadata Server)
150: Cloud storage server (Storage Server)

Claims

delete

4. A data deduplication method in a data deduplication system including a user client, a metadata server, and a cloud storage server, the method comprising:
requesting, by the user client, the metadata server to check for duplication of data to be uploaded;
checking, by the metadata server, whether the requested data is duplicated, and transmitting the result to the user client;
generating, by the user client, ownership verification data through an XOR operation according to the received result, and transmitting the generated ownership verification data to the metadata server;
performing, by the metadata server, ownership verification using the received ownership verification data, and transmitting the result to the user client;
transmitting, by the user client, encrypted data to the cloud storage server when permission to upload data is received as the verification result; and
receiving, by the cloud storage server, the encrypted data from the user client and updating the data,
wherein the requesting of the check for duplication of the data comprises:
transmitting, by the user client, its identifier information to the metadata server;
confirming, by the metadata server, the user client by checking the received identifier information of the user client, and issuing and transmitting a session key to the user client; and
receiving, by the user client, the session key, dividing the data to be uploaded into blocks, generating an encryption key and encrypted data using Convergent Encryption (CE), hashing the generated encryption key and encrypted data, performing an XOR operation to generate a data identifier, and transmitting the data identifier to the metadata server,
wherein the checking of whether the requested data is duplicated and the transmitting of the result to the user client comprises:
comparing, by the metadata server, the received data identifier with previously stored data identifiers to determine whether it is stored; and
constructing, by the metadata server, a data list by reconstructing only the data identifiers determined not to be stored, and transmitting to the user client the reconstructed data list and data for generating ownership verification data that is linked to identifiers not included in the data list, and
wherein the generating of the ownership verification data through the XOR operation and the transmitting of it to the metadata server comprises:
extracting, by the user client, a secret value by XORing the reconstructed data list and the data for generating ownership verification data received from the metadata server with the original data owned by the user client; and
generating, by the user client, ownership verification data by XORing the extracted secret value with the data it holds, and transmitting the generated ownership verification data to the metadata server.

5. The method of claim 4,
wherein the performing of the ownership verification using the received ownership verification data and the transmitting of the result to the user client comprises:
performing, by the metadata server, the ownership verification by XORing the ownership verification data and, when the ownership verification is complete, issuing and transmitting to the user client ownership of the data that permits the upload of data that is not stored.

6. The method of claim 5,
wherein the transmitting of the encrypted data to the cloud storage server comprises:
transmitting, by the user client, upon receiving the ownership of the data from the metadata server, verification data generated using the encrypted data and the secret value created by the user client to the cloud storage server.

7. The method of claim 6,
wherein the receiving of the encrypted data from the user client and the updating of the data comprises:
storing, by the cloud storage server, the encrypted data received from the user client and updating its stored list, generating data for generating ownership verification data by hashing and XORing the encrypted data, and transmitting the generated data to the metadata server to synchronize the updated data list with the metadata server.

delete