KR102157836B1

KR102157836B1 - Deduplication security method for privacy protection in cloud environment

Info

Publication number: KR102157836B1
Application number: KR1020180096811A
Authority: KR
Inventors: 이동혁 (Lee Dong-hyuk); 박남제 (Park Nam-je)
Original assignee: Jeju National University Industry-Academic Cooperation Foundation (제주대학교 산학협력단)
Priority date: 2018-08-20
Filing date: 2018-08-20
Publication date: 2020-09-18
Also published as: KR20200021268A

Abstract

The object of the present invention is to provide a file deduplication method for privacy protection in a cloud environment that avoids storing duplicate copies of the same file uploaded by multiple users while preventing the user's information from being inferred from file information, thereby solving the privacy problem. To this end, the method comprises: a) a meta server generating an SID, encrypting the SID, and transmitting it to a client device; b) the client device decrypting the encrypted SID; c) the client device requesting from the meta server the file path needed for a file upload; d) the meta server generating the file path; e) the client device double-hashing the upload file to generate an FHV; f) the client device encrypting the FHV with the SID and transmitting it to the meta server; g) the meta server decrypting the SID-encrypted FHV; h) the meta server transmitting the FHV to a storage server to ask whether the upload file is already registered; and i) the storage server analyzing the FHV to check whether the upload file duplicates an already registered file and notifying the meta server and the client device.

Description

Deduplication security method for privacy protection in cloud environment

The present invention relates to a file deduplication method for privacy protection in a cloud environment. More specifically, it relates to a method in which a user uploads and downloads files on the basis of a PIN, a RefID, and an FHV, fundamentally severing the mapping between users and their files, so that users cannot be inferred from an analysis of the server's metadata and file structure alone and even a server administrator cannot tell which files a given user has uploaded.

As cloud services have become popular, many cloud service providers have emerged. Cloud platforms can offer many types of service; among them, storage-based cloud services are the most widely used, and their market is expected to keep expanding.

In particular, data volumes will grow exponentially in the coming Fourth Industrial Revolution era, which will in turn drive up storage expansion costs and place a heavy burden on service providers.

Deduplication is therefore one of the key enabling technologies in a cloud environment. Deduplication means not storing multiple copies of identical data uploaded by multiple users, and it can dramatically reduce the required storage capacity.

Accordingly, deduplication is already applied in many cloud storage environments. By their nature, cloud environments require large-capacity storage servers, and growing storage demand requires additional storage to be installed; deduplication addresses this cost problem at its source.

However, deduplication has an inherent privacy problem. To apply it, the mapping between users and files is stored as metadata, and analyzing that metadata on the server yields the list of users who uploaded a particular file. The problem can be very serious for sensitive files, such as those concerning political orientation, beliefs, specific diseases, or sexual life: if the cloud server can obtain the list of users who uploaded such a file, the risk that the cloud environment could come to function as a kind of surveillance system cannot be ruled out.

A method is therefore needed that stores files on the cloud server without duplication while also solving the privacy problem.

Korean Registered Patent No. 10-1422759 (Data storage and sharing method for preventing collusion in a data consignment environment)

The object of the present invention is to provide a file deduplication method for privacy protection in a cloud environment that avoids storing duplicate copies of the same file uploaded by multiple users while preventing the user's information from being inferred from file information, thereby solving the privacy problem.

The present invention provides a file deduplication method for privacy protection in a cloud environment that includes a client device, which receives a PIN from the user to generate a RefID and double-hashes a file to be stored in the cloud to generate an FHV; a meta server, which receives and stores the RefID; and a storage server, which receives and stores the FHV. The method comprises:
a) the meta server generating an SID, encrypting the SID, and transmitting it to the client device;
b) the client device decrypting the encrypted SID;
c) the client device requesting from the meta server the file path needed for a file upload;
d) the meta server generating the file path;
e) the client device double-hashing the upload file to generate an FHV;
f) the client device encrypting the FHV with the SID and transmitting it to the meta server;
g) the meta server decrypting the SID-encrypted FHV;
h) the meta server transmitting the FHV to the storage server to ask whether the upload file is already registered; and
i) the storage server analyzing the FHV to check whether the upload file duplicates an already registered file and notifying the meta server and the client device.

The method may further comprise:
j) if the upload file is not registered, the client device uploading the upload file to the storage server;
k) the client device receiving a PIN from the user and generating a RefID;
l) the storage server verifying the integrity of the stored upload file;
m) the storage server mapping the stored file to the FHV;
n) the client device transmitting the RefID to the meta server; and
o) the meta server mapping the stored RefID to the file path and notifying the client device that the file upload is complete.

Here, the RefID may be generated by the following (Equation 1).

(Equation 1) $\mathrm{RefID} = E\big(H(\mathrm{UserID}) \oplus (H(\mathrm{PIN}) \oplus \mathrm{FHV})\big)^{H(\mathrm{PIN})}$

Also provided is a file deduplication method for privacy protection in a cloud environment that includes a client device, which receives a PIN from the user to generate a RefID and double-hashes a file to be stored in the cloud to generate an FHV; a meta server, which receives and stores the RefID; and a storage server, which receives and stores the FHV. The method comprises:
a) the meta server generating an SID, encrypting the SID, and transmitting it to the client device;
b) the client device decrypting the encrypted SID;
c) the client device requesting from the meta server the download of a file at a specific path;
d) the meta server checking whether a download file exists at that path;
e) the client device receiving a PIN from the user, encrypting the hash of the PIN with the SID, and transmitting it to the meta server;
f) the meta server decrypting the hash of the PIN;
g) the meta server decrypting the FHV using the RefID and the hash of the PIN;
h) the meta server transmitting the FHV to the storage server to ask whether a download file mapped to the FHV exists;
i) the storage server analyzing the FHV and notifying the meta server and the client device that the mapped download file exists; and
j) the client device connecting to the storage server and downloading the file mapped to the FHV.

Here, in step g), the FHV may be obtained by the following (Equation 2).

(Equation 2) $\mathrm{FHV} = D\big((\mathrm{RefID} \oplus \mathrm{UserID}) \oplus H(\mathrm{PIN})\big)^{H(\mathrm{PIN})}$

Because the present invention uses parameters encrypted with an encryption key shared in advance between the client device and the cloud servers (the meta server and the storage server), it prevents user information from being leaked through outsider attacks.

In addition, the present invention physically separates the meta server and the storage server, which together form the cloud server, places them under different administrators, and ensures that the information held by one server cannot be used to decrypt the information used by the other, thereby preventing user information from being leaked through insider attacks.

FIG. 1 is a diagram illustrating how an embodiment of the present invention checks whether a file to be uploaded to the cloud server is already stored.
FIG. 2 is a diagram illustrating how an embodiment of the present invention, after a file is uploaded to the cloud server, removes duplicate storage of the file and performs the mapping for privacy protection.
FIG. 3 is a diagram illustrating how an embodiment of the present invention downloads a file stored on the cloud server.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms.

These embodiments are provided so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains.

The invention is defined only by the scope of the claims.

Accordingly, in some embodiments, well-known components, operations, and techniques are not described in detail, to avoid obscuring the present invention.

Throughout the specification, the same reference numerals refer to the same components, and the terms used herein are for describing the embodiments and are not intended to limit the present invention.

In this specification, the singular includes the plural unless the context specifically states otherwise, and components and operations described as "including (or having)" do not exclude the presence or addition of one or more other components and operations.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains.

Terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Terms used in the present invention are defined in [Table 1] below.

[Table 1] Abbreviations used in the present invention and their meanings

| Abbreviation | Meaning |
|---|---|
| UserID | User ID |
| SID | Session ID |
| PK | Pre-shared encryption key |
| File | File being uploaded or downloaded |
| Filehash | Hash value of the file |
| FHV | Hash of the file's hash value (i.e., the double hash of the file) |
| FilePath | Access path of the destination (stored) file |
| PIN | User authentication value |
| RefID | Reference value stored in the meta server |
| H(·) | Hash result (hash value) of the given input |
| E(·)^K | Result of encrypting the given input with key K |
| D(·)^K | Result of decrypting the given input with key K |
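The notation in [Table 1] can be made concrete with a minimal Python sketch. The patent does not name a hash function or a cipher, so this sketch assumes SHA-256 for H(·) and, to stay within the standard library, uses a toy SHA-256 keystream XOR cipher as a stand-in for E(·)^K and D(·)^K. The helper names `H`, `E`, `D`, and `_keystream` are illustrative only, and the toy cipher is not a substitute for a vetted authenticated cipher such as AES-GCM.

```python
import hashlib


def H(data: bytes) -> bytes:
    """H(.): assumed SHA-256; the patent does not name a hash function."""
    return hashlib.sha256(data).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def E(data: bytes, key: bytes) -> bytes:
    """E(.)^K: toy stand-in cipher, XOR with a keystream derived from K."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def D(data: bytes, key: bytes) -> bytes:
    """D(.)^K: for an XOR stream cipher, decryption equals encryption."""
    return E(data, key)


key = H(b"1234")            # e.g. H(PIN) used as a key, as in Equation 1
ciphertext = E(b"hello", key)
print(D(ciphertext, key))   # b'hello'
```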

The file deduplication system for privacy protection in a cloud environment according to the present invention comprises a client device 100, a meta server 200, and a storage server 300.

Here, the meta server 200 and the storage server 300 together constitute the cloud server: the meta server 200 stores metadata, and the storage server 300 stores the actual data, corresponding to that metadata, that users upload and download.

The client device 100 is any device a client can use to access the cloud server, such as a mobile device (including a smartphone), a laptop, or a PC.

In the present invention, the meta server 200 and the storage server 300 must be strictly separated. They cannot be hosted on a single server and must run on different servers; that is, the meta server 200 and the storage server 300 are preferably physically separated.

In addition, the meta server 200 and the storage server 300 must each be managed by a separate administrator.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating how an embodiment of the present invention checks whether a file to be uploaded to the cloud server is already stored.

In step S101, the meta server 200 generates a session ID (SID).

In step S102, the meta server 200 encrypts the generated session ID with the pre-shared encryption key PK and provides it to the client device 100.

In step S103, the client device 100 decrypts the received encrypted session ID with the pre-shared encryption key PK.

In step S104, the client device 100 requests the meta server 200 to upload a specific file.

Here, the client device 100 may request a desired upload file path.

In step S105, the meta server 200 generates the upload file path requested by the client device 100.

In step S106, the meta server 200 notifies the client device 100 that the requested upload file path has been generated.

In step S107, the client device 100 hashes the file to be uploaded to generate its hash value.

In step S108, the client device 100 hashes the hash value generated in step S107 once more to generate the FHV, the double hash of the file to be uploaded.
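Steps S107 and S108 can be sketched as a streaming double hash. This is a minimal sketch assuming SHA-256 (the patent does not name the hash function); the function names `file_hash` and `fhv` are hypothetical. Reading in chunks keeps memory use constant for large files.

```python
import hashlib
import os
import tempfile


def file_hash(path: str, chunk_size: int = 1 << 20) -> bytes:
    """S107: Filehash = H(File), read in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def fhv(path: str, chunk_size: int = 1 << 20) -> bytes:
    """S108: FHV = H(Filehash) = H(H(File))."""
    return hashlib.sha256(file_hash(path, chunk_size)).digest()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example upload contents")
    try:
        print(fhv(tmp.name).hex())  # 64 hex characters (32-byte digest)
    finally:
        os.remove(tmp.name)
```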

In step S109, the client device 100 encrypts the FHV with the SID and transmits it to the meta server 200.

In step S110, the meta server 200 decrypts the FHV.

Because the meta server 200 knows the SID, it can decrypt the SID-encrypted FHV using the SID.
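Steps S101 to S103 and S109 to S110 together form a small key-transport exchange: PK protects the SID, and the SID then protects the FHV in transit. The sketch below simulates both sides in one process under stated assumptions: SHA-256 for hashing and a toy SHA-256 keystream XOR cipher (`xor_cipher`, a hypothetical helper) standing in for the unnamed cipher. Because this toy cipher uses no nonce, reusing PK across sessions would leak the XOR of two SIDs; a real deployment would use an authenticated cipher with fresh nonces, such as AES-GCM.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream (stand-in for a real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encrypts or decrypts: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


PK = secrets.token_bytes(32)  # pre-shared key, assumed provisioned out of band

# S101-S102: the meta server creates a fresh SID and sends it encrypted under PK.
sid_meta = secrets.token_bytes(16)
sid_on_wire = xor_cipher(sid_meta, PK)

# S103: the client device recovers the SID with the same PK.
sid_client = xor_cipher(sid_on_wire, PK)

# S107-S108 (client side): FHV = H(H(File)), SHA-256 assumed.
fhv = hashlib.sha256(hashlib.sha256(b"file to upload").digest()).digest()

# S109: the client device encrypts the FHV under the SID and sends it.
fhv_on_wire = xor_cipher(fhv, sid_client)

# S110: the meta server decrypts with its own copy of the SID.
fhv_at_meta = xor_cipher(fhv_on_wire, sid_meta)
print(fhv_at_meta == fhv)  # True
```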

In step S111, the meta server 200 transmits the FHV to the storage server 300.

Here, the meta server 200 uses the FHV to ask the storage server 300 whether a file corresponding to the FHV exists.

In step S112, the storage server 300 uses the FHV to search for the corresponding file. Based on the search result, it determines whether the file is fully stored and registered, only partially stored, or not stored at all.

In step S113, the storage server 300 reports to the meta server 200 whether the file requested by the meta server 200, as identified by the FHV, is stored.

Here, the storage server 300 sends the meta server 200 the corresponding status: stored (E: Exist), not stored (N: Not Exist), or partially stored (I: Incomplete).

In step S114, the meta server 200 forwards the file storage status received from the storage server 300 to the client device 100.

The file storage status values are listed in [Table 2] below.

[Table 2] Abbreviations for file storage status and their meanings

| Abbreviation | Meaning |
|---|---|
| E | The file already exists on the storage server |
| N | The file does not exist on the storage server |
| I | Part of the file is stored on the storage server |
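The status check in steps S112 to S114, together with the upload decision in step S201, can be sketched as a lookup keyed by FHV. The storage record `StoredObject` with a `complete` flag is a hypothetical data model; the patent does not say how the storage server tracks partial uploads, nor whether an "I" file is resumed or re-sent, so the sketch simply marks both N and I as needing an upload.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Status(Enum):
    """File storage states from [Table 2]."""
    E = "Exist"
    N = "Not Exist"
    I = "Incomplete"


@dataclass
class StoredObject:
    """Hypothetical storage record: received bytes plus a completeness flag."""
    data: bytes
    complete: bool


def lookup(index: Dict[bytes, StoredObject], fhv: bytes) -> Status:
    """S112-S113: classify the file identified by the FHV."""
    obj = index.get(fhv)
    if obj is None:
        return Status.N
    return Status.E if obj.complete else Status.I


def needs_upload(status: Status) -> bool:
    """S201: upload unless the file is already fully stored."""
    return status is not Status.E


index = {
    b"\x01" * 32: StoredObject(b"whole file", complete=True),
    b"\x02" * 32: StoredObject(b"half", complete=False),
}
for key in (b"\x01" * 32, b"\x02" * 32, b"\x03" * 32):
    s = lookup(index, key)
    print(s.name, needs_upload(s))
# E False
# I True
# N True
```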

FIG. 2 is a diagram illustrating how an embodiment of the present invention, after a file is uploaded to the cloud server, removes duplicate storage of the file and performs the mapping for privacy protection.

In step S201, after step S114 of FIG. 1, if the file to be uploaded is not fully stored on the storage server 300, the client device 100 uploads the file to the storage server 300.

Here, the file is uploaded to the upload file path generated in step S105.

In step S202, the storage server 300 notifies the client device 100 that the upload of the file is complete.

In step S203, the user enters the user authentication value (PIN) into the client device 100.

In step S204, the client device 100 generates, from the user authentication value (PIN), the reference value (RefID) to be stored in the meta server 200.

Here, the RefID is generated by the following [Equation 1].

[Equation 1] $\mathrm{RefID} = E\big(H(\mathrm{UserID}) \oplus (H(\mathrm{PIN}) \oplus \mathrm{FHV})\big)^{H(\mathrm{PIN})}$

That is, the RefID is generated by XORing the hash of the PIN with the FHV, XORing that result with the hash of the user ID, and encrypting the final value with the hash of the PIN as the key.

Here, $\oplus$ denotes the XOR operation.
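Equation 1 can be sketched in a few lines of Python. As before, this assumes SHA-256 for H(·) and a toy SHA-256 keystream XOR cipher as a stand-in for E(·)^K; `make_refid` and `xor` are hypothetical helper names. With SHA-256, H(UserID), H(PIN), and FHV are all 32 bytes, so the XOR operands line up exactly.

```python
import hashlib


def H(data: bytes) -> bytes:
    """SHA-256 assumed for H(.)."""
    return hashlib.sha256(data).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def E(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for E(.)^K (XOR with a key-derived keystream)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length values (the XOR in Equation 1)."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_refid(user_id: str, pin: str, fhv: bytes) -> bytes:
    """Equation 1: RefID = E(H(UserID) XOR (H(PIN) XOR FHV)) keyed by H(PIN)."""
    k = H(pin.encode())
    return E(xor(H(user_id.encode()), xor(k, fhv)), k)


fhv = H(H(b"file contents"))  # double hash from S107-S108
print(make_refid("alice", "1234", fhv).hex())
```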

In step S205, the storage server 300 extracts the hash value of the file uploaded by the client device 100 and stored on the storage server 300.

In step S206, the storage server 300 compares the FHV received in step S111 of FIG. 1 (the value generated by the client device 100 and relayed by the meta server 200) with the hash value of the uploaded, stored file to verify that the stored file is complete.

In step S207, if the verification in step S206 shows that the stored file is complete, the storage server 300 maps the stored file to the FHV value.
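Steps S205 to S207 amount to recomputing the digest of the received bytes and recording the mapping only on a match. For the comparison with the FHV to make sense, this sketch assumes the storage server computes the same SHA-256 double hash the client used; the `index` dictionary and `location` string are hypothetical. `hmac.compare_digest` is used so the comparison time does not depend on where the digests differ.

```python
import hashlib
import hmac
from typing import Dict


def double_hash(data: bytes) -> bytes:
    """Assumed to match the client's FHV = H(H(File)) with SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def verify_and_map(index: Dict[bytes, str], stored: bytes, fhv: bytes,
                   location: str) -> bool:
    """S205-S207: check the stored bytes against the FHV, then map FHV -> location."""
    if not hmac.compare_digest(double_hash(stored), fhv):  # S206 fails
        return False
    index[fhv] = location                                  # S207
    return True


index: Dict[bytes, str] = {}
data = b"uploaded file"
fhv = double_hash(data)
print(verify_and_map(index, data, fhv, "blob/0001"))       # True
print(verify_and_map(index, data[:-1], fhv, "blob/0002"))  # False
print(len(index))                                          # 1
```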

In step S208, the client device 100 provides the RefID generated in step S204 to the meta server 200.

In step S209, the meta server 200 stores the received RefID.

In step S210, the meta server 200 maps the RefID to the file path (FilePath) and stores the mapping.

In step S211, the meta server 200 notifies the client device 100 that the file information used for deduplication of the uploaded file has been mapped to the file path.

That is, it notifies the client device 100 both that the RefID has been mapped to the file path and that the stored file has been mapped to the FHV value.

Through the above process, the present invention provides a file deduplication method for privacy protection in a cloud environment.

That is, the present invention determines whether a file to be uploaded is already stored on the storage server 300; if it is, the same file is not uploaded again, and the FHV, the file's double hash, is mapped to the copy already stored on the storage server 300.

Accordingly, a user with the same file can use the copy already stored on the storage server 300 by means of the file's FHV alone, without storing another copy on the storage server 300.
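The storage-side effect of deduplication can be sketched with a single dictionary keyed by FHV. This is a simplified, hypothetical model: `StorageServer` and `upload` are illustrative names, SHA-256 is assumed for the double hash, and the partial-upload state (I) and the integrity check of S206 are omitted for brevity.

```python
import hashlib
from typing import Dict


class StorageServer:
    """Hypothetical storage side: one blob per FHV, no user identifiers."""

    def __init__(self) -> None:
        self.blobs: Dict[bytes, bytes] = {}

    def has(self, fhv: bytes) -> bool:
        return fhv in self.blobs

    def put(self, fhv: bytes, data: bytes) -> None:
        self.blobs[fhv] = data


def fhv_of(data: bytes) -> bytes:
    """FHV = H(H(File)), SHA-256 assumed."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def upload(storage: StorageServer, data: bytes) -> bool:
    """Returns True if bytes were transferred, False if deduplicated."""
    fhv = fhv_of(data)
    if storage.has(fhv):      # status E: reuse the existing copy
        return False
    storage.put(fhv, data)    # status N: transfer and store once
    return True


storage = StorageServer()
report = b"same report contents"
print(upload(storage, report))  # True  (first user: file is stored)
print(upload(storage, report))  # False (second user: deduplicated)
print(len(storage.blobs))       # 1
```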

FIG. 3 is a diagram illustrating how an embodiment of the present invention downloads a file stored on the cloud server.

In step S301, the meta server 200 generates the session ID, SID.

In step S302, the meta server 200 encrypts the generated SID with the pre-shared encryption key PK and provides it to the client device 100.

In step S303, the client device 100 decrypts the received encrypted session ID with the pre-shared encryption key PK.

In step S304, the client device 100 requests the meta server 200 to download a specific file it previously uploaded.

Here, because the client device 100 knows the file path of the uploaded file, it sends that file path to the meta server 200.

In step S305, the meta server 200 checks whether the download file requested by the client device 100 exists at that file path.

In step S306, if the requested download file exists at that file path, the meta server 200 notifies the client device 100.

In step S307, the user enters the user authentication value (PIN) into the client device 100.

In step S308, the client device 100 encrypts the hash value H(PIN) of the user authentication value (PIN) with the SID and transmits it to the meta server 200.

In step S309, the meta server 200 decrypts the received hash value H(PIN).

Because the meta server 200 already knows the SID, it can use the SID to decrypt H(PIN).

In step S310, the meta server 200 extracts the FHV value from the RefID and H(PIN).

The meta server 200 extracts the FHV value using the following [Equation 2].

[Equation 2] $\mathrm{FHV} = D\big((\mathrm{RefID} \oplus \mathrm{UserID}) \oplus H(\mathrm{PIN})\big)^{H(\mathrm{PIN})}$

Here, the FHV is obtained by XORing the RefID with the user ID (UserID), XORing that result with the hash of the PIN, and then decrypting it with the hash of the PIN as the key.
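Step S310 can be sketched as the inverse of Equation 1. Read literally, Equation 2 XORs the RefID with UserID and H(PIN) before decrypting. For a general cipher such as AES, decryption does not commute with XOR, and Equation 1 masks with H(UserID) rather than UserID. This sketch therefore assumes the order that inverts Equation 1 as written: decrypt the RefID under H(PIN), then strip H(UserID) and H(PIN) by XOR. SHA-256 and the toy keystream cipher are stand-ins, and `make_refid` and `recover_fhv` are hypothetical names. The meta server receives H(PIN) in S309 rather than the PIN itself, so `recover_fhv` takes the hash directly.

```python
import hashlib


def H(data: bytes) -> bytes:
    """SHA-256 assumed for H(.)."""
    return hashlib.sha256(data).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def E(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for E(.)^K."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


D = E  # XOR-stream stand-in: decryption is the same operation


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length values."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_refid(user_id: str, pin: str, fhv: bytes) -> bytes:
    """Equation 1 (client side, S204)."""
    k = H(pin.encode())
    return E(xor(H(user_id.encode()), xor(k, fhv)), k)


def recover_fhv(ref_id: bytes, user_id: str, h_pin: bytes) -> bytes:
    """S310: decrypt the RefID under H(PIN), then strip the H(UserID) and H(PIN) masks."""
    inner = D(ref_id, h_pin)  # = H(UserID) XOR H(PIN) XOR FHV
    return xor(xor(inner, H(user_id.encode())), h_pin)


fhv = H(H(b"file contents"))
ref_id = make_refid("alice", "1234", fhv)
print(recover_fhv(ref_id, "alice", H(b"1234")) == fhv)  # True
print(recover_fhv(ref_id, "alice", H(b"0000")) == fhv)  # False (wrong PIN)
```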

In step S311, the meta server 200 transmits the FHV of the requested file to the storage server 300.

In step S312, the storage server 300 checks whether the FHV corresponding to the download file requested by the client device 100 exists in the storage server 300.

In step S313, if that FHV exists in the storage server 300, the storage server 300 notifies the meta server 200.

In step S314, the meta server 200 informs the client device 100 that the download is ready.

In step S315, the client device 100 requests the download of the corresponding file from the storage server 300.

In step S316, the client device 100 downloads the file from the storage server 300.
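Steps S311 to S316 hinge on the storage server indexing files by FHV alone. A minimal sketch of that index follows, assuming SHA-256 for the double hash; the class and method names are hypothetical, and the notifications relayed through the meta server in S313 and S314 are left out.

```python
import hashlib
from dataclasses import dataclass, field

def compute_fhv(data: bytes) -> bytes:
    """FHV = H(H(file)), with SHA-256 assumed for H."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

@dataclass
class StorageServer:
    """Hypothetical storage server 300: files are indexed only by FHV, never by user."""
    files: dict[bytes, bytes] = field(default_factory=dict)

    def store(self, data: bytes) -> bytes:
        fhv = compute_fhv(data)
        self.files.setdefault(fhv, data)  # identical content is kept only once
        return fhv

    def exists(self, fhv: bytes) -> bool:
        """S312: is a file mapped to this FHV present?"""
        return fhv in self.files

    def download(self, fhv: bytes) -> bytes:
        """S315-S316: return the file mapped to this FHV."""
        return self.files[fhv]

storage = StorageServer()
fhv = storage.store(b"report contents")     # first upload
second = storage.store(b"report contents")  # same file uploaded by another user
```

Because the key is the FHV, the second upload maps to the same entry, which is the deduplication effect, and nothing in the index records who uploaded what.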

Privacy issues in existing deduplication environments fundamentally arise from the structure in which users and files are mapped to each other. If the mapping between files and users is removed entirely, it becomes impossible to obtain the list of users who uploaded a specific file or, conversely, the list of files uploaded by a specific user. The present invention therefore uses the method described above, which fundamentally blocks the mapping between users and files so that a user cannot be identified through analysis of the metadata and file structure on the server alone.

That is, as described above, the method of the present invention lets a user upload and download files based on the PIN, the RefID, and the FHV. The PIN is a value known only to the user, and the RefID is a value held by the meta server 200 and constructed from the PIN. The FHV is a double hash value obtained by hashing the file's hash value once more. The RefID cannot be inferred from the FHV and, conversely, neither the FHV nor the PIN can be inferred from the RefID.

In the present invention, the RefID is stored in the meta server 200, and the FHV is stored in the storage server 300.

Decrypting the RefID to learn the user ID (UserID) and the FHV requires H(PIN), the hash value of the user authentication value (PIN), but H(PIN) is stored in neither the meta server 200 nor the storage server 300.

Therefore, the administrator of the meta server 200 cannot decrypt the RefID and thus cannot learn the user ID or the FHV.

The present invention, having the characteristics described above, offers the following advantages.

(1) Privacy guaranteed through complete separation of user and file information

Existing deduplication methods secure the files themselves with file encryption, but expose the user-file mapping structure as is. This makes metadata analysis possible, which can lead to serious invasions of user privacy.

In the present invention, user information and file information are completely separated: the RefID is stored in the meta server 200, and the FHV is stored in the storage server 300.

The RefID is XORed with and encrypted under values derived from the PIN, and even the administrator of the meta server 200 cannot decrypt it. Decryption requires H(PIN), which is not stored in the meta server 200.

Therefore, the FHV cannot be estimated from the RefID, nor can the RefID be derived from the FHV. Reassembling this information to download a file always requires the PIN known to the user. Moreover, because the RefID is encrypted with H(PIN) as the key, analysis becomes even harder. In other words, without the PIN, the mapping between user information and files cannot be reconstructed from the information on the servers, so the user's privacy is protected.

(2) Resolving ownership issues

Existing deduplication technology typically handles file ownership through a structure that maps users to files. However, this approach is structurally problematic because it opens the possibility of privacy invasion.

In the proposed method, the PIN serves as proof of a user's ownership of a specific file. If an unauthorized person attempts to download a file, they do not know the PIN, so they cannot extract the correct FHV from the RefID and cannot complete the download procedure. A legitimate user, by contrast, can extract the FHV from the RefID using the PIN applied at upload time and then download the file. Because only a user who knows the PIN can extract the file's FHV from the RefID, that user can prove ownership of the file. In short, the secret is the PIN, and since no secret is stored on the server, file ownership is resolved while privacy is preserved.
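The ownership argument can be illustrated with a small toy construction. As assumptions not stated in the text: SHA-256 is used for H, a SHA-256 keystream stands in for the cipher E/D, and H(UserID) is used in both Equation 1 and Equation 2. The hypothetical `can_download` check succeeds only when the supplied PIN reproduces an FHV that the storage server actually holds.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def keystream(key: bytes, n: int) -> bytes:
    # Toy SHA-256 counter-mode keystream (stand-in for a real stream cipher).
    blocks = (hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(n // 32 + 1))
    return b"".join(blocks)[:n]

def encrypt(key: bytes, m: bytes) -> bytes:
    return xor(m, keystream(key, len(m)))

def make_refid(user_id: bytes, pin: bytes, fhv: bytes) -> bytes:
    # Equation 1, with the keystream cipher as E.
    key = h(pin)
    return encrypt(key, xor(h(user_id), xor(key, fhv)))

def extract_fhv(refid: bytes, user_id: bytes, pin: bytes) -> bytes:
    # Equation 2 (printed order); for a keystream cipher, D is the same XOR as E.
    key = h(pin)
    return encrypt(key, xor(xor(refid, h(user_id)), key))

# Upload by the legitimate owner.
fhv = h(h(b"quarterly report"))
storage_index = {fhv}                       # storage server 300 holds only FHVs
refid = make_refid(b"alice", b"4821", fhv)  # meta server 200 holds only RefIDs

def can_download(refid: bytes, user_id: bytes, pin: bytes) -> bool:
    """The download proceeds only if the PIN yields an FHV the storage server knows."""
    return extract_fhv(refid, user_id, pin) in storage_index
```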

(3) Insider attacks

Existing deduplication technology keeps the mapping between files and users as metadata. An administrator with read access can therefore collect that metadata, which makes such systems highly vulnerable to insider attacks: a cloud server administrator can easily identify the list of uploaders of a specific file or the complete file list of a specific user.

In the present invention, even if an insider attack exposes the meta server 200 and the storage server 300 in their entirety, the attacker cannot extract a user's file list (i.e., user information) or the list of users who uploaded a specific file. The servers store only RefIDs and FHVs, and these values have no linkage to each other. In particular, because the RefID is encrypted with H(PIN) as the key, metadata analysis becomes even more difficult. Therefore, even if an insider exposes all of the data, the relationship between files and users cannot be determined without the PIN, which addresses the insider attack problem.
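To illustrate the absence of linkage, the sketch below has two users upload the same file, using a toy construction (SHA-256 for H, a SHA-256 keystream as the cipher, hypothetical helper names). The meta-server and storage-server tables share no value and the two RefIDs differ, so a join across the tables finds nothing. This demonstrates only that no direct link is stored; it is not a security proof, and since a PIN may be short, the strength of the separation against an insider who tries candidate PINs offline depends on the PIN's entropy.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def keystream(key: bytes, n: int) -> bytes:
    # Toy SHA-256 counter-mode keystream (stand-in for a real stream cipher).
    blocks = (hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(n // 32 + 1))
    return b"".join(blocks)[:n]

def make_refid(user_id: bytes, pin: bytes, fhv: bytes) -> bytes:
    # Equation 1: E_{H(PIN)}(H(UserID) xor (H(PIN) xor FHV)), E = keystream XOR.
    key = h(pin)
    inner = xor(h(user_id), xor(key, fhv))
    return xor(inner, keystream(key, len(inner)))

# Alice and Bob upload the same file; deduplication stores it once.
fhv = h(h(b"shared document"))
refid_alice = make_refid(b"alice", b"4821", fhv)
refid_bob = make_refid(b"bob", b"9376", fhv)

meta_table = {refid_alice, refid_bob}  # everything the meta server 200 holds
storage_table = {fhv}                  # everything the storage server 300 holds
```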

(4) Sniffing attacks

In the present invention, parameters exchanged between the client and the servers are encrypted during file upload, file mapping, and file download.

This keeps the protocol secure even if a hacker performs a sniffing attack. In particular, the key used to encrypt the parameters exchanged during the protocol is the SID, a one-time value generated by the meta server 200. Therefore, even if a hacker mounts a replay attack using sniffed traffic, an attacker who does not know the PK shared in advance between the server and the client cannot successfully execute the upload or download protocol.
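The role of the one-time SID against replay can be sketched as below. It is a simplified model under explicit assumptions: the pre-shared key "PK" and the SID are 32 random bytes, the cipher is a toy SHA-256 keystream rather than a real one, and `MetaServer`, `start_session`, and `receive` are hypothetical names. Because the server retires each SID after one message, a ciphertext captured in one session decrypts to an unrelated value in the next.

```python
import hashlib
import secrets

def keystream(key: bytes, n: int) -> bytes:
    """Toy SHA-256 counter-mode keystream (stand-in for a real cipher)."""
    blocks = (hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(n // 32 + 1))
    return b"".join(blocks)[:n]

def encrypt(key: bytes, m: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(m, keystream(key, len(m))))

decrypt = encrypt  # XOR with the same keystream is its own inverse

class MetaServer:
    """Hypothetical meta server 200 that issues a fresh, single-use SID per session."""

    def __init__(self, pk: bytes) -> None:
        self.pk = pk     # key pre-shared with the client ("PK" in the text)
        self.sid = None  # current one-time SID, or None when no session is open

    def start_session(self) -> bytes:
        self.sid = secrets.token_bytes(32)
        return encrypt(self.pk, self.sid)  # the SID travels encrypted under PK

    def receive(self, ciphertext: bytes) -> bytes:
        if self.sid is None:
            raise PermissionError("no active session")
        plaintext = decrypt(self.sid, ciphertext)
        self.sid = None  # retire the SID: it is never reused
        return plaintext

pk = secrets.token_bytes(32)
server = MetaServer(pk)
client_sid = decrypt(pk, server.start_session())
h_pin = hashlib.sha256(b"4821").digest()
captured = encrypt(client_sid, h_pin)  # what a sniffer would record
first = server.receive(captured)       # legitimate delivery

server.start_session()                 # a later session with a new SID
replayed = server.receive(captured)    # replaying the old message
```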

(5) Integrity

To ensure file integrity, the present invention adds a further verification step on the server side. The FHV is a double hash value, that is, a re-hash of the file's hash value, so the server can verify the integrity of an uploaded file by checking whether the result of hashing the uploaded file twice matches the FHV. The FHV thus serves both to prove file ownership during the download protocol and to verify the integrity of uploaded files. It also makes it possible to detect whether a file stored in the storage server 300 has been altered.
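A minimal sketch of the server-side integrity check, assuming SHA-256 for both hash passes; `compute_fhv` and `verify_upload` are hypothetical names. `hmac.compare_digest` is used for a constant-time comparison.

```python
import hashlib
import hmac

def compute_fhv(data: bytes) -> str:
    """FHV = H(H(file)); SHA-256 is assumed for H."""
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()

def verify_upload(data: bytes, claimed_fhv: str) -> bool:
    """Recompute the double hash on the server and compare it with the client's FHV."""
    return hmac.compare_digest(compute_fhv(data), claimed_fhv)

original = b"contract v1"
claimed = compute_fhv(original)  # sent by the client in the upload protocol
tampered = b"contract v2"        # e.g. altered in transit or at rest
```

The same check can be rerun later against a stored file to detect tampering at rest.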

(6) Efficiency

Deduplication schemes generally compute a file hash value based on a Merkle tree. Generating the file hash is in practice the most time-consuming part of deduplication, and the present invention adds the RefID extraction time to the Merkle tree generation time during the duplicate check.

The present invention adds security functions to existing deduplication technology, so compared with existing deduplication it adds the time needed for security processing such as the RefID computation. However, it does not significantly degrade the performance of existing Merkle-tree-based deduplication while providing privacy protection and security functions, so it is more efficient than running a separate security mechanism on top of an existing deduplication scheme.
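For reference, a Merkle-tree file hash of the kind the text mentions can be sketched as follows. The chunk size, the use of SHA-256, and the rule of duplicating the last node on odd levels are assumptions for illustration; many production schemes also domain-separate leaf and interior hashes, which this sketch omits.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; real systems use far larger chunks

def merkle_root(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Hash fixed-size chunks as leaves, then hash pairs upward to a single root."""
    level = [hashlib.sha256(data[i:i + chunk_size]).digest()
             for i in range(0, len(data), chunk_size)] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

A single-chunk file's root is just that chunk's hash; each additional level adds one hashing pass over half as many nodes.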

The present invention is not limited to the specific preferred embodiments described above. Anyone of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

100: client device
200: meta server
300: storage server

Claims

In a file deduplication method for privacy protection in a cloud environment that includes a client device that receives a PIN from a user to generate a RefID and generates an FHV by double hashing a file stored in the cloud, a meta server that receives and stores the RefID, and a storage server that receives and stores the FHV,
a) generating an SID by a meta server, encrypting the SID, and transmitting the SID to a client device;
b) decrypting the encrypted SID by a client device;
c) requesting, by a client device, a file path required for file upload to the meta server;
d) generating the file path by a meta server;
e) generating FHV by double hashing the uploaded file by the client device;
f) the client device encrypting the FHV with the SID and transmitting the encrypted data to the meta server;
g) decrypting the FHV encrypted with the SID by the meta server;
h) the meta server transmitting the FHV to the storage server to request whether the uploaded file is already registered; and
i) the storage server analyzing the FHV to check whether the uploaded file is already registered, and notifying the meta server and the client device of the result;
A file deduplication method for privacy protection in a cloud environment comprising the above steps.

The method according to claim 1,
j) if the upload file is not registered, the client device uploading the upload file to the storage server;
k) generating a RefID by receiving a PIN from the user at the client device;
l) the storage server checks the integrity of the stored upload file;
m) mapping the stored file and FHV by the storage server;
n) the client device transmitting the RefID to the meta server; And
o) mapping the stored RefID and the file path by the meta server and notifying the client device that the file upload has been completed;
File deduplication method for privacy protection in a cloud environment, characterized in that it further comprises.

The method according to claim 2,
The file deduplication method for privacy protection in a cloud environment, wherein the RefID is generated by the following (Equation 1).
(Equation 1) $\mathrm{RefID} = E_{H(\mathrm{PIN})}\big(H(\mathrm{UserID}) \oplus (H(\mathrm{PIN}) \oplus \mathrm{FHV})\big)$

In a file deduplication method for privacy protection in a cloud environment that includes a client device that receives a PIN from a user to generate a RefID and generates an FHV by double hashing a file stored in the cloud, a meta server that receives and stores the RefID, and a storage server that receives and stores the FHV,
a) generating an SID by a meta server, encrypting the SID, and transmitting the SID to a client device;
b) decrypting the encrypted SID by a client device;
c) the client device requesting the meta server to download a file at a specific path;
d) the meta server checking whether the download file exists at the specific path;
e) the client device receiving a PIN from the user, encrypting the hash value of the PIN with the SID, and transmitting it to the meta server;
f) the meta server decrypting the SID-encrypted hash value of the PIN;
g) the meta server recovering the FHV using the RefID and the hash value of the PIN;
h) the meta server transmitting the FHV to the storage server to request whether a download file mapped to the FHV exists;
i) the storage server analyzing the FHV and notifying the meta server and the client device that a download file mapped to the FHV exists; and
j) the client device accessing the storage server to download the file mapped to the FHV;
A file deduplication method for privacy protection in a cloud environment comprising the above steps.

The method of claim 4,
The file deduplication method for privacy protection in a cloud environment, wherein in step g) the FHV is generated by the following (Equation 2).
(Equation 2) $\mathrm{FHV} = D_{H(\mathrm{PIN})}\big((\mathrm{RefID} \oplus \mathrm{UserID}) \oplus H(\mathrm{PIN})\big)$