KR102306672B1

KR102306672B1 - Storage System Performing Data Deduplication and Operating Method of Storage System and Data Processing System

Info

Publication number: KR102306672B1
Application number: KR1020170031808A
Authority: KR
Inventors: 조성국; 안병영; 윤은진; 기양석; 장실완; 이석찬
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2016-11-23
Filing date: 2017-03-14
Publication date: 2021-09-29
Also published as: KR20180058169A

Abstract

Disclosed are a storage system that performs data deduplication, a method of operating the storage system, and a method of operating a data processing system. A storage system according to an aspect of the technical idea of the present disclosure includes a storage device that stores data, and a controller that receives, from a host, data together with an index generated from the data and that includes a memory storing mapping information between indexes and physical addresses and a reference count corresponding to each index. The controller determines whether the data received from the host is duplicate data by checking the mapping information or the reference count read from the memory according to the index received from the host, and performs deduplication by updating the reference count when the received data is duplicate data.

Description

Storage System Performing Data Deduplication and Operating Method of Storage System and Data Processing System

The technical idea of the present disclosure relates to a storage system and, more particularly, to a storage system that performs data deduplication and to methods of operating the storage system and a data processing system.

Data deduplication is a technique that determines whether data to be stored in a storage system is already stored there and, when the write is determined to be a duplicate, does not store the duplicate data but manages only a link to the already stored data, so that storage space is used efficiently. Because deduplication improves the utilization of a storage system, it is needed in storage systems for large-capacity data.

However, to use deduplication, various kinds of information must be managed, such as the data (or a hash index) and the storage location of the corresponding data (e.g., a logical/physical address), and the resources required to manage this deduplication information increase.

U.S. Patent No. 8,751,763; U.S. Patent Application Publication No. 2017/0249327

An object of the technical idea of the present invention is to provide a storage system, and methods of operating the storage system and a data processing system, that can reduce the burden of managing information related to deduplication.

To achieve the above object, a storage system according to an aspect of the technical idea of the present disclosure includes a storage device that stores data, and a controller that receives, from a host, data together with an index generated from the data and that includes a memory storing mapping information between indexes and physical addresses and a reference count corresponding to each index. The controller determines whether the data received from the host is duplicate data by checking the mapping information or the reference count read from the memory according to the index received from the host, and performs deduplication by updating the reference count when the received data is duplicate data.

A method of operating a storage system according to an aspect of the technical idea of the present disclosure includes: receiving, from a host, first data and a first index generated from the first data; determining whether the received first index is identical to an index corresponding to data already stored in the storage system; when an index identical to the first index exists, updating a reference count stored in the storage system without writing the first data, in accordance with data deduplication; and providing the updated reference count to the host.

In a method of operating a data processing system according to an aspect of the technical idea of the present disclosure, the data processing system includes a storage system, and the method includes: storing, in the storage system, mapping information between an index generated using data from an external system and a physical address indicating the storage location of the data; receiving, at the storage system, a write request including data and an index corresponding to the data; determining, within the storage system, whether the write request is a write request for duplicate data; and, when the write request is determined to be a write request for duplicate data, performing deduplication by updating a reference count stored in the storage system.

According to the storage system and the methods of operating the storage system and the data processing system of the technical idea of the present invention, reducing the amount of information managed for data deduplication reduces the memory needed to store that information and also improves the information processing speed.

In addition, according to the storage system and the methods of operating the storage system and the data processing system of the technical idea of the present invention, the entity that makes the deduplication determination and the entity that makes the data collision determination can be set in various ways, so the system can be operated in more diverse ways.

FIG. 1 is a block diagram illustrating a data processing system according to an exemplary embodiment of the present invention.
FIGS. 2 and 3 are block diagrams illustrating specific implementation examples of a data processing system.
FIG. 4 is a block diagram illustrating an example of functions performed by a host of the data processing system of the present invention.
FIG. 5 is a block diagram illustrating an implementation example of a storage system according to an embodiment of the present invention.
FIG. 6 is a block diagram illustrating an example of various modules stored in the working memory of FIG. 5.
FIGS. 7A and 7B are diagrams illustrating examples of information managed by a host and a storage system according to embodiments of the present invention.
FIG. 8 is a block diagram illustrating an example of data write and read operations of the data processing system according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a method of operating a host according to an exemplary embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method of operating a storage system according to an exemplary embodiment of the present invention.
FIGS. 11 to 18 are diagrams illustrating examples of communication between a host and a storage system in the data processing system of the present invention.
FIG. 19 is a block diagram illustrating a network system including a server system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a data processing system according to an exemplary embodiment of the present invention.

Referring to FIG. 1, the data processing system 10 may include a host 100 and a storage system 200. The storage system 200 may include a controller 210 and a storage device 220. According to an exemplary embodiment of the present invention, the host 100 may include an index generator 110, and the controller 210 of the storage system 200 may include an index table 211. Although FIG. 1 shows the index table 211 inside the controller 210, embodiments of the present invention are not limited to this arrangement. For example, the index table 211 may be stored in a memory that is located within the storage system 200 but outside the controller 210.

The data processing system 10 may include storage media for storing data in response to requests from an external system (e.g., a computing node). As one example, the storage system 200 may include one or more solid state drives (SSDs). When the storage system 200 includes a solid state drive, it may include a plurality of flash memory chips (e.g., NAND memory chips) that store data in a nonvolatile manner. Alternatively, the storage system 200 may correspond to a single flash memory device, or to a memory card including one or more flash memory chips.

When the storage system 200 includes flash memory, the flash memory may include a 2D NAND memory array or a 3D (or vertical) NAND (VNAND) memory array. The 3D memory array is monolithically formed in at least one physical level of arrays of memory cells having an active area disposed above a silicon substrate, together with circuitry associated with the operation of those memory cells, the circuitry being formed on or in the substrate. The term "monolithic" means that the layers of each level of the array are stacked directly on the layers of each underlying level of the array.

In an embodiment of the technical idea of the present invention, the 3D memory array includes vertical NAND strings arranged in a vertical direction such that at least one memory cell is located above another memory cell. The at least one memory cell may include a charge trap layer.

U.S. Patent Nos. 7,679,133, 8,553,466, 8,654,587, and 8,559,235, and U.S. Patent Application Publication No. 2011/0233648, describe suitable configurations for 3D memory arrays in which the 3D memory array is configured in multiple levels and word lines and/or bit lines are shared between levels, and are incorporated herein by reference.

As another example, the storage system 200 may include various other types of memory. For example, the storage system 200 may include nonvolatile memory, and various types of nonvolatile memory may be used, such as magnetic RAM (MRAM), spin-transfer torque MRAM, conductive bridging RAM (CBRAM), ferroelectric RAM (FeRAM), phase-change RAM (PRAM), resistive RAM, nanotube RAM, polymer RAM (PoRAM), nano floating gate memory (NFGM), holographic memory, molecular electronics memory, or insulator resistance change memory.

The host 100 may perform data management operations within the data processing system 10. As one example, the host 100 may provide data write or read requests to the storage system 200. In addition, in response to a data erase request from the host 100, the storage system 200 may perform an erase operation on the data in the area indicated by the host 100.

The host 100 may communicate with the storage system 200 through various interfaces. The host 100 may include various types of devices capable of accessing data in the storage system 200. For example, the host 100 may be an application processor (AP) that communicates with the flash-memory-based storage system 200. According to an embodiment, the host 100 may communicate with the storage system 200 through various interfaces such as USB (Universal Serial Bus), MMC (MultiMediaCard), PCI-E (PCI Express), ATA (AT Attachment), SATA (Serial AT Attachment), PATA (Parallel AT Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), ESDI (Enhanced Small Disk Interface), and IDE (Integrated Drive Electronics).

According to an embodiment, the data processing system 10 may employ data deduplication. When deduplication is applied and the data requested to be written duplicates (or is identical to) data already stored in the storage system 200, the write request can be completed by managing only a link to the already stored data instead of storing the requested data again. The storage space of the storage system 200 can thus be used efficiently.

To determine whether the data requested to be written is duplicate data, an index table may be managed within the storage system 200. According to an embodiment, the index generator 110 of the host 100 may generate an index (Index) corresponding to the data requested to be written and provide it to the storage system 200. The index may carry information that identifies the data (Data); as one example, an index having a unique value for each piece of data may be generated and provided to the storage system 200.

The index generator 110 may be implemented in various ways. For example, the index generator 110 may include an arithmetic circuit implemented in hardware, or it may be implemented as software that performs an arithmetic function. According to an embodiment, the index generator 110 may correspond to a hash engine that computes a hash value as the index by applying a hash function to the data. In that case, the index generator 110 may compute the hash value using any of various hash algorithms, such as GOST, HAVAL, MD2, MD4, MD5, PANAMA, RadioGatun, RIPEMD, RIPEMD-128/256, RIPEMD-160, RIPEMD-320, SHA-0, SHA-1, SHA-256/224, SHA-512/384, SHA-3, and WHIRLPOOL.

On a write request, the storage system 200 may receive the data (Data) together with the index (Index) generated from the data as information for identifying the data. During a data write operation, the controller 210 of the storage system 200 may map the index to a physical address (e.g., a physical block address (PBA)) and store the data at the location corresponding to the mapped physical address. The mapping information between the index and the physical address (PBA) may be stored and managed in the index table 211.

According to an embodiment, the host 100 does not need to run a file system for file-level data management or for generating logical addresses for the storage system 200; instead, the host 100 may provide the data (Data) and its corresponding index (Index) to the storage system 200 as the information for writing the data. In addition, when the storage system 200 includes flash memory, the controller 210 may include a flash translation layer (FTL) to provide an interface between the host 100 and the storage device 220, and the address mapping operation performed through the FTL may map between the index and the physical address (PBA).

An example of operations related to data deduplication according to an embodiment of the present invention is described below.

When a data write is requested by an external system, the host 100 may receive data (Data) and a corresponding logical address (not shown) from the external system. The index (Index) generated by the index generator 110 may be stored in the host 100. As one example, the host 100 may include a memory (not shown), and the index may be stored in that memory as a tree structure, aligned to the received logical address.

The storage system 200 may receive the data (Data) and the index (Index) from the host 100, determine from the index whether the data is a duplicate, and perform deduplication according to the result. As one example, the storage system 200 may compare the index received from the host 100 with the indexes corresponding to data already stored in the storage system 200 and determine from that comparison whether the data is a duplicate.

In determining duplication from the index, when previously received indexes are stored in the index table 211, duplication may be determined by comparing the received index with the indexes stored in the index table 211. When no identical index exists in the index table 211, the storage system 200 may store the data (Data) at the location indicated by a physical address (PBA) newly mapped to the received index. Conversely, when an index identical to the received index exists in the index table 211, the storage system 200 may determine that the received data is duplicate data and perform deduplication by not storing the data in the storage device 220.

As deduplication-based writes are performed, data already stored in the storage system 200 may be referenced (or accessed) by multiple logical addresses from the external system. According to an embodiment, the storage system 200 may further store count information (e.g., a reference count) to track how many times stored data is referenced; as one example, the reference count may be stored and managed in the index table 211. According to an embodiment, when the data requested to be written by the host 100 is duplicate data, the storage system 200 may update the reference count and provide the updated reference count to the host 100 together with information indicating that deduplication was performed.
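The write path described above (index lookup, store-or-skip, reference count update, result returned to the host) can be illustrated with a minimal sketch. This is not the patented controller: the class and method names, the use of a Python dictionary for the index table 211 and for the storage device 220, the sequential allocation of physical addresses, and the initial reference count of 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pba: int        # physical block address mapped to the index
    ref_count: int  # how many writes currently reference this data


class Controller:
    """Toy model of a controller holding an index table (hypothetical names)."""

    def __init__(self):
        self.index_table = {}  # index -> Entry (stands in for index table 211)
        self.storage = {}      # pba -> data (stands in for storage device 220)
        self.next_pba = 0      # naive sequential allocator (assumption)

    def write(self, index: bytes, data: bytes):
        """Return (deduplicated, reference_count) for one write request."""
        entry = self.index_table.get(index)
        if entry is not None:
            # Same index already present: treat as duplicate, do not store data.
            entry.ref_count += 1
            return True, entry.ref_count
        # New index: map it to a fresh physical address and store the data.
        pba = self.next_pba
        self.next_pba += 1
        self.storage[pba] = data
        self.index_table[index] = Entry(pba=pba, ref_count=1)
        return False, 1

    def read(self, index: bytes) -> bytes:
        """Read data back using the index as the key."""
        return self.storage[self.index_table[index].pba]
```

In this sketch a second write carrying the same index leaves the stored data untouched and only increments the count, which is the value the storage system would report back to the host.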

As a possible variation, the physical address (PBA) and the reference count may be stored in the index table 211 aligned to the index (Index), and depending on the embodiment the index value itself may not be stored in the index table 211. The duplicate determination described above may be performed in various ways; as one example, duplication may be determined by checking the information (e.g., the physical address and/or the reference count) stored at the position aligned to the received index.
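One way to picture this variation is a table whose slot position is derived from the index, so the index itself never has to be stored. The sketch below is only an illustration under strong assumptions: the index is a small integer (8 bits here) so that it can address a list directly, a reference count of 0 marks an empty slot, and all names are hypothetical. A real 128-bit hash index could not address a flat array this way.

```python
INDEX_BITS = 8  # toy index width so a flat list can be used (assumption)


class AlignedIndexTable:
    """Slots are addressed by the index; the index value is never stored."""

    def __init__(self):
        size = 1 << INDEX_BITS
        self.pba = [None] * size    # physical address per slot
        self.ref_count = [0] * size  # 0 means "no data for this index"

    def is_duplicate(self, index: int) -> bool:
        # Duplicate check by inspecting the information stored at the slot.
        return self.ref_count[index] > 0

    def record_write(self, index: int, new_pba: int):
        """Return (pba, ref_count) after accounting for one write."""
        if self.is_duplicate(index):
            self.ref_count[index] += 1
        else:
            self.pba[index] = new_pba
            self.ref_count[index] = 1
        return self.pba[index], self.ref_count[index]
```

Here the duplicate decision comes from the reference count at the slot rather than from comparing stored index values.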

According to the embodiment described above, the host 100 can provide the index (Index), generated by processing the data (Data), directly to the storage system 200 as the information for writing and reading the data. This reduces the amount of information managed for data deduplication and also reduces the memory space needed in the host 100 to store that information. For example, in the general case the host 100 must separately manage, through a file system or the like, mapping information between the index and a file ID (or a logical block address for the storage system 200). According to embodiments of the present invention, however, data deduplication can be performed without the host 100 running a file system or storing and managing information resulting from its execution.

FIGS. 2 and 3 are block diagrams illustrating specific implementation examples of a data processing system. In the embodiments below, the index generator described above is assumed to be a hash engine that generates hash values, and the index is assumed to be a hash index (Hash Index). The data processing system is also assumed to compress externally supplied data before storing it, although embodiments of the present invention are not limited to this.

Referring to FIG. 2, the data processing system 300A may include a host 310A and a storage system 320A, and may receive data (Data) and a corresponding logical address (LBA) from an external system (e.g., an external computing node). Assuming that first to third logical addresses (LBA 1 to LBA 3) and their corresponding data have already been provided from the external system, mapping information between LBA 1 to LBA 3 and first to third hash indexes (Hash Index 1 to Hash Index 3) is stored in the host 310A, and mapping information between Hash Index 1 to Hash Index 3 and first to third physical addresses (PBA 1 to PBA 3) may be stored in the storage system 320A. It is then assumed that a fourth logical address (LBA 4) and its corresponding data are provided from the external system and that a fourth hash index (Hash Index 4) is generated from that data.

The host 310A may include a hash engine 311A as the index generator, and may also include a memory 312A that stores the generated hash indexes. The hash engine 311A may generate a hash index of multiple bits for data (Data) of a given size. The sizes of the data and of the hash index may be defined in various ways; as one example, a 128-bit hash index may be generated from 4 kB of data.
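As a rough illustration of the 4 kB to 128-bit example, the sketch below hashes fixed-size blocks with MD5 from Python's standard `hashlib`, simply because MD5 appears in the list of candidate algorithms and produces a 128-bit digest. The function names, the fixed-size splitting, and the zero-padding of a short final block are assumptions, not details taken from the patent.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 kB data unit, matching the example in the text


def hash_index(block: bytes) -> bytes:
    """Return a 128-bit (16-byte) index for one block; MD5 is an illustrative choice."""
    return hashlib.md5(block).digest()


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Yield fixed-size blocks; a short final block is zero-padded (assumption)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size].ljust(size, b"\x00")


# Two blocks with identical contents yield the same index; a differing block does not.
block_a = bytes(BLOCK_SIZE)
block_b = b"\x01" + bytes(BLOCK_SIZE - 1)
index_a = hash_index(block_a)
index_b = hash_index(block_b)
```

Identical blocks always map to the same index, which is what lets the storage system use the index as its duplicate test.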

According to an embodiment, the first to third hash indexes (Hash Index 1 to Hash Index 3) may be stored in the memory 312A aligned to the first to third logical addresses (LBA 1 to LBA 3). When a logical address (LBA) for a data write or read is received from the external system, the hash index previously stored for that logical address may be read from the memory 312A.
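The host-side bookkeeping described here can be sketched as a simple mapping from logical address to hash index. The sketch assumes a plain dictionary in place of whatever structure the memory 312A actually uses, MD5 as the hash, and hypothetical method names; it shows only that the host keeps LBA-to-index mappings and forwards the index and data, not a logical address, to the storage system.

```python
import hashlib


class HostIndexMap:
    """Toy model of the host's LBA -> hash index mapping (hypothetical names)."""

    def __init__(self):
        self.lba_to_index = {}  # stands in for memory 312A

    def on_write(self, lba: int, data: bytes):
        """Record the mapping and return what would be sent to the storage system."""
        index = hashlib.md5(data).digest()  # illustrative hash choice
        self.lba_to_index[lba] = index
        return index, data

    def on_read(self, lba: int) -> bytes:
        """Look up the stored index to use as the read key."""
        return self.lba_to_index[lba]
```

Two logical addresses written with identical data end up holding the same index, so the storage system can recognize the second write as a duplicate.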

The storage system 320A may include a memory 321A and a compressor 322A that compresses data (Data). As one example, the storage system 320A may generate a physical address (PBA) corresponding to a received hash index, and the memory 321A may store the mapping information between hash indexes and physical addresses in table form. As one example, the first to third hash indexes (Hash Index 1 to Hash Index 3) and the first to third physical addresses (PBA 1 to PBA 3) may be stored together in the memory 321A. Alternatively, PBA 1 to PBA 3 may be stored in the memory 321A aligned to Hash Index 1 to Hash Index 3, in which case the hash indexes themselves may not actually be stored in the memory 321A.

According to an embodiment, the memory 321A may further store a reference count (Ref CNT) corresponding to each hash index. For example, the memory 321A may further store reference counts (Ref CNT 1 to Ref CNT 3) corresponding to the first to third hash indexes (Hash Index 1 to Hash Index 3). That is, in an embodiment of the present invention, the reference count (Ref CNT) may be managed within the storage system 320A.

An example of the data deduplication operation is described below.

The host 310A may receive a fourth logical address (LBA 4) and data (Data) corresponding to it, and the hash engine 311A may generate a fourth hash index (Hash Index 4) from the data (Data). Mapping information between the generated fourth hash index (Hash Index 4) and the fourth logical address (LBA 4) may be stored in the memory 312A. In addition, when issuing a data write request to the storage system 320A, the host 310A may provide the fourth hash index (Hash Index 4) and the corresponding data (Data) to the storage system 320A.

The storage system 320A may determine whether the data is duplicate data by using the fourth hash index (Hash Index 4). For example, it may be determined whether any of the first to third hash indexes (Hash Index 1 to Hash Index 3), which correspond to previously stored data, is identical to the fourth hash index (Hash Index 4). When no hash index identical to the fourth hash index (Hash Index 4) exists in the memory 321A, the storage system 320A may map the fourth hash index (Hash Index 4) to a fourth physical address (PBA 4) and store the data (User Data) at the location indicated by the fourth physical address (PBA 4). Mapping information between the fourth hash index (Hash Index 4) and the fourth physical address (PBA 4) may also be stored in the memory 321A.

On the other hand, when the fourth hash index (Hash Index 4) is determined to be identical to one of the stored hash indexes (for example, the first hash index (Hash Index 1)), redundant storage of the data (Data) is skipped through deduplication processing, and the data (Data) corresponding to the first hash index (Hash Index 1) may be referenced by the fourth logical address (LBA 4). According to an embodiment, when deduplication processing is performed, the value of the first reference count (Ref CNT 1) corresponding to the first hash index (Hash Index 1) may be updated; as an example, the value of the first reference count (Ref CNT 1) may be increased by 1. Also, according to an embodiment, the updated first reference count (Ref CNT 1) may be provided from the storage system 320A to the host 310A together with information indicating that deduplication processing has been performed.
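The LBA 4 scenario can be traced with two plain dictionaries standing in for the memory 312A and the memory 321A. This is a sketch under stated assumptions: BLAKE2b as the hash function, an initial reference count of 1, PBAs numbered like the LBAs, and all variable names are illustrative rather than taken from the specification.

```python
import hashlib


def hash_index(data: bytes) -> bytes:
    # Assumed hash function producing a 128-bit index.
    return hashlib.blake2b(data, digest_size=16).digest()


# State before the write: three distinct blocks are already stored.
blocks = {1: b"A" * 4096, 2: b"B" * 4096, 3: b"C" * 4096}
host_memory = {lba: hash_index(d) for lba, d in blocks.items()}  # 312A: LBA -> hash index
storage_memory = {                                               # 321A: hash index -> PBA, Ref CNT
    hash_index(d): {"pba": lba, "ref_cnt": 1} for lba, d in blocks.items()
}

# A write to LBA 4 arrives carrying the same data as LBA 1.
new_data = b"A" * 4096
h4 = hash_index(new_data)
host_memory[4] = h4  # the host records LBA 4 -> Hash Index 4

if h4 in storage_memory:
    # Duplicate: skip storing the data and only update the reference count.
    storage_memory[h4]["ref_cnt"] += 1
    deduplicated = True
else:
    # New data: map the index to a fresh physical address.
    storage_memory[h4] = {"pba": 4, "ref_cnt": 1}
    deduplicated = False

# A later read of LBA 4 resolves to the block originally stored for LBA 1.
resolved_pba = storage_memory[host_memory[4]]["pba"]
```

After the write, LBA 4 and LBA 1 share one stored block, and the storage-side table still holds three entries.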

Meanwhile, FIG. 3 illustrates a case in which the compressor for compressing data (Data) is provided in the host. As shown in FIG. 3, a data processing system 300B includes a host 310B and a storage system 320B, and the host 310B may include a hash engine 311B, a memory 312B, and a compressor 313B. The storage system 320B may include a memory 321B that stores mapping information between hash indexes and physical addresses. The data processing system 300B shown in FIG. 3 may also perform data deduplication in the same manner as in the embodiment described above; as an example, the storage system 320B may determine whether data is duplicate data by using the hash index provided from the host 310B, and may perform data deduplication according to the result of the determination.

Meanwhile, the memory for storing the various kinds of mapping information in the embodiments described above may be implemented with various types of memory; as an example, it may be implemented with a volatile memory such as dynamic random access memory (DRAM), static random access memory (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), or twin transistor RAM (TTRAM).

FIG. 4 is a block diagram illustrating an example of functions performed by a host of the data processing system of the present invention.

Referring to FIG. 4, a data processing system 400 includes a host 410 and a storage system 420, and the host 410 may include various modules implemented in hardware and/or software. For example, the host 410 may include a remote procedure call (RPC) module 411, a block service module 412, a deduplication management module 413, and a compression module 414.

The RPC module 411 may communicate with another system (or another server); for example, it may call another server in order to transmit and receive data. The block service module 412 may perform functions for managing data on a block basis. The deduplication management module 413 may be provided so that part of the data deduplication function is performed at the host 410; as an example, the deduplication management module 413 may include a hash engine that generates a hash index from data. The deduplication management module 413 may also include a memory for storing mapping information between logical addresses received from an external system and hash indexes. The compression module 414 may compress data and provide the compressed data to the storage system 420.

According to embodiments of the present invention, the hash index may be provided directly to the storage system 420 as the information related to storing and reading data. Accordingly, the host 410 need not run a file system for managing data on a file basis, nor a block layer for managing data whose size changes through compression according to the size of a logical block. That is, the memory resources of the host 410 required for data deduplication may be reduced, and system performance may be improved because at least some of the functions of the file system and/or the block layer need not be executed.

FIG. 5 is a block diagram illustrating an implementation example of a storage system according to an embodiment of the present invention. The storage system may include a controller and a storage device, and the configuration shown in FIG. 5 corresponds to one implementation example of the controller.

Referring to FIG. 5, a storage system 500 may include a central processing unit 510 as a processor, a host interface 520, a memory interface 530, and a working memory 540. According to an embodiment, the storage system 500 may further include a hash engine 550. An index table 541 may be stored in the working memory 540; as an alternative embodiment, the index table 541 may instead be stored in another memory within the storage system 500.

The central processing unit 510 may control the overall operation of the storage system 500 by executing various programs stored in the working memory 540. Software including various programs related to data deduplication, along with the functions of the storage system 500, may be loaded into the working memory 540. The working memory 540 may be implemented as random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or another memory technology.

According to the embodiment described above, the host may generate a hash index for data received from an external system and provide it to the storage system 500. The storage system 500 may additionally include the hash engine 550, and according to an embodiment, the hash engine 550 may perform a further hash operation on the hash index or the data provided from the host. That is, because the storage system 500 further includes the hash engine 550, the amount of information stored in the storage system 500 may be further reduced.

A more specific configuration and operation of the present invention are described below with reference to FIGS. 5 and 6. FIG. 6 is a block diagram illustrating an example of various modules stored in the working memory of FIG. 5. When the storage system 500 includes a flash memory, the modules illustrated in FIG. 6 may be defined as components included in a flash translation layer (FTL).

The working memory 540 may store the index table 541 described above, together with modules that perform various functions when executed by the central processing unit 510. As an example, an address translation module 542, a deduplication control module 543, and a data management module 544 may further be stored in the working memory 540.

The address translation module 542 translates the hash index provided from the host into a physical address. As an example, mapping information between hash indexes and physical addresses may be stored in the index table 541. As in the example described above, the index table 541 may store physical addresses in a form aligned with the hash indexes, and the physical address corresponding to a hash index provided from the host may be read from the index table 541.

The deduplication control module 543 may perform various functions for preventing data from being stored redundantly. As an example, the deduplication control module 543 may determine whether write-requested data is duplicate data by checking the hash index provided from the host against the information stored in the index table 541. The deduplication control module 543 may also store and manage reference count information in the index table 541; as an example, when a hash index identical to the hash index from the host already exists, it may update the reference count value corresponding to that hash index.

Meanwhile, the data management module 544 may perform various data management operations; as an example, it may perform data management operations for a flash memory. According to an embodiment, the data management module 544 may perform data management operations by using information related to data deduplication. For example, data management operations may be adjusted according to the reference count values stored in the index table 541; as an example, various management operations such as data movement, backup, and garbage collection may be performed based on the reference count values.

The embodiments of FIGS. 5 and 6 illustrate an example in which deduplication processing according to embodiments of the present invention is performed in software, but embodiments of the present invention need not be limited to this. As an example, at least some of the functions for deduplication processing may be implemented in hardware or in a combination of hardware and software.

FIGS. 7A and 7B are diagrams illustrating examples of information managed by a host and a storage system according to embodiments of the present invention. FIG. 7A illustrates the conventional case in which data duplication is determined on the host side, and FIG. 7B illustrates an example in which data duplication is determined on the storage system side according to an embodiment of the present invention.

Referring to FIG. 7A, the host may receive logical addresses (LBA 1 to LBA 3) and corresponding data from an external system, and may generate indexes for the data (for example, hash indexes, Hash Index 1 to Hash Index 3). Mapping information between the logical addresses (LBA 1 to LBA 3) and the hash indexes (Hash Index 1 to Hash Index 3) may be stored and managed in a memory.

In addition, the host may run a file system in order to access the storage system, and after performing deduplication it may store in the memory the hash indexes (Hash Index 1 to Hash Index 3) and the corresponding file information (for example, file IDs, File ID 1 to File ID 3). Reference counts (Ref CNT 1 to Ref CNT 3) may further be stored in correspondence with the hash indexes (Hash Index 1 to Hash Index 3) and the file information (File ID 1 to File ID 3). The host also provides logical addresses for the storage system (LBA S1 to LBA S3) to the storage system through the file system, and the storage system may store mapping information between those logical addresses (LBA S1 to LBA S3) and physical addresses (PBA 1 to PBA 3). That is, the host performs data deduplication by managing two tables, which increases memory usage and may also increase the time needed to search for information.

In contrast, as shown in FIG. 7B, according to an embodiment of the present invention the host may manage only the mapping information between the logical addresses from the external system (LBA 1 to LBA 3) and the corresponding hash indexes (Hash Index 1 to Hash Index 3). Accordingly, the host does not need to manage two tables, and the information processing burden on the host side may be reduced.

The host provides the hash indexes (Hash Index 1 to Hash Index 3) to the storage system, and the storage system may store and manage mapping information between the hash indexes (Hash Index 1 to Hash Index 3) and the physical addresses (PBA 1 to PBA 3). The storage system may further store reference counts (Ref CNT 1 to Ref CNT 3) corresponding to the hash indexes (Hash Index 1 to Hash Index 3) and the physical addresses (PBA 1 to PBA 3).

FIG. 8 is a block diagram illustrating an example of data write and read operations of a data processing system according to an embodiment of the present invention. In the embodiment shown in FIG. 8, the storage system 620 is exemplified as a key-value (Key Value) SSD, that is, an SSD including NAND memory.

A data processing system according to an embodiment may include a plurality of storage systems, and may be operated such that one or more of the storage systems are configured as key-value (Key Value) storage while the actual data is stored in other storage systems. As an example, key-value storage may be built from low-specification nodes with little computing power and SSDs, and external clients may access the key-value storage to request that data be stored and read. In this case, the storage system 620 may receive a key (Key) as the index of the embodiments described above, and may receive a value (Value) as the data. As in the embodiments described above, the key (Key) may be generated through a hash operation on data provided from an external system.

Referring to FIG. 8, a data processing system 600 includes a host 610 and a storage system 620. In accordance with the embodiments described above, the storage system 620 may store and manage keys (Key) as indexes, and may perform data deduplication by using the stored keys (Key).

The host 610 may receive data write and read requests. Taking a write operation as an example, a key (Key) may be generated through a hash operation on the data (User Data), and a value (Value) corresponding to compressed data may be generated through compression of the data (User Data). A write request (Put(Key, Value)) and a read request (Get(Key)) may be generated through key (Key) and value (Value) interface command processing, and these requests (Put(Key, Value), Get(Key)) may be provided to the storage system 620 through an SSD device driver.

The storage system 620 may perform deduplication-based write and read operations by using the received key (Key) and value (Value). As an example, the storage system 620 may determine whether data is duplicated by referring to the received key (Key) and the information stored in the index table. For example, assuming that the received key (Key) is compared with the previously stored keys (Key), when no identical key (Key) exists, the storage system 620 may additionally store in the index table mapping information between the received key (Key) and a corresponding physical address (PBA), and the value (Value) may be stored at the location in the NAND memory indicated by the physical address (PBA). As one storage example, the received key (Key), the value (Value), and the corresponding metadata (Meta data) may be stored together in a page of the NAND memory (NAND Page).

According to an embodiment, a reference count may further be stored in the index table. When an identical key (Key) exists, the data write operation may be completed by updating the reference count corresponding to that key (Key), without storing the key (Key) and the value (Value) in the NAND memory again. Also, according to an embodiment, a further hash operation may be performed on the key (Key) and the value (Value) within the storage system 620, in which case the size of the key (Key) and the value (Value) stored in the storage system 620 may be further reduced.

Meanwhile, taking a read operation as an example, the host 610 may generate the key (Key) corresponding to the data (User Data) through the hash engine, and may provide the generated key (Key) to the storage system 620 as the information for reading the data. As an example, a read request (Get(Key)) may be provided together with the key (Key) to the storage system 620 through the SSD device driver.

The storage system 620 may receive the read request (Get(Key)) for the data (User Data) corresponding to the key (Key), determine the physical address (PBA) mapped to the key (Key) through the index table, and read the data at the location indicated by the physical address (PBA). According to an embodiment, of the information stored in the NAND memory, the value (Value) corresponding to the data may be read and provided to the host 610, and the host 610 may restore the data (User Data) by decompressing the value (Value).
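The Put(Key, Value) and Get(Key) exchange described for FIG. 8 can be sketched as follows. This is an illustration under assumptions, not the device's actual interface: SHA-256 stands in for the unnamed hash operation, zlib for the unnamed compression, the metadata field is invented, and `KVSSD`, `host_write`, and `host_read` are hypothetical names.

```python
import hashlib
import zlib


class KVSSD:
    """Toy key-value SSD: an index table plus NAND pages addressed by PBA."""

    def __init__(self) -> None:
        self.index_table = {}  # key -> [pba, ref_cnt]
        self.nand_pages = {}   # pba -> (key, value, metadata)

    def put(self, key: bytes, value: bytes) -> None:
        entry = self.index_table.get(key)
        if entry is not None:
            entry[1] += 1      # same key already stored: update the reference count only
            return
        pba = len(self.nand_pages)
        self.index_table[key] = [pba, 1]  # initial count of 1 is an assumption
        # Key, value, and metadata are kept together in one page.
        self.nand_pages[pba] = (key, value, {"length": len(value)})

    def get(self, key: bytes) -> bytes:
        pba, _ = self.index_table[key]
        return self.nand_pages[pba][1]  # only the value is returned to the host


def host_write(ssd: KVSSD, user_data: bytes) -> bytes:
    key = hashlib.sha256(user_data).digest()  # key from a hash of the user data
    ssd.put(key, zlib.compress(user_data))    # value is the compressed data
    return key


def host_read(ssd: KVSSD, key: bytes) -> bytes:
    return zlib.decompress(ssd.get(key))      # host restores the user data


ssd = KVSSD()
data = b"hello key-value ssd " * 200
key = host_write(ssd, data)
host_write(ssd, data)  # second write of identical data is deduplicated
```

In this sketch compression and decompression both stay on the host, so the device never needs to know the original data size.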

FIG. 9 is a flowchart illustrating a method of operating a host according to an exemplary embodiment of the present invention.

Referring to FIG. 9, the host receives a logical address (LBA), which is an address on the external system side, and data (Data) corresponding to it (S11). The host may generate an index (Index) by performing an operation on the data, and may store the generated index (Index) in the host (S12). As an example, the host may generate a hash index by using a hash function, and may store the generated hash index in a memory in alignment with the logical address.

The host provides the data (Data) and the corresponding index (Index) to the storage system (for example, an SSD) (S13). The storage system may perform a write operation with data deduplication applied in response to the write request from the host, and according to an embodiment, when data deduplication has been performed, the host may receive reference count information from the storage system (S14). The received reference count information may be used in subsequent data management operations.

FIG. 10 is a flowchart illustrating a method of operating a storage system according to an exemplary embodiment of the present invention.

The storage system may receive data (Data) and a corresponding index (Index) from the host (S21), and may search the information previously stored in the index table for data deduplication (S22). Depending on the search result, the information stored in alignment with the received index (Index) may be examined, or it may be determined whether an index identical to the received index (Index) exists; based on that determination, it may be determined whether the data received from the host is duplicate data (S23).

When the data is determined not to be duplicate data, the storage system may generate a physical address (PBA) corresponding to the received index (Index) and store mapping information between the index (Index) and the physical address (PBA) (S24). Because no identical index exists, the data (Data) provided from the host is determined to be stored in the storage system for the first time, and the data (Data) may accordingly be stored at the location corresponding to the physical address (PBA) (S25).

On the other hand, when the data is determined to be duplicate data, data identical to the data (Data) provided from the host is determined to exist already, and processing of the write request may be completed without storing the data (Data) provided from the host again. As an example, the reference count corresponding to the index identical to the index (Index) provided from the host may be updated (S26), and the updated reference count information may be provided to the host (S27). The reference count may be updated by increasing or decreasing its value; as an example, the reference count may be increased by 1 each time a write request for the same index is received.
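Steps S21 to S27 can be read as a single write handler on the storage side. The following is a minimal sketch, assuming an initial reference count of 1 and sequential physical addresses; `Storage`, `WriteResponse`, and their fields are hypothetical names, not terms from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteResponse:
    deduplicated: bool             # True when the data was not stored again
    ref_cnt: Optional[int] = None  # updated count, reported only after deduplication


class Storage:
    def __init__(self) -> None:
        self.index_table = {}      # index -> [pba, ref_cnt]
        self.media = {}            # pba -> data
        self._next_pba = 0

    def write(self, index: bytes, data: bytes) -> WriteResponse:  # S21: receive data and index
        entry = self.index_table.get(index)                        # S22: search the index table
        if entry is None:                                          # S23: not duplicate data
            pba = self._next_pba                                   # S24: generate a PBA and
            self._next_pba += 1                                    #      store the mapping
            self.index_table[index] = [pba, 1]
            self.media[pba] = data                                 # S25: store the data
            return WriteResponse(deduplicated=False)
        entry[1] += 1                                              # S26: update the reference count
        return WriteResponse(deduplicated=True, ref_cnt=entry[1])  # S27: report it to the host


storage = Storage()
first = storage.write(b"idx-1", b"payload")
second = storage.write(b"idx-1", b"payload")
third = storage.write(b"idx-2", b"other")
```

The handler compares indexes only, so it inherits whatever collision behavior the index function has.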

Hereinafter, various operation examples related to data deduplication that are applicable to embodiments of the present invention are described. FIGS. 11 to 18 are diagrams illustrating examples of communication between a host and a storage system in the data processing system of the present invention. In the following embodiments, the storage system is assumed to be a Key-Value SSD (KV SSD), and the value (Value) may be referred to as data.

The example of FIG. 11 describes a data deduplication operation that does not take data collision (Collision) into account. A key (Key) generated by using a hash function or the like is smaller than the data (Value), so there is a possibility of a data collision, in which the same key (Key) is generated even though the data (Value) differs.
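Why a key shorter than the data can collide is easiest to see with a deliberately tiny key. The sketch below assumes a toy 8-bit key (the first byte of a SHA-256 digest), chosen only so that a collision is guaranteed to appear quickly; real keys are far wider and collisions correspondingly rarer. `short_key` and `find_collision` are hypothetical names.

```python
import hashlib


def short_key(value: bytes) -> int:
    """Toy 8-bit key: the first byte of a SHA-256 digest (assumption for illustration)."""
    return hashlib.sha256(value).digest()[0]


def find_collision():
    """Return two different values that share the same 8-bit key."""
    seen = {}
    # 257 distinct values mapped into 256 possible keys must collide (pigeonhole).
    for i in range(257):
        value = str(i).encode()
        k = short_key(value)
        if k in seen:
            return seen[k], value
        seen[k] = value
    raise AssertionError("unreachable: 257 values cannot have 257 distinct 8-bit keys")


value_a, value_b = find_collision()
```

A deduplication scheme that trusts the key alone would treat `value_b` as a duplicate of `value_a` and never store it.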

Referring to FIG. 11, upon receiving a data write request, the host generates a key (Key) from the data (Value) and provides a write request (PUT(Key, Value)) to the storage system. The storage system performs a duplicate check by using the key (Key), and when the data is duplicate data, it may update only the reference count (Ref CNT) without storing the data (Value) again. The storage system may provide the host with information (Info_DD) indicating that deduplication processing has been performed for the write request (PUT(Key, Value)), and may also provide the updated reference count (Ref CNT) to the host. The host may determine from the information (Info_DD) that data deduplication has been performed.

Meanwhile, FIGS. 12 and 13 show examples of a data deduplication operation that takes data collision into account. FIG. 12 shows an example in which the collision determination is performed on the host side, and FIG. 13 shows an example in which it is performed on the storage system side.

Referring to FIG. 12, as in the embodiment of FIG. 11, the host generates a key (Key) from the data (Value) and provides a write request (PUT(Key, Value)) to the storage system. The storage system performs a duplicate check using the key (Key) and, if the data is duplicate data, may read the data (Value) corresponding to the received key (Key) and provide it to the host together with information (Info_D) indicating that data duplication has occurred.

The host may check whether a data collision has occurred by comparing the data (Value) that the external system requested to be written with the data (Value) provided from the storage system. As an example, the host may compare the two on a bit or byte basis to determine whether they are identical, and may provide the determination result (Res_C) to the storage system.

When the determination result (Res_C) indicates that the write-requested data (Value) and the data (Value) provided from the storage system are identical, the storage system may update only the reference count (Ref CNT) without storing the data (Value) again, and may provide the host with information (Info_DD) indicating that deduplication has been performed. The updated reference count (Ref CNT) may additionally be provided to the host.

On the other hand, when the write-requested data (Value) differs from the data (Value) provided from the storage system, it is determined that a data collision has occurred, and the host may perform a management operation to resolve the collision. The collision may be resolved in various ways; as an example, the host may generate a key (Key') having a different hash value for the colliding data (Value). According to an embodiment, upon a data collision the host may provide a new write request (PUT(Key', Value)) to the storage system, and the storage system stores the data (Value) in response to the write request (PUT(Key', Value)), whereby the data collision is avoided.
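A minimal sketch of the host-side collision check of FIG. 12 follows. The deliberately weak 8-bit hash `weak_key`, its `salt` parameter (used here to derive Key'), and the `Storage` class with its `put` and `confirm` methods are hypothetical names chosen only to make a collision easy to reproduce; the source does not specify how Key' is derived.

```python
def weak_key(value: bytes, salt: int = 0) -> int:
    # Deliberately weak 8-bit hash (assumption, for demonstration only).
    # With salt == 0 it is a plain byte sum, so b"AB" and b"BA" collide.
    return sum((i * salt + 1) * b for i, b in enumerate(value)) % 256


class Storage:
    """Hypothetical storage side of FIG. 12."""

    def __init__(self):
        self.table = {}  # key -> [value, ref_cnt]

    def put(self, key, value):
        if key in self.table:
            # Info_D: duplicate key; return the stored value for comparison.
            return {"info_d": True, "value": self.table[key][0]}
        self.table[key] = [value, 1]
        return {"info_d": False}

    def confirm(self, key, same):
        # Res_C from the host: True means the two values were identical.
        if same:
            self.table[key][1] += 1
            return {"info_dd": True, "ref_cnt": self.table[key][1]}
        return {"info_dd": False}


def host_write(storage, value, max_salt=4):
    for salt in range(max_salt):
        key = weak_key(value, salt)            # salt > 0 yields Key'
        resp = storage.put(key, value)
        if not resp["info_d"]:
            return key, "stored"
        same = resp["value"] == value          # byte-wise comparison on the host
        storage.confirm(key, same)
        if same:
            return key, "deduplicated"
        # Collision: retry with a different key in the next iteration.
    raise RuntimeError("could not resolve collision")
```

Writing `b"AB"` twice and then `b"BA"` exercises all three outcomes: a normal store, a deduplicated write, and a collision resolved under a new key.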

Meanwhile, referring to FIG. 13, as in the embodiments of FIGS. 11 and 12 described above, the host generates a key (Key) through a hash function and provides a write request (PUT(Key, Value)) to the storage system. The storage system may then perform a duplicate check using the key (Key).

When the duplicate check using the key (Key) indicates duplicate data, the storage system side may determine whether a data collision has occurred. As an example, the storage system may read the data (Value) corresponding to the key (Key) and determine whether the read data (Value) is identical to the data (Value) provided from the host. If the two are identical, it is determined that no data collision has occurred, and the storage system may accordingly update only the reference count (Ref CNT) without storing the data (Value) again. The storage system may also provide the host with information (Info_DD) indicating that deduplication has been performed, together with the updated reference count (Ref CNT).

On the other hand, when it is determined that a data collision has occurred, the storage system may provide the host with information (Info_C) indicating the collision. In the same or a similar manner as in the embodiment described above, the host may perform a management operation to resolve the collision; as an example, the host may generate a key (Key') having a different hash value for the colliding data (Value) and provide a new write request (PUT(Key', Value)) to the storage system.
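The storage-side variant of FIG. 13 can be sketched in a few lines. The `Storage` class and the status strings below are illustrative assumptions; only the three outcomes (normal write, Info_DD, Info_C) come from the description.

```python
class Storage:
    """Hypothetical KV store that performs the collision check itself (FIG. 13)."""

    def __init__(self):
        self.table = {}  # key -> {"value": ..., "ref_cnt": ...}

    def put(self, key, value):
        entry = self.table.get(key)
        if entry is None:
            # Key not seen before: write the value normally.
            self.table[key] = {"value": value, "ref_cnt": 1}
            return "STORED", 1
        if entry["value"] == value:
            # Same key and same value: true duplicate, update Ref CNT only.
            entry["ref_cnt"] += 1
            return "INFO_DD", entry["ref_cnt"]
        # Same key but different value: report the collision to the host.
        return "INFO_C", entry["ref_cnt"]
```

On `INFO_C` the host would be expected to retry with a new key (Key'), as described above.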

In the embodiments described above, a new hash operation is performed to resolve a data collision, but embodiments of the present invention need not be limited to this. For example, a collision may instead be resolved by writing the colliding data to the storage system and managing the table information in the form of a linked list; collision management based on counting the number of collisions, or various other management schemes, may also be applied.
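One way to picture the linked-list alternative is a per-key chain of entries. The sketch below is an assumption-laden illustration: it uses a Python list as the chain rather than a true linked list, and the `ChainedStore` class, the slot index it returns, and the `collisions` counter are hypothetical.

```python
from collections import defaultdict


class ChainedStore:
    """Hypothetical store that keeps colliding values in a per-key chain."""

    def __init__(self):
        self.chains = defaultdict(list)  # key -> list of [value, ref_cnt]
        self.collisions = 0              # number of collisions seen so far

    def put(self, key, value):
        chain = self.chains[key]
        for slot, entry in enumerate(chain):
            if entry[0] == value:
                # True duplicate: only the reference count changes.
                entry[1] += 1
                return slot, entry[1]
        if chain:
            # Key already in use by different data: count the collision.
            self.collisions += 1
        chain.append([value, 1])
        return len(chain) - 1, 1
```

With this layout no second hash is needed; a reader would identify a value by its key plus its position in the chain.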

In the drawings illustrating the following embodiments, it is assumed that the data (Value) is "ABCD" and that a key (Key) of "123" is generated from the data (Value). FIG. 14 shows an operation example of the data processing system related to management of the reference count.

Referring to FIG. 14, the host generates a key (Key) of "123" from the data (Value) and provides a write request (PUT(123, ABCD)) for the data (Value) to the storage system. The storage system determines that the received data (Value) is not duplicate data, writes the data (Value) normally in accordance with the write request (PUT(123, ABCD)), and may provide the host with a signal indicating that the write is complete.

Thereafter, the host may provide the storage system with a write request for the duplicate data "ABCD". Through a check using the key (Key), the storage system determines that the received write request is a write request for duplicate data, and may provide the host with information (Info_D) indicating that data duplication has occurred. In response to the information (Info_D), the host may provide the storage system with a read request (GET(123)) containing the key (Key) "123", and the storage system may read the data (Value) in accordance with the read request (GET(123)) and provide it to the host.

As in the embodiment described above, the host may use the data (Value) read from the storage system to determine whether a data collision has occurred. If the data (Value) received from the storage system and the write-requested data (Value) are identical, no data collision has occurred, and the host may request the storage system to increase the reference count for the key (Key) "123". In response to the request from the host, the storage system may complete the write operation by increasing (or updating) the reference count for the key (Key) "123".

According to this embodiment, the reference counts for the keys are held within the storage system, while the operation of adjusting them may be performed by the host. That is, when a write of duplicate data is requested, the host determines whether a data collision has occurred, and the host may manage the reference count in the storage system based on the determination result.
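The FIG. 14 sequence (PUT, Info_D, GET, compare, reference count increase) can be sketched as below. The method name `inc_ref` and the status strings are assumptions; the description only says that the host "requests" the increase.

```python
class Storage:
    """Hypothetical store: holds Ref CNT but lets the host drive updates (FIG. 14)."""

    def __init__(self):
        self.values = {}
        self.ref_cnt = {}

    def put(self, key, value):
        if key in self.values:
            return "INFO_D"          # duplicate key detected, nothing written
        self.values[key] = value
        self.ref_cnt[key] = 1
        return "DONE"

    def get(self, key):
        return self.values[key]

    def inc_ref(self, key):
        # Hypothetical command for the host's reference count increase request.
        self.ref_cnt[key] += 1
        return self.ref_cnt[key]


def host_put(storage, key, value):
    if storage.put(key, value) == "DONE":
        return "written"
    # Info_D received: read back the stored value and compare on the host.
    if storage.get(key) == value:
        storage.inc_ref(key)
        return "deduplicated"
    return "collision"
```

The count lives in the storage system throughout; the host only decides when it changes.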

FIG. 15 shows an example of an operation that uses reference count information within the storage system.

Referring to FIG. 15, as in the embodiment described above, the host generates a key (Key) of "123" from the data (Value) "ABCD" and provides a write request (PUT(123, ABCD)) for the data (Value) to the storage system. The storage system determines whether the data (Value) is a duplicate and, depending on the result, either writes the data (Value) or updates only the reference count without writing the data (Value). Information (Info_DD) indicating that deduplication has been performed may also be provided to the host.

In response to a request from an external system, the host may perform an overwrite or partial update operation on existing data (Value). As an example, the host may provide a request (PUT_OVERWRITE) to overwrite the data (Value) corresponding to the key (Key) "123" with a different value, or a request (PUT_PARTIAL_UPDATE) to update part of the data (Value) corresponding to the key (Key).

The storage system may determine whether to carry out the overwrite request or partial update request from the host; as an example, it may make this determination by checking the reference count corresponding to the key (Key) "123". If the reference count shows that the data corresponding to the key (Key) "123" is referenced by multiple logical addresses on the external system side, the storage system may provide the host with information (Failed) indicating that the data cannot be changed. Based on the information (Failed) from the storage system, the host may reconsider whether to change the data.
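A minimal sketch of the FIG. 15 guard follows. The `Storage` class, the method name `put_overwrite`, and the rule "refuse when the reference count is greater than 1" are assumptions drawn from the phrase "referenced by multiple logical addresses"; the source does not give an exact condition.

```python
class Storage:
    """Hypothetical store that refuses to modify shared data (FIG. 15)."""

    def __init__(self):
        self.table = {}  # key -> {"value": ..., "ref_cnt": ...}

    def put(self, key, value):
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = {"value": value, "ref_cnt": 1}
            return "DONE"
        entry["ref_cnt"] += 1   # duplicate write: update the count only
        return "INFO_DD"

    def put_overwrite(self, key, new_value):
        entry = self.table[key]
        if entry["ref_cnt"] > 1:
            # Shared by several references: changing it would affect them all.
            return "FAILED"
        entry["value"] = new_value
        return "DONE"
```

On `FAILED` the host keeps the decision: it may, for example, write the new data under a different key instead.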

FIG. 16 shows an example in which the host decides whether data deduplication is applied.

Referring to FIG. 16, as in the embodiment described above, the host generates a key (Key) of "123" from the data (Value) "ABCD" and provides a write request (PUT(123, ABCD)) for the data (Value) to the storage system. The storage system may perform a duplicate check using the key (Key) and provide the result to the host. If the data (Value) is not duplicate data, the write-requested data (Value) is written normally; if the data (Value) is duplicate data, information (Info_D) indicating that data duplication has occurred may be provided to the host.

The host may apply the data deduplication function selectively. For example, the host may operate according to a data deduplication mode; when the deduplication mode is applied, the host may provide a data deduplication request (Req_Dedup) to the storage system. In response to the data deduplication request (Req_Dedup), the storage system may update the reference count corresponding to the key (Key) and provide the host with information (Info_DD) indicating that deduplication has been performed. When the deduplication mode is not applied, on the other hand, the host may provide a data duplication request (Req_Dup) to the storage system, and the storage system may store the data (Value) in duplicate and provide the host with information indicating that the data (Value) has been stored normally.

Whether to apply deduplication in this way may be decided according to various criteria. As an example, the host may judge the importance of the data (Value) and, for relatively important data (or data requiring storage stability), may request that the data (Value) be stored in duplicate. For relatively unimportant data (Value), on the other hand, data deduplication may be applied.

According to an embodiment, the storage system may provide the reference counts it stores to the host, and the host may consult the value of a reference count when deciding whether to apply data deduplication. As an example, a relatively large reference count for a specific key (Key) may indicate that the data (Value) corresponding to that key is referenced many times. In this case, the host determines whether the reference count corresponding to the specific key (Key) exceeds a threshold and, when writing data corresponding to that key, may or may not apply data deduplication depending on the result of comparing the reference count with the threshold.
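The host-side selection of FIG. 16 might look like the sketch below. The description does not say in which direction the threshold comparison works, so the policy here (store another copy when the data is important or its reference count already exceeds the threshold) is an assumption, as are the names `choose_request`, `Storage.handle`, and the decision to leave the reference count unchanged on a duplicate store.

```python
def choose_request(ref_cnt: int, threshold: int, important: bool = False) -> str:
    # Assumed policy: heavily referenced or important data gets an extra copy.
    if important or ref_cnt > threshold:
        return "Req_Dup"
    return "Req_Dedup"


class Storage:
    """Hypothetical store that obeys the host's choice (FIG. 16)."""

    def __init__(self):
        self.copies = {}   # key -> list of stored copies of the value
        self.ref_cnt = {}  # key -> reference count

    def put(self, key, value):
        if key in self.copies:
            return "INFO_D"            # duplicate detected, host must decide
        self.copies[key] = [value]
        self.ref_cnt[key] = 1
        return "DONE"

    def handle(self, key, value, request):
        if request == "Req_Dedup":
            self.ref_cnt[key] += 1     # deduplicate: count only
            return "INFO_DD"
        self.copies[key].append(value)  # Req_Dup: store a second copy
        return "DONE"
```

After receiving `INFO_D`, the host would call `choose_request` with the reference count reported by the storage system and forward the result through `handle`.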

FIG. 17 shows an operation example that uses the reference count in the data processing system. Detailed descriptions of features shown in FIG. 17 that are the same as those described in the preceding embodiments are omitted.

Referring to FIG. 17, a data processing system according to an embodiment of the present invention includes a host and a storage system, and the reference count corresponding to a key (Key) may be stored in the storage system. The storage system may also provide the reference count to the host, and may do so in various ways. As an example, the reference count corresponding to a key (Key) may be provided to the host each time data deduplication is applied to that key. Alternatively, the reference counts stored in the memory of the storage system may be provided to the host periodically or aperiodically, independently of data deduplication.

The host may store the reference counts provided from the storage system and may evaluate them for data management operations. For example, for a key (Key) whose reference count is relatively large (or exceeds a threshold), the host may judge the data corresponding to that key to be of high importance and perform a management operation accordingly. In one embodiment, the host may provide the storage system with a backup request for the data corresponding to the key (Key). Alternatively, the host may request the storage system to move the data corresponding to the key (Key) to a high-reliability storage area (for example, a single-level cell (SLC) area of a NAND memory). The storage system performs the operation in accordance with the management request from the host.
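The host-side selection step of FIG. 17 reduces to filtering keys by their reference count. In the sketch below, the function name `plan_management` and the action labels `"BACKUP"` and `"MOVE_TO_SLC"` are hypothetical; the source names the two kinds of request but not their encoding.

```python
def plan_management(ref_counts: dict, threshold: int, action: str = "BACKUP") -> list:
    """Return (action, key) requests for keys whose count exceeds the threshold.

    ref_counts is the host's cached copy of the counts reported by the
    storage system; action is a hypothetical label such as "BACKUP" or
    "MOVE_TO_SLC".
    """
    return [
        (action, key)
        for key, cnt in sorted(ref_counts.items())
        if cnt > threshold
    ]


cached = {123: 5, 456: 1, 789: 9}       # illustrative values only
requests = plan_management(cached, threshold=4)
```

Each resulting pair would then be sent to the storage system as a management request.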

FIG. 18 shows another operation example that uses the reference count in the data processing system. Detailed descriptions of features shown in FIG. 18 that are the same as those described in the preceding embodiments are omitted.

FIG. 18 shows an example in which the reference count is stored in the storage system while its value is supplied from the host to the storage system. The figure also shows the duplicate determination and the collision determination being performed in the host, but embodiments of the present invention need not be limited to this. For example, this embodiment may also be applied when data collision management is not performed, or when it is performed on the storage system side.

Referring to FIG. 18, the host generates a key (Key) of "123" from the data (Value) "ABCD", and when the write request (PUT(123, ABCD)) for the data (Value) is provided to the storage system, the storage system stores the data (Value). Thereafter, when the same key (for example, "123") is generated again, the host may determine that data duplication has occurred and may provide the storage system with a read request for the data (Value) corresponding to the key (Key) "123". The storage system may provide the host with the data (Value) corresponding to the key (Key) together with the reference count (for example, 1) corresponding to the key (Key).

The host uses the data (Value) received from the storage system to determine whether a data collision has occurred. If a collision is found, the host may perform the data collision management operation according to the embodiments described above. If no collision is found, the host may provide the data write request to the storage system again, together with the reference count for the key (Key) changed to a value of 2. The storage system performs the write operation based on data deduplication; as an example, without writing the duplicate data (Value), it may update the reference count corresponding to the key (Key) from 1 to 2 in accordance with the information provided from the host.
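The FIG. 18 flow, in which the host computes the new reference count and the storage system merely records it, can be sketched as follows. The `Host` and `Storage` classes, the `known_keys` set the host uses to notice a repeated key, and the `ref_cnt` argument of `put` are assumptions introduced for the example.

```python
class Storage:
    """Hypothetical store that records the count supplied by the host (FIG. 18)."""

    def __init__(self):
        self.table = {}  # key -> {"value": ..., "ref_cnt": ...}

    def put(self, key, value, ref_cnt=1):
        if key in self.table:
            # Deduplicated write: keep the stored value, take the host's count.
            self.table[key]["ref_cnt"] = ref_cnt
        else:
            self.table[key] = {"value": value, "ref_cnt": ref_cnt}
        return self.table[key]["ref_cnt"]

    def get(self, key):
        entry = self.table[key]
        return entry["value"], entry["ref_cnt"]


class Host:
    def __init__(self, storage):
        self.storage = storage
        self.known_keys = set()  # keys the host has already generated

    def write(self, key, value):
        if key not in self.known_keys:
            self.known_keys.add(key)
            self.storage.put(key, value)
            return "written"
        # Same key generated again: read back the value and its count.
        stored, cnt = self.storage.get(key)
        if stored != value:
            return "collision"
        # No collision: resend the write with the count raised by one.
        self.storage.put(key, value, ref_cnt=cnt + 1)
        return "deduplicated"
```

For the running example, the second write of "ABCD" under key 123 moves the stored count from 1 to 2 without storing the value again.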

FIG. 19 is a block diagram illustrating a network system including a server system according to an embodiment of the present invention. FIG. 19 shows a plurality of terminals (for example, computing nodes) together with the server system, and the server system may be implemented using the data processing system according to the embodiments described above.

Referring to FIG. 19, the network system 700 may include a server system 710 and a plurality of terminals 731_1 to 731_n that communicate through a network 720. The server system 710 may include a server 711 and, as a storage system, an SSD 712. The server 711 may perform the functions of the host in the embodiments described above.

The server 711 may process requests transmitted from the plurality of terminals 731_1 to 731_n connected to the network 720. As an example, the server 711 may store data provided from the terminals 731_1 to 731_n in the SSD 712. In addition, by applying the data deduplication function according to the embodiments described above, redundant storage of identical data is reduced or prevented, so the storage space of the SSD 712 can be used efficiently.

The SSD 712 may manage hash indexes and reference counts according to the embodiments described above. According to an embodiment, the SSD 712 may receive data and a corresponding hash index from the host, determine whether the data is a duplicate by using the hash index, and store the data or update the reference count depending on the result. In addition, when the reference count corresponding to a specific key (Key) is large, the corresponding data may be data used by one or more of the terminals 731_1 to 731_n, and a data management operation using the reference count may be performed according to the embodiments described above.

Exemplary embodiments have been disclosed above in the drawings and the specification. Although the embodiments have been described using specific terms, these terms are used only to explain the technical idea of the present disclosure and are not intended to limit its meaning or the scope of the disclosure as set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present disclosure should be determined by the technical idea of the appended claims.

Claims

1. A storage system comprising:
a storage device for storing data; and
a controller that directly receives, from a host, data together with an index generated from the data by a hash engine in the host, the controller including a memory that stores mapping information between indexes and physical addresses and a reference count corresponding to each index,
wherein the controller determines whether the data received from the host with the index is duplicate data by checking the mapping information or the reference count read from the memory according to the index received from the host, and performs deduplication processing by updating the reference count when the received data is duplicate data, and
wherein the storage device is a key-value storage device that stores the data received from the host as a value and stores the index received from the host as a key associated with the value.

2. (Deleted)

3. The storage system of claim 1,
wherein the index is a hash value generated by applying a hash function to the data.

4. The storage system of claim 1,
wherein the controller determines that the data received with the index is duplicate data when an index having the same value as the index received from the host exists in the memory.

5. The storage system of claim 4,
wherein the memory stores first to N-th reference counts corresponding to first to N-th indexes (where N is an integer of 2 or more), and
the controller, upon determining that the index received from the host is the same as the first index, performs deduplication by increasing the value of the first reference count corresponding to the first index.

6. The storage system of claim 1,
wherein the controller provides the updated reference count to the host.

7. The storage system of claim 1,
wherein the controller provides the host with first information indicating that the data received from the host is duplicate data, and
receives a reference count update request from the host and updates the reference count in response.

8. The storage system of claim 1,
wherein the controller includes a compressor that compresses the data received from the host and provides the compressed data to the storage device.

9. The storage system of claim 1,
wherein the controller includes a hash engine that performs a hash operation on at least one of the index and the data received from the host.

10. The storage system of claim 1, wherein the controller includes:
a processor; and
a working memory that stores, as a program executable by the processor, a deduplication control module that determines whether the data is duplicated and controls the update operation on the reference count.

11. A method of operating a storage system, the method comprising:
directly receiving, from a host, first data together with a first index generated from the first data by a hash engine in the host;
determining whether the received first index is the same as an index corresponding to previously stored data, based on a read operation on a memory that stores mapping information between indexes and physical addresses and reference counts corresponding to the indexes;
updating, when an index identical to the first index exists, the reference count stored in the memory without writing the first data, in accordance with data deduplication; and
providing the updated reference count to the host,
wherein the storage system is a key-value storage system that stores the data received from the host as a value and stores the index received from the host as a key associated with the value.

12. The method of claim 11,
wherein the storage system stores second to N-th indexes and corresponding second to N-th reference counts (where N is an integer of 3 or more), and
when the determination shows that the first index is the same as the second index, the value of the second reference count corresponding to the second index is increased.

13. The method of claim 12,
wherein providing the updated reference count comprises providing the second reference count, whose value has been increased, to the host.

14. The method of claim 11, further comprising,
when no index identical to the first index exists:
mapping the first index to a first physical address;
writing the first data to the location indicated by the first physical address; and
storing, in the memory, the first physical address corresponding to the first index and a first reference count.

15. The method of claim 11, further comprising,
when an index identical to the first index exists:
providing the host with information indicating that the first data is duplicate data; and
receiving a deduplication request or a duplicate storage request from the host,
wherein the data deduplication is performed in response to the deduplication request from the host, and the first data is stored in duplicate in response to the duplicate storage request from the host.

16. (Deleted)