KR20190088874A

KR20190088874A - A system and method for hybrid data reliability for object storage devices

Info

Publication number: KR20190088874A
Application number: KR1020180145797A
Authority: KR
Inventors: Rekha Pitchumani (레카 피추마니); Yang Seok Ki (기양석)
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2018-01-19
Filing date: 2018-11-22
Publication date: 2019-07-29
Also published as: JP7171452B2; JP2019128959A; CN110058806A; KR102663422B1; DE102018131523A1

Abstract

According to an embodiment of the present invention, a method of storing data in a key-value reliability system, which includes N storage devices (where N is an integer) grouped into a reliability group as a single logical unit and managed by a virtual device management layer, comprises: determining whether the data meets a threshold corresponding to a reliability mechanism for storing the data; selecting the reliability mechanism if the threshold is met; and storing the data according to the selected reliability mechanism.

Description

SYSTEM AND METHOD FOR HYBRID DATA RELIABILITY FOR OBJECT STORAGE DEVICES

One or more aspects of embodiments of the present invention relate generally to data storage systems and, more particularly, to a method of selecting a reliability mechanism for reliably storing key-value data in a key-value reliability system that includes a plurality of key-value storage devices.

Data reliability mechanisms, such as erasure coding, can be used to address data loss caused by storage device failures and data corruption in a variety of installations that include a plurality of storage devices.

Conventional solid state drives (SSDs) generally use only a block interface, and can provide data reliability through a redundant array of independent disks (RAID), erasure coding, or replication. As object formats vary in size and are unstructured, efficient data conversion between object-level and block-level interfaces is required. Furthermore, it is necessary to ensure data reliability while maintaining space efficiency and fast access-time characteristics.

Techniques such as RAID have been well studied for conventional block storage devices. However, relatively new key-value storage devices may have interfaces and storage semantics that differ from those of conventional block devices. Accordingly, various new key-value storage devices can potentially benefit from new data reliability techniques that are suited to, or tailored for, key-value data and key-value storage devices.

The embodiments described herein can provide improvements in the field of memory storage, because the reliability mechanisms of the embodiments can each perform a single-key recovery procedure that enables the virtual device management layer to recover all keys present on a failed memory device and copy them to a new memory device.

According to an embodiment of the present invention, there is provided a method of storing data in a key-value reliability system including N storage devices (where N is an integer) that are grouped into a reliability group as a single logical unit and managed by a virtual device management layer. The method includes determining whether the data satisfies a threshold corresponding to a reliability mechanism for storing the data, selecting the reliability mechanism when the threshold is satisfied, and storing the data according to the selected reliability mechanism.

The threshold is based on one or more of an object size of the data, a throughput consideration of the data, a read/write temperature of the data, and an underlying erasure coding capability of the N storage devices.
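
The threshold test described above can be sketched as a small selection function. This is only an illustrative sketch that uses object size as the sole criterion. The threshold values, the ordering of mechanisms by size, and the names `Mechanism`, `SIZE_THRESHOLDS`, and `select_mechanism` are assumptions made for the example and are not taken from the publication.

```python
from enum import Enum


class Mechanism(Enum):
    REPLICATION = "replication"
    PACKING = "packing"
    SPLITTING = "splitting"


# Hypothetical size thresholds in bytes; the publication gives no concrete values.
# Assumed ordering: smallest objects are replicated, mid-sized ones are packed,
# and everything larger is split.
SIZE_THRESHOLDS = [
    (4 * 1024, Mechanism.REPLICATION),
    (64 * 1024, Mechanism.PACKING),
]


def select_mechanism(value, thresholds=SIZE_THRESHOLDS, default=Mechanism.SPLITTING):
    """Return the first mechanism whose size threshold the value satisfies."""
    size = len(value)
    for limit, mechanism in thresholds:
        if size < limit:
            return mechanism
    return default
```

A real selector could weigh throughput, read/write temperature, and device capabilities in the same loop; size alone is used here to keep the sketch short.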

The method further includes testing the data for the reliability mechanism using one or more Bloom filters or caches.
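
As a rough illustration of how a Bloom filter could be used for such a test, the sketch below keeps one filter per mechanism and asks which mechanisms a key might have been stored under. The class `BloomFilter`, the helper `candidate_mechanisms`, the filter size, and the use of SHA-256 are all assumptions for the example; the publication does not specify how the filters are built.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a one-byte index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def candidate_mechanisms(key, filters):
    """Return the names of mechanisms whose filter may contain the key."""
    return [name for name, f in filters.items() if f.might_contain(key)]
```

Because a Bloom filter can return false positives, a caller would still confirm the mechanism from the stored metadata before decoding a value.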

The method further includes inserting, together with a key corresponding to the data, metadata for recording the selected reliability mechanism, one or more checksums for each of the N storage devices that store the data, object sizes of the values of the data stored in each of the N storage devices that store the data, and locations of parity group members among the N storage devices, which indicate on which of the N storage devices the data is stored.
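
The metadata fields listed above can be pictured as a small record stored alongside the key. The sketch below is one possible layout, not the format used by the publication: the field names, the choice of CRC-32 as the checksum, and the helper `build_metadata` are assumptions for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityMetadata:
    mechanism: str        # selected reliability mechanism
    checksums: tuple      # one checksum per stored piece (CRC-32 is an assumption)
    value_sizes: tuple    # original size in bytes of each stored piece
    group_devices: tuple  # device index of each parity group member


def build_metadata(mechanism, pieces, devices):
    """Build the metadata record for pieces written to the given devices."""
    if len(pieces) != len(devices):
        raise ValueError("each piece needs exactly one device location")
    return ReliabilityMetadata(
        mechanism=mechanism,
        checksums=tuple(zlib.crc32(p) for p in pieces),
        value_sizes=tuple(len(p) for p in pieces),
        group_devices=tuple(devices),
    )
```

Recording the original value sizes is what later allows padded values to be truncated back to their true length after recovery.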

The selected reliability mechanism includes object replication, and storing the data includes selecting a KV value, computing a hash for hashing the key corresponding to the selected KV value, determining a subset of the N storage devices for storing replicas of the key object corresponding to the KV value, and writing updated values corresponding to the KV value to each storage device of the determined subset under the same user key name.
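
The replication steps above can be sketched as follows. The sketch models each device as an in-memory dictionary and places replicas on consecutive devices starting from the hashed position. The consecutive placement, the SHA-256 hash, and the names `replica_devices` and `replicate` are assumptions for the example; the publication only states that a subset of devices is determined from the hash.

```python
import hashlib


def replica_devices(key, n_devices, n_replicas):
    """Pick n_replicas distinct device indices, starting at the hashed position."""
    if not 1 <= n_replicas <= n_devices:
        raise ValueError("n_replicas must be between 1 and n_devices")
    primary = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_devices
    return [(primary + i) % n_devices for i in range(n_replicas)]


def replicate(devices, key, value, n_replicas):
    """Write the same value under the same user key on every chosen device."""
    targets = replica_devices(key, len(devices), n_replicas)
    for index in targets:
        devices[index][key] = value
    return targets
```

Because every replica is stored under the unchanged user key, any surviving device in the subset can serve a read without consulting extra mapping state.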

The selected reliability mechanism includes packing, and storing the data includes selecting k key objects stored in k of the N storage devices of the reliability group (where k is an integer), retrieving k value objects corresponding to the k key objects, padding virtual zeros onto the ends of those value objects that do not have the largest value size among the k value objects so that all k value objects have the same virtual size, generating r parity objects (where r is an integer) from the k key objects, writing the k key objects to the k storage devices, and writing the r parity objects to r of the N storage devices, wherein each of the r storage devices is distinct from the k storage devices, and k + r = N.
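
To illustrate the virtual zero padding used by packing, the sketch below computes a single XOR parity object over k values of different sizes and recovers one lost value. A single XOR parity (r = 1) is a deliberate simplification standing in for a general (k, r) MDS code, and the function names `xor_parity` and `recover` are assumptions for the example.

```python
def xor_parity(values):
    """XOR values of unequal length as if the shorter ones were zero-padded."""
    width = max(len(v) for v in values)
    parity = bytearray(width)
    for value in values:
        # Bytes past the end of a shorter value are virtual zeros: XOR with
        # zero changes nothing, so no padding is ever materialized or stored.
        for i, byte in enumerate(value):
            parity[i] ^= byte
    return bytes(parity)


def recover(parity, survivors, lost_size):
    """Rebuild one lost value from the parity object and the surviving values."""
    padded = xor_parity([parity, *survivors])
    return padded[:lost_size]  # drop the virtual zeros using the recorded size
```

The recorded object size of the lost value is needed to strip the virtual zeros after reconstruction, which is one reason the value sizes are kept in the metadata.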

The selected reliability mechanism includes packing using traditional erasure coding, and the N storage devices are configured with traditional (k, r) maximum distance separable (MDS) erasure coding.

The selected reliability mechanism includes packing using regeneration erasure coding, and the N storage devices are configured with (k, r, d) regeneration erasure coding.

The selected reliability mechanism includes splitting, and storing the data includes selecting a KV value, dividing the KV value into k equal-sized objects (where k is an integer), generating r parity objects (where r is an integer) from the k equal-sized objects, computing a hash for hashing the key corresponding to the selected KV value, determining, based on the hash, a primary device among the N storage devices on which the KV value is to be placed, and writing each of the k equal-sized objects and the r parity objects to the N storage devices in consecutive order starting from the primary device, wherein k + r = N.
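
The splitting steps above can be sketched as a placement function that maps each piece to a device index. As before, a single XOR parity object (r = 1) is used as a simplified stand-in for a (k, r) erasure code, and the zero padding of the last chunk, the SHA-256 hash, and the names `split_value`, `xor_parity`, and `place_split` are assumptions for the example.

```python
import hashlib


def split_value(value, k):
    """Divide a value into k equal-sized chunks, zero-padding the tail."""
    chunk = -(-len(value) // k)  # ceiling division
    padded = value.ljust(chunk * k, b"\0")
    return [padded[i * chunk:(i + 1) * chunk] for i in range(k)]


def xor_parity(chunks):
    """Compute one XOR parity object over equal-sized chunks."""
    parity = bytearray(len(chunks[0]))
    for c in chunks:
        for i, byte in enumerate(c):
            parity[i] ^= byte
    return bytes(parity)


def place_split(key, value, k, n_devices):
    """Return {device index: piece}, filling devices in order from the primary."""
    chunks = split_value(value, k)
    pieces = chunks + [xor_parity(chunks)]  # k data objects plus r = 1 parity object
    if len(pieces) != n_devices:
        raise ValueError("k + r must equal N")
    primary = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_devices
    return {(primary + i) % n_devices: piece for i, piece in enumerate(pieces)}
```

Since the primary device is derived from the key alone, a reader can recompute where every piece lives without a separate location table, which fits the stateless placement the text describes.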

The selected reliability mechanism includes splitting using traditional erasure coding, and the N storage devices are configured with traditional (k, r) maximum distance separable (MDS) erasure coding.

The selected reliability mechanism includes splitting using regeneration erasure coding, the N storage devices are configured with (k, r, d) regeneration erasure coding, and storing the data further includes dividing each of the k equal-sized objects into m subpackets (where m is an integer) using the regeneration erasure coding, and dividing each of the r parity objects into m parity subpackets.

According to another embodiment of the present invention, there is provided a data reliability system for storing data based on a selected reliability mechanism. The data reliability system includes N storage devices (where N is an integer) configured as a virtual device using stateless data protection, and a virtual device management layer configured to manage the N storage devices as the virtual device and to store data in selected ones of the N storage devices according to the selected reliability mechanism. The virtual device management layer is configured to determine whether the data satisfies a threshold corresponding to a reliability mechanism for storing the data, to select the reliability mechanism when the threshold is satisfied, and to store the data according to the selected reliability mechanism.

The selected reliability mechanism includes object replication, and the virtual device management layer is configured to store the data by selecting a KV value, computing a hash for hashing the key corresponding to the selected KV value, determining a subset of the N storage devices for storing replicas of the key object corresponding to the KV value, and writing updated values corresponding to the KV value to each storage device of the determined subset under the same user key name.

The selected reliability mechanism includes packing, and the virtual device management layer is configured to store the data by selecting k key objects stored in k of the N storage devices (where k is an integer), retrieving k value objects corresponding to the k key objects, padding virtual zeros onto the ends of those value objects that do not have the largest value size among the k value objects so that all k value objects have the same virtual value size, generating r parity objects (where r is an integer) from the k key objects, writing the k key objects to the k storage devices, and writing the r parity objects to r of the N storage devices, wherein each of the r storage devices is distinct from the k storage devices, and k + r = N.

The selected reliability mechanism includes splitting, and the virtual device management layer is configured to store the data by selecting a KV value, dividing the KV value into k equal-sized objects (where k is an integer), generating r parity objects (where r is an integer) from the k equal-sized objects, computing a hash for hashing the key corresponding to the selected KV value, determining, based on the hash, a primary device among the N storage devices on which the KV value is to be placed, and writing each of the k equal-sized objects and the r parity objects to the N storage devices in consecutive order starting from the primary device, wherein k + r = N.

The selected reliability mechanism includes splitting using regeneration erasure coding, the N storage devices are configured with (k, r, d) regeneration erasure coding, and the virtual device management layer is further configured to store the data by dividing each of the k equal-sized objects into m subpackets (where m is an integer) using the regeneration erasure coding, and dividing each of the r parity objects into m parity subpackets.

According to yet another embodiment of the present invention, there is provided a non-transitory computer readable medium including computer code that, when executed by a processor, performs a method of storing data in a key-value reliability system including N storage devices that are grouped into a reliability group as a single logical unit and managed by a virtual device management layer. The method includes determining whether the data satisfies a threshold corresponding to a reliability mechanism for storing the data, selecting the reliability mechanism when the threshold is satisfied, and storing the data according to the selected reliability mechanism.

The selected reliability mechanism includes object replication, and storing the data includes selecting a KV value, computing a hash for hashing the key corresponding to the selected KV value, selecting a subset of the N storage devices for storing replicas of the key object corresponding to the KV value, and writing updated values corresponding to the KV value to each storage device of the selected subset under the same user key name.

The selected reliability mechanism includes packing, and storing the data includes selecting k key objects stored in k of the N storage devices of the reliability group (where k is an integer), retrieving k value objects corresponding to the k key objects, padding virtual zeros onto the ends of those value objects that do not have the largest value size among the k value objects so that all k value objects have the same virtual value size, generating r parity objects (where r is an integer) from the k key objects, writing the k key objects to the k storage devices, and writing the r parity objects to r of the N storage devices, wherein each of the r storage devices is distinct from the k storage devices, and k + r = N.

The selected reliability mechanism includes splitting, and storing the data includes selecting a KV value, dividing the KV value into k equal-sized objects (where k is an integer), generating r parity objects (where r is an integer) from the k equal-sized objects, computing a hash for hashing the key corresponding to the selected KV value, selecting, based on the hash, a primary device among the N storage devices on which the KV value is to be placed, and writing each of the k equal-sized objects and the r parity objects to the N storage devices in consecutive order starting from the primary device, wherein k + r = N.

The above and/or other aspects will become more apparent from the following detailed description of embodiments, taken in conjunction with the accompanying drawings.
FIG. 1 is a block diagram illustrating a key-value reliability system that stores key-value data based on a selected reliability mechanism, according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating selection of a reliability mechanism to be used by the key-value reliability system based on a size threshold corresponding to the size of the data of a key-value pair, according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating a group of KV storage devices configured to store key-value data according to the reliability mechanism of k-object (k, r) erasure coding, or multi-object "Packing," using traditional erasure coding, according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating storage of value objects and parity objects according to the reliability mechanism of k-object (k, r) erasure coding, or multi-object "Packing," using traditional erasure coding, according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating a group of KV storage devices configured to store key-value data according to the reliability mechanism of single-object (k, r) erasure coding, or "Splitting," using traditional erasure coding, according to an embodiment of the present invention.
FIG. 6 is a block diagram illustrating a group of KV storage devices configured to store key-value data according to the reliability mechanism of single-object (k, r, d) erasure coding, or "Splitting," using regeneration erasure coding, according to an embodiment of the present invention.

The various features of the present invention, and methods of accomplishing them, may be understood more readily by reference to the following detailed description of embodiments and the accompanying drawings. Hereinafter, embodiments will be described in more detail with reference to the accompanying drawings. Like reference numerals refer to like elements throughout. The present invention, however, may be embodied in various different forms and should not be construed as being limited to the embodiments described herein. Rather, these embodiments are provided as examples to aid an overall understanding of the invention, and will fully convey the technical features and aspects of the present invention to those skilled in the art. Accordingly, processes, elements, and techniques that are not necessary for those skilled in the art to completely understand the features and aspects of the present invention may not be described. Unless otherwise noted, like reference numerals denote like elements throughout the accompanying drawings and the written description, and their descriptions will not be repeated. In the drawings, the relative sizes of elements, layers, and regions may be exaggerated for clarity.

Various embodiments are described herein with reference to sectional illustrations that are schematic illustrations of embodiments and/or intermediate structures. As such, variations from the illustrated shapes as a result of, for example, manufacturing techniques and/or tolerances are to be expected. Furthermore, the specific structural or functional descriptions disclosed herein are merely illustrative, for the purpose of describing embodiments according to the concept of the present invention. That is, the embodiments disclosed herein should not be construed as limited to the particular illustrated shapes of regions, but are to include deviations in shape that result from, for example, manufacturing. For example, an implanted region illustrated as a rectangle will typically have rounded or curved features and/or a gradient of implant concentration at its edges, rather than a binary change from implanted to non-implanted region. Likewise, a buried region formed by implantation may result in some implantation in the region between the buried region and the surface through which the implantation takes place. That is, the regions illustrated in the drawings are schematic in nature, and their shapes are not intended to illustrate the actual shape of a region of a device and are not intended to be limiting. Additionally, as those skilled in the art will recognize, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention.

In the following detailed description, numerous specific details are set forth for ease of explanation and to aid understanding of the various embodiments. However, various embodiments may be practiced without these specific details, or with one or more equivalent arrangements. In other instances, well-known structures and devices are shown in block diagram form to avoid unnecessarily obscuring the various embodiments.

It will be understood that, although the terms "first," "second," "third," and so on may be used herein to describe various elements, components, regions, layers, and/or sections, these elements, components, regions, layers, and/or sections are not limited by these terms. These terms are used only to distinguish one element, component, region, layer, or section from another element, component, region, layer, or section. Thus, a first element, component, region, layer, or section described below could be termed a second element, component, region, layer, or section without departing from the spirit and scope of the present invention.

Spatially relative terms, such as "beneath," "below," "lower," "under," "above," "upper," and the like, may be used herein for ease of explanation to describe the relationship of one element or feature to another element or feature as illustrated in the drawings. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or in operation, in addition to the orientation depicted in the drawings. For example, if the device in the drawings is turned over, elements described as "below," "beneath," or "under" other elements or features would then be oriented "above" the other elements or features. Thus, the example terms "below" and "under" can encompass both an orientation of above and an orientation of below. The device may be otherwise oriented (for example, rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein should be interpreted accordingly. Similarly, when a first part is described as being arranged "on" a second part, this indicates that the first part is arranged at an upper side or a lower side of the second part, without limitation to the upper side thereof with respect to the direction of gravity.

It will be understood that when an element, layer, region, or component is referred to as being "on," "connected to," or "coupled to" another element, layer, region, or component, it can be directly on, connected to, or coupled to the other element, layer, region, or component, or one or more intervening elements, layers, regions, or components may be present. However, "directly connected" refers to one component being directly connected to another component without an intermediate component. Other expressions describing relationships between components, such as "between," "immediately between," "adjacent to," or "directly adjacent to," may be construed similarly. In addition, when an element or layer is referred to as being "between" two elements or layers, it can be the only element or layer between the two elements or layers, or one or more intervening elements or layers may also be present.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises" and "includes," when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

As used herein, the terms "substantially," "about," "approximately," and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art. "About" or "approximately," as used herein, is inclusive of the stated value and means within an acceptable range of deviation for the particular value as determined by one of ordinary skill in the art, considering the measurement in question and the error associated with measurement of the particular quantity (for example, the limitations of the measurement system). For example, "about" may mean within one or more standard deviations, or within ±30%, 20%, 10%, or 5% of the stated value. As used herein, the terms "use," "using," and "used" may be considered synonymous with the terms "utilize," "utilizing," and "utilized," respectively. Also, the term "exemplary" is intended to refer to an example or illustration.

When a certain embodiment is implemented differently, a specific process order may be performed differently from the described order. For example, two consecutively described processes may be performed substantially at the same time, or may be performed in an order opposite to the described order.

The electronic or electric devices and/or any other relevant devices or components according to embodiments of the present invention described herein may be implemented using any suitable hardware, firmware (e.g., an application-specific integrated circuit (ASIC)), software, or a combination of software, firmware, and hardware. For example, the various components of these devices may be formed on one integrated circuit (IC) chip or on separate IC chips. Further, the various components of these devices may be implemented on a flexible printed circuit film, a tape carrier package (TCP), or a printed circuit board (PCB), or may be formed on one substrate. Further, the various components of these devices may be a process or thread, running on one or more processors in one or more computing devices, that executes computer program instructions and communicates with other system components to perform the various functions described herein. The computer program instructions are stored in a memory that may be implemented in a computing device using a standard memory device, such as a random access memory (RAM). The computer program instructions may also be stored in other non-transitory computer-readable media such as, for example, a CD-ROM, a flash drive, or the like. Also, a person of skill in the art should recognize that the functionality of various computing devices may be combined or integrated into a single computing device, or that the functionality of a particular computing device may be distributed across one or more other computing devices, without departing from the spirit and scope of the exemplary embodiments of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and/or the present specification, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As described below, embodiments of the present invention provide a method for reliably storing key-value data in a key-value reliability system composed of a plurality of key-value (KV) storage devices grouped into one logical unit. Further, embodiments of the present invention provide a stateless hybrid reliability manager that manages the drives and controls the storage of KV pairs. The stateless hybrid reliability manager relies on multiple pluggable reliability mechanisms/techniques/implementations, including: Object Replication; K-Object (k,r) erasure coding - Packing; Single Object (k,r) erasure coding - Splitting; K-Object (k,r,d) regeneration coding - Packing; and Single Object (k,r,d) regeneration coding - Splitting.

Because the stateless hybrid reliability manager, relying on multiple pluggable reliability mechanisms, can manage the devices and control the storage of KV pairs, and because the described methods for selecting a reliability mechanism can ensure efficient storage, retrieval, and repair of KV pairs of different sizes, the described embodiments can improve memory storage (e.g., the storage of key-value data in key-value storage devices).

FIG. 1 is a block diagram depicting a key-value reliability system for storing key-value data based on a selected reliability mechanism, according to an embodiment of the present invention.

Referring to FIG. 1, as described above, various new key-value (KV) storage devices/memory devices/drives/KV-SSDs 130 can potentially benefit from new data reliability mechanisms adopted or tailored for key-value data and for the KV storage devices 130. Accordingly, a hybrid key-value reliability system for such KV storage devices 130 may include a stateless hybrid reliability manager/virtual device manager layer/virtual device management layer 120 that manages the KV storage devices 130, and controls the storage of KV pairs in them, using a hybrid reliability mechanism in accordance with one or more pluggable reliability mechanisms. Although solid-state drives (SSDs) are generally used herein to refer to the KV storage devices, other storage devices may be used in accordance with embodiments of the present invention. The design and operations of the virtual device management layer 120 according to embodiments of the present invention are described below.

In the present embodiment, the virtual device management layer 120 may enable a method for reliably storing key-value data/a KV pair 170 in the key-value reliability system. The key-value reliability system may include a plurality of KV storage devices 130 grouped into one logical unit. The logical unit may be referred to as a reliability group 140.

The KV storage devices 130 of the reliability group 140 may store respective chunks of erasure-coded data and/or replicated data, which may correspond to the key-value data 170. The KV storage devices 130 of the reliability group 140 are exposed as a single virtual device 110, to which key-value operations are directed through the virtual device management layer 120.

The virtual device 110 may include a stateless hybrid reliability manager as the virtual device management layer 120. That is, the virtual device management layer 120 may operate in a stateless manner (i.e., without needing to maintain a key-value-to-device mapping).

Accordingly, the virtual device 110 may store the key-value data 170 across N KV storage devices 130 (N being an integer) (e.g., KV-SSDs 130-1, 130-2, 130-3, 130-4, … 130-N), and may store the key-value data 170 in the KV storage devices 130 through the virtual device management layer 120. That is, the virtual device management layer 120 may manage the KV storage devices 130 and may control the storing of KV pairs in them.

In other embodiments, the key-value reliability system may also include a cache that selectively stores metadata and/or data associated with the keys of the key-value data 170 to improve operation speed. The reliability mechanisms may append to the values of KV pairs so that the values include metadata corresponding to the KV pairs. That is, both the key and the value may be appended with information corresponding to a metadata identifier "MetaID" to store additional metadata specific to the KV pair. The metadata may include a checksum, a reliability mechanism identifier identifying the reliability mechanism used to store the data, an erasure code identifier, object sizes, the locations of parity group numbers, and the like.

In other embodiments, the key-value reliability system may further include Bloom filters corresponding to the reliability mechanisms described below. Each Bloom filter may store the keys that were stored using the corresponding reliability mechanism, thereby aiding the key-value reliability system in read operations. Accordingly, one or more Bloom filters or caches of the key-value reliability system may make it possible to quickly test keys against the existing reliability mechanisms.
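As an illustration of the per-mechanism Bloom filters, the following is a minimal sketch, not the implementation of the embodiments. The class `BloomFilter`, the function `candidate_mechanisms`, the filter size, the number of hash functions, and the use of SHA-256 are all assumptions chosen for brevity; the description does not specify any of them.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Tiny Bloom filter (hypothetical parameters, for illustration only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def candidate_mechanisms(key: bytes, filters: Dict[str, BloomFilter]) -> List[str]:
    """Return the mechanisms whose filter reports the key as possibly stored."""
    return [name for name, f in filters.items() if f.might_contain(key)]
```

On a read, a layer built this way would only need to try the mechanisms returned by `candidate_mechanisms`; a false positive costs one extra lookup, while a stored key is never missed.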

Each of the reliability mechanisms described herein may first store the first copy or chunk of the KV pair corresponding to the key-value data 170 using the same hash function of the key, modulo the number of KV storage devices 130. That is, for each of the pluggable reliability mechanisms, the reliability mechanism may store at least the first copy/chunk using the same key as the user key.
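The placement rule above (hash of the key, modulo the number of devices) can be sketched as follows. This is only an illustration: the function name `primary_device_index` and the choice of SHA-256 as the hash are assumptions, since the description does not name a specific hash function.

```python
import hashlib


def primary_device_index(user_key: bytes, num_devices: int) -> int:
    """Return the zero-based index of the device holding the first copy/chunk.

    Because the index depends only on the key and the device count, no
    key-to-device mapping has to be kept, which is what lets the
    management layer stay stateless.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    digest = hashlib.sha256(user_key).digest()  # hash choice is an assumption
    return int.from_bytes(digest[:8], "big") % num_devices
```

Every mechanism calling the same function with the same key reaches the same device, so a read can locate the first copy or chunk without knowing in advance which mechanism stored it.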

As described above, embodiments of the present invention provide multiple pluggable reliability mechanisms for ensuring reliable storage of the key-value data 170 in the plurality of KV storage devices 130. Accordingly, the virtual device management layer 120 may rely on the reliability mechanisms and may determine which of the reliability mechanisms to use.

The reliability mechanisms may be based on policies such as value-size thresholds that are set during setup of the virtual device 110, and/or object read/write frequency. Accordingly, the virtual device management layer 120 may select an appropriate reliability mechanism based on the stated policies of the system.

For the five reliability mechanisms of the embodiments of the present invention, how each reliability mechanism operates, and when it may suitably be used and selected by the virtual device management layer 120, is described below. These reliability mechanisms may be referred to as Object Replication, K-Object (k,r) erasure coding - Packing, Single Object (k,r) erasure coding - Splitting, K-Object (k,r,d) regeneration coding - Packing, and Single Object (k,r,d) regeneration coding - Splitting.

FIG. 2 is a flowchart 200 depicting the selection of a reliability mechanism to be used by the key-value reliability system based on a size threshold corresponding to the size of the data of a KV pair, according to an embodiment of the present invention.

Referring to FIG. 2, for all of the supported size-threshold-based reliability mechanisms (e.g., the five reliability mechanisms described above), the virtual device management layer 120 may determine the value size of the data (e.g., the key-value data 170), may determine whether the value size is less than a given threshold t_i corresponding to each reliability mechanism, and may select the first reliability mechanism that satisfies the value-size threshold requirement.

For example, at S210, the virtual device management layer 120 may receive "n" supported size-threshold-based reliability mechanisms (n being an integer). At S220, the virtual device management layer 120 may simply examine each of the reliability mechanisms one at a time, in order from 1 to n. At S230, in examining each of the supported reliability mechanisms, the virtual device management layer 120 may determine whether the value size of the data is less than the threshold t_i corresponding to the respective reliability mechanism.

At S240, when a reliability mechanism having a threshold t_i greater than or equal to the value size of the data is found, the virtual device management layer 120 may select that reliability mechanism for use. At S250, once the reliability mechanism to be used has been determined at S240, or once the final iteration of S220 has determined that none of the n reliability mechanisms has a suitable threshold t_i that accommodates the value size, the virtual device management layer 120 may end the determination of which reliability mechanism to use.
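The S210 to S250 loop of flowchart 200 can be sketched as below. The function name `select_mechanism` and the example mechanism labels and thresholds in the usage note are hypothetical. The text describes the comparison both as "less than" (S230) and as a threshold "greater than or equal to" the value size (S240); this sketch assumes the inclusive form of S240.

```python
from typing import Optional, Sequence, Tuple


def select_mechanism(
    value_size: int,
    mechanisms: Sequence[Tuple[str, int]],
) -> Optional[str]:
    """Pick the first mechanism whose size threshold t_i accommodates value_size.

    `mechanisms` is the ordered list received at S210, as (name, t_i) pairs.
    """
    for name, threshold in mechanisms:   # S220: examine mechanisms 1..n in order
        if value_size <= threshold:      # S230/S240: threshold check (inclusive, assumed)
            return name                  # S240: select this mechanism
    return None                          # S250: no mechanism accommodates the size
```

For instance, with the illustrative list `[("object_replication", 128), ("packing_ec", 4096), ("splitting_ec", 65536)]`, a 100-byte value would map to `"object_replication"` and a 1000-byte value to `"packing_ec"`. Because the first match wins, the thresholds are expected to be listed in increasing order.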

In the present embodiment, "n" may equal 5, in accordance with the five different reliability mechanisms of the embodiments described herein. For relatively very small key-values (i.e., when the value size is relatively small), the virtual device management layer 120 may select the Object Replication reliability mechanism for use. For slightly larger key-values, the virtual device management layer 120 may select the reliability mechanisms of Packing and then Splitting (e.g., in that order), using traditional erasure coding for each. For still larger key-values, however, the virtual device management layer 120 may select Packing and then Splitting while using regeneration erasure coding instead of traditional erasure coding.

In embodiments of the present invention, the selection of the reliability mechanism for use may be based on one or more of: the object size of the object, throughput requirements for the object, the read/write temperature of the corresponding key-value pair, the underlying coding capabilities of the plurality of KV storage devices, and/or detection of whether a key is hot or cold. For example, "hot" keys may use the Object Replication reliability mechanism regardless of their value size, whereas "cold" keys may be assigned one of the erasure-coding reliability mechanisms according to their value size. As another example, the decision of whether to use the Object Replication reliability mechanism may be based on both size and write temperature. Accordingly, a threshold corresponding to object read/write frequency may be used to determine the reliability mechanism, in place of the threshold corresponding to size in the flowchart 200 of FIG. 2.
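The hot/cold example can be sketched as a policy that checks access frequency first and value size second. Everything named here is an assumption for illustration: the function `select_by_temperature`, the `accesses_per_hour` metric, the `hot_threshold` parameter, and the mechanism labels. The description does not state how temperature is measured.

```python
from typing import Optional, Sequence, Tuple


def select_by_temperature(
    value_size: int,
    accesses_per_hour: float,
    hot_threshold: float,
    cold_mechanisms: Sequence[Tuple[str, int]],
) -> Optional[str]:
    """Hot keys use replication regardless of size; cold keys are chosen by size."""
    if accesses_per_hour >= hot_threshold:
        # "Hot" key: Object Replication regardless of value size.
        return "object_replication"
    # "Cold" key: first erasure-coding mechanism whose size threshold fits.
    for name, max_size in cold_mechanisms:
        if value_size <= max_size:
            return name
    return None
```

A frequently updated large object would therefore still be replicated, trading storage overhead for cheaper updates, while a rarely touched object of the same size would be erasure coded.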

The respective operations of the five reliability mechanisms are described below.

Referring again to FIG. 1, as previously mentioned, when the value size of a KV pair is relatively small, the "Object Replication" reliability mechanism may be suitable for selection by the virtual device management layer 120. Object Replication may be applied per object (e.g., per KV pair/key-value data 170). Although the Object Replication reliability mechanism has a high storage overhead, it has low read and recovery costs, and may therefore be suitable for very small value sizes.

The Object Replication reliability mechanism may also be suitable for key-values (e.g., key-value data 170) with frequent updates, and may therefore be selected based on read and write frequency.

During Object Replication, whenever a write occurs, the object/key-value data 170 is replicated to one or more additional KV storage devices 130. The primary copy of the key-value data 170 may be placed on the one of the KV storage devices 130 designated by the hash of the key modulo N. The replicas of the primary copy of the key-value data 170 may be placed on consecutive, or immediately adjacent, KV storage devices 130 in a circular manner.

The virtual device management layer 120 or a user may decide how many copies of the key-value data 170 to create. For example, a distributed system using the virtual device management layer 120 may choose 3-way replication and may create 3-way replicas by default. However, a user of the system may configure the number of replicas of an object to be more or fewer than the chosen default.

Thus, for example, when 3-way replication is used and the primary KV storage device 130-2 contains the primary copy of the data (e.g., the key-value data 170), the virtual device management layer 120 may store replicas of the primary copy on the subsequent replica KV storage devices 130-3 and 130-4, and all copies of the data are identical. That is, the copies of the data are stored on the two (or more) KV storage devices 130-3 and 130-4 that immediately follow (e.g., in a circular manner) the KV storage device 130-2 containing the primary copy.
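The circular placement of replicas can be sketched as follows. The function name `replica_devices` and the zero-based indexing are assumptions; under that indexing, device 130-2 corresponds to index 1.

```python
from typing import List


def replica_devices(primary: int, num_devices: int, copies: int = 3) -> List[int]:
    """Return the device indices for the primary copy and its replicas.

    Replicas go on the devices immediately following the primary,
    wrapping around to device 0 after the last device.
    """
    if not 1 <= copies <= num_devices:
        raise ValueError("copies must be between 1 and num_devices")
    return [(primary + i) % num_devices for i in range(copies)]
```

With five devices and 3-way replication, a primary at index 1 yields indices 1, 2, and 3 (devices 130-2, 130-3, and 130-4 in FIG. 1), and a primary at index 4 wraps around to indices 4, 0, and 1.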

The copies of the data may be stored on the replica KV storage devices 130-3 and 130-4 under the same key name/same user key as on the primary KV storage device 130-2. All copies of the data may include a checksum and an identifier for identifying the replicated key-value data 170.

Accordingly, when a particular KV storage device 130 fails (e.g., when KV storage device 130-3 fails), recovery of the replicated keys is ensured by restoring the values using a recovery mechanism on the key names on the KV storage devices 130 immediately preceding and following the failed KV storage device 130 (e.g., the KV storage devices 130-2 and 130-4 immediately preceding and following KV storage device 130-3).
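The neighbor-based recovery can be sketched with in-memory dictionaries standing in for devices. This is an illustration under stated assumptions, not the recovery mechanism of the embodiments: the function `rebuild_failed_device` is hypothetical, `primary_of` is a caller-supplied placement function (hash of the key modulo N), and at least two copies of each object are assumed. With circular placement and two or more copies, every object that lived on the failed device also has a copy on the device just before or just after it.

```python
from typing import Callable, Dict, List


def rebuild_failed_device(
    devices: List[Dict[bytes, bytes]],
    failed: int,
    copies: int,
    primary_of: Callable[[bytes, int], int],
) -> Dict[bytes, bytes]:
    """Rebuild the contents of devices[failed] from its two neighbors."""
    n = len(devices)
    rebuilt: Dict[bytes, bytes] = {}
    for neighbor in ((failed - 1) % n, (failed + 1) % n):
        for key, value in devices[neighbor].items():
            primary = primary_of(key, n)
            # The key belonged on the failed device if that device lies
            # within `copies` consecutive slots starting at the primary.
            if (failed - primary) % n < copies:
                rebuilt.setdefault(key, value)
    return rebuilt
```

The contents of `devices[failed]` are never read, since that device is treated as lost; only the neighbors' key names and values are scanned.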

To summarize the Object Replication reliability mechanism, the virtual device management layer 120 may receive the key-value data 170 and may hash the key object to determine the KV storage devices 130 to be used for storing the replicas of the key object. The virtual device management layer 120 may then write the updated values to the selected KV storage devices 130 (e.g., the selected KV storage devices 130-2, 130-3, and 130-4) under the same user key name (e.g., with an appropriate MetaID field).

FIG. 3 is a block diagram depicting a group of KV storage devices configured to store key-value data according to a reliability mechanism of K-Object (k,r) erasure coding, or multiple-object "Packing," using traditional erasure coding, according to an embodiment of the present invention.

Referring to FIG. 3, the reliability mechanism of Packing using traditional erasure coding may be selected for data having small value sizes that are not suitable for splitting into chunks (e.g., for better data throughput). For example, the reliability mechanism of Packing using traditional erasure coding may be selected by the virtual device management layer 120 for data whose value sizes are larger than those that result in selection of the previously described Object Replication reliability mechanism, but that are still relatively small.

Packing using traditional erasure coding may consist of traditional (k,r) maximum distance separable (MDS) erasure coding, and may be used with a systematic MDS code. As an example, the erasure code may by default be a (4,2) Reed-Solomon code, which is relatively well studied and for which fast implementation libraries are readily available.

In using the reliability mechanism of Packing using traditional erasure coding, k keys/key objects 350 from the queues of k different KV storage devices 330 that are part of the same parity group/erasure code group 340 are selected, erasure coded, and packed (k being an integer).

For example, the virtual device management layer 120 may maintain a buffer of recently written key objects 350 for each KV storage device 330 (e.g., for each KV storage device 130 of the reliability group 140 of FIG. 1), so that the virtual device management layer 120 can select k key objects 350 from k different KV storage devices to be erasure coded, thereby packing the k key objects 350 corresponding to the KV pairs.

In the present embodiment, the virtual device management layer 120 selects four key objects 350x, 350y, 350b, and 350c from four different KV storage devices 330-1, 330-3, 330-4, and 330-N, respectively (k=4 in the present embodiment).

FIG. 4 is a block diagram depicting the storage of value objects and parity objects according to a reliability mechanism of K-Object (k,r) erasure coding, or multiple-object "Packing," using traditional erasure coding, according to an embodiment of the present invention.

Referring to FIGS. 3 and 4, the key objects 350 are placed on the (hash of the key modulo n)-th KV storage device 330. That is, for each key object 350, a respective hash of the key modulo n may be computed to determine the queue of the particular KV storage device 330 to which the key object is sent. In the present embodiment, the i-th key Key_i 350-i is hashed and placed on KV-SSD1 330-1, the j-th key Key_j 350-j is hashed and placed on KV-SSD2 330-2, and the k-th key Key_k 350-k is hashed and placed on KV-SSD4 330-4.

The user value length/value size 462 of each stored value object 450 is the same as written. For consistency, however, to enable erasure coding, the user values/value objects 450 are treated as all having the same size by having "0" fillings/virtual zeros/virtual zero padding 464 appended to them. That is, because the respective user value sizes 462 of different value objects 450 may vary (i.e., the value objects 450 may have variable lengths, or may be variable-length key values), implementing a method of virtual zero padding 464 (i.e., padding the value objects 450 with the zeros of the virtual zero padding 464 for coding, while avoiding actually rewriting the data representing the value objects 450 to include the padded zeros) allows the parity objects 460 to have the same size as the largest object(s) 470 in the parity group 340. Thus, in the present embodiment, the value objects 450 "Val x," "Val y," and "Val b" are padded with virtual zeros that are not actually stored on the KV storage devices, and are thereby treated as having the same size as the value object 470 "Val c." The parity objects 460 can then be computed.

After the coding of the k key objects 350, the virtual device management layer 120 may compute r parity objects 460 from the k values/k value objects 450 corresponding to the k key objects 350. Here, r is an integer, k+r=N, and N is the number of KV storage devices 330 of the parity group 340 (e.g., the N KV storage devices 130 of the reliability group 140 of FIG. 1).
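Virtual zero padding can be illustrated with a deliberately simplified code. The description uses a (k,r) MDS code such as (4,2) Reed-Solomon; the sketch below swaps in a single XOR parity (r=1) only to keep the example short, so it tolerates one lost value rather than two. The function names `xor_parity` and `recover_value` are hypothetical.

```python
from typing import List


def xor_parity(values: List[bytes]) -> bytes:
    """Compute one XOR parity over variable-length values.

    Shorter values are treated as if padded with zeros up to the longest
    value. The padding is never materialized: XOR with zero is a no-op,
    so each value is simply folded in over its own length.
    """
    parity = bytearray(max(len(v) for v in values))
    for v in values:
        for i, b in enumerate(v):
            parity[i] ^= b
    return bytes(parity)


def recover_value(parity: bytes, survivors: List[bytes], original_length: int) -> bytes:
    """Rebuild one missing value from the parity and the surviving values."""
    missing = bytearray(parity)
    for v in survivors:
        for i, b in enumerate(v):
            missing[i] ^= b
    # The stored metadata supplies the real length, so the virtual zeros are dropped.
    return bytes(missing[:original_length])
```

The parity has the size of the largest value in the group, and recovery needs the original length of the lost value, which is why the actual value sizes are kept as metadata.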

The virtual device management layer 120 may store the r parity objects 460 on the r remaining other KV storage devices 330 of the parity group 340 (i.e., r KV storage devices that are distinct from the k KV storage devices 330 containing the queues from which the k key objects 350 were selected and erasure coded). Accordingly, each of the k key objects 350 and the r parity objects 460 may be stored on a different respective one of the N KV storage devices 330, and the data corresponding thereto is evenly distributed across the N KV storage devices 330 of the parity group 340.
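The choice of parity devices follows directly from k+r=N: the parity objects go on whichever devices did not contribute a data object. A minimal sketch, with the hypothetical function name `parity_device_indices` and zero-based device indices assumed:

```python
from typing import List, Sequence


def parity_device_indices(num_devices: int, data_devices: Sequence[int]) -> List[int]:
    """Return the r = N - k devices that hold no data object of this group."""
    used = set(data_devices)
    if len(used) != len(data_devices):
        raise ValueError("the k data objects must come from k different devices")
    if any(d < 0 or d >= num_devices for d in used):
        raise ValueError("device index out of range")
    return [d for d in range(num_devices) if d not in used]
```

For a six-device group whose four data objects came from indices 0, 2, 3, and 5 (corresponding to 330-1, 330-3, 330-4, and 330-N when N is 6), the two parity objects would land on indices 1 and 4.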

Although reads and writes are relatively simple for the reliability mechanism of Packing using typical erasure coding, recomputation of parity and recovery may be less simple. For recovery and for computation of parity (e.g., in the case of an update), information about the grouping of the key objects 350 may be stored as a metadata object in each of the KV storage devices 330 (e.g., the KV storage devices 130 of FIG. 1), together with the actual value size 462 of each value object 450 (i.e., the value size 462 without the virtual zero padding 464), so that it can be determined which key objects 350 are grouped together in the same parity group 340 and the parity computation can be performed. Thus, in the present embodiment, additional metadata may be used to record the key objects 350 (e.g., the key objects 350 located in the reliability group 140 of FIG. 1), the original length of each of the value objects 450 corresponding to the key objects 350, and the KV storage devices 130 in the coding order of the key objects 350.

For example, the metadata object value may include a field indicating all of the key objects 350 of the reliability group 140 and the value sizes 462 of the value objects 450, and may include another field indicating the keys of the parity objects (i.e., of the value objects 450 including the zeros of the virtual zero padding 464), the value sizes 462 of the parity objects 460, and the device IDs identifying the corresponding r KV storage devices 330 in which the r parity objects 460 are stored.

The data may be stored using the user key. The metadata may be stored under an internal key formed using the user key and a MetaID indicator denoting "Metadata". Moreover, the value sizes 462 may be stored in the metadata so that the location of the virtual zero padding 464 (i.e., where the zeros were added) can be identified. When the value objects 450 are regenerated, this permits accurate reconstruction by determining where each value object 450 ends and where the zeros of the virtual zero padding 464 begin.
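As an illustration only, the metadata object and internal key described above might be laid out as in the following Python sketch. The field names, the `#` separator, and the helper names `internal_key` and `strip_padding` are assumptions made for this example; the text does not define a concrete encoding.

```python
from dataclasses import dataclass

META_ID = "Metadata"  # MetaID indicator; the exact string is an assumption


def internal_key(user_key: str, meta_id: str = META_ID) -> str:
    """Form the internal key under which metadata is stored.

    Assumption: the user key and MetaID are joined with a '#' separator.
    """
    return f"{user_key}#{meta_id}"


@dataclass
class GroupMetadata:
    keys: list[str]               # key objects of the group, in coding order
    value_sizes: list[int]        # actual value sizes, without virtual padding
    parity_keys: list[str]        # keys of the parity objects
    parity_sizes: list[int]       # value sizes of the parity objects
    parity_device_ids: list[int]  # devices holding the parity objects


def strip_padding(meta: GroupMetadata, key: str, padded: bytes) -> bytes:
    """Cut a regenerated value at its recorded size, dropping the virtual zeros."""
    return padded[: meta.value_sizes[meta.keys.index(key)]]
```

Storing the unpadded size next to each key is what lets a regenerated value be cut at the right byte, since trailing zeros in the rebuilt buffer cannot otherwise be told apart from real data.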

If one of the KV storage devices 330 fails, both the data and the metadata could be lost, because both may be stored in the same KV storage device 330, and recovery could thereby become impossible. To prevent this, the metadata object value may be replicated using an "Object Replication Engine" of the virtual device management layer 120, which can apply the previously described Object Replication reliability mechanism to the metadata object value.

Additionally, because the metadata object value is the same for all objects of the reliability group 140, if the KV storage devices 330 support object linking, the same metadata object value may be linked to a plurality of key names co-located in the same KV storage device 330. Moreover, if batch writing is supported, the object values may be batched together for better throughput.

To summarize the reliability mechanism of Packing using typical erasure coding according to the present embodiment, the virtual device management layer 120 may select, through a buffer, k recently stored key objects 350 from k different KV storage devices 330. The virtual device management layer 120 may then retrieve the value object 450 corresponding to each of the key objects 350 and pad the value objects 450 (other than the largest value object(s) 470 of the parity group 340) with virtual zero padding 464 so that the value objects 450 have the same size (e.g., the size of the largest value object(s) 470). The virtual device management layer 120 may then generate r parity objects 460 from the k key objects 350 using an MDS code process. The virtual device management layer 120 may then write the r parity objects 460 to r KV storage devices 330, among the N KV storage devices 330, that are different from the k KV storage devices 330 from which the key objects 350 were selected, where k+r equals N. The virtual device management layer 120 may then create a metadata object representing the information described above. Finally, the virtual device management layer 120 may write the key objects 350 and the parity objects 460 to the N KV storage devices 330, together with keys formed from the user key and a metadata identifier (e.g., in a manner similar to the replication engine).

FIG. 5 is a block diagram illustrating a group of KV storage devices configured to store key-value data according to a reliability mechanism of Single Object (k,r) erasure coding, or "Splitting," using typical erasure coding, according to an embodiment of the present invention.

Referring to FIG. 5, for values having value sizes larger than those suited to the previously described reliability mechanisms of Object Replication and Packing using typical erasure coding, the virtual device management layer 120 may select the reliability mechanism of single object (k,r) erasure coding, or "Splitting," using typical erasure coding. Splitting using typical erasure coding is a per-object/per-KV-pair reliability mechanism that may be suitable for a KV value/object 570 that has a relatively large value size and that can achieve good throughput when the KV value 570 is divided into k equal-sized splits/chunks/values/objects 550.

After dividing the KV value 570, according to an embodiment, the virtual device management layer 120 may compute a checksum for each of the k objects 550. Thereafter, the virtual device management layer 120 may insert metadata before each of the k objects 550.

Splitting using typical erasure coding may include dividing the KV value 570 into a plurality of smaller objects 550, and then distributing the plurality of smaller objects 550 of the KV value 570 across k consecutive storage devices 530. Accordingly, the size of the k equal-sized objects 550 may be supported by each of the underlying KV storage devices 530.

When Splitting using typical erasure coding is used, the virtual device management layer 120 may also add r parity values/objects 560 generated using a systematic MDS code (e.g., the virtual device management layer 120 may be configured with a typical (k,r) MDS erasure code, such as a (4,2) Reed-Solomon code, as the underlying code). Thereafter, in a manner similar to the reliability mechanism of Packing using typical erasure coding described above, the virtual device management layer 120 may write the k objects 550 and the r parity objects 560 to the N KV storage devices 530 (k+r=N).

Thus, the virtual device management layer 120 may divide the relatively large KV value 570 into k objects 550, compute and add r parity objects 560, and store the k objects 550 and the r parity objects 560 in k+r KV storage devices 530.
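The split-then-add-parity flow can be sketched in Python as follows. This is a simplified illustration, not the described (4,2) Reed-Solomon code: it assumes a single XOR parity chunk (r = 1), and it assumes that a value whose length is not a multiple of k is zero-filled to the next multiple, with the original length kept elsewhere (for example in metadata). The names `split_value` and `xor_chunks` are hypothetical.

```python
def split_value(value: bytes, k: int) -> list[bytes]:
    """Divide a value into k equal-sized chunks.

    Assumption: the tail is zero-filled so every chunk has the same size;
    the original length must be recorded separately to undo this.
    """
    chunk = -(-len(value) // k)  # ceiling division
    padded = value.ljust(chunk * k, b"\x00")
    return [padded[i * chunk:(i + 1) * chunk] for i in range(k)]


def xor_chunks(chunks: list[bytes]) -> bytes:
    """XOR equal-sized chunks together (single-parity stand-in for an MDS code)."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)


chunks = split_value(b"0123456789", k=4)   # k data objects
parity = xor_chunks(chunks)                # r = 1 parity object in this sketch
```

With one parity chunk, any single missing chunk equals the XOR of the remaining chunks and the parity. Tolerating r > 1 failures requires a true MDS code such as Reed-Solomon.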

When the reliability mechanism of Splitting using typical erasure coding is used, after hashing the key 580 corresponding to the KV value 570, the virtual device management layer 120 may determine a primary KV storage device 530a (e.g., KV-SSD2 in the embodiment shown in FIG. 5) in which to store the corresponding object (e.g., the first of the k objects 550, D1 in the embodiment of FIG. 5, which may be stored at hash mark zero). The k+r objects 550, 560 may be written, under the same user key name, to the primary KV storage device 530a and to each of the N-1 consecutive KV storage devices 530. That is, in the embodiment shown in FIG. 5, the first of the k objects 550 may be written to the primary KV storage device 530a "KV-SSD2", and the remaining ones of the k objects 550, together with the r parity objects 560, are written in circular order to the KV storage devices 530 "KV-SSD3" through "KV-SSDN" and "KV-SSD1" (e.g., in a manner similar to that described above for the Object Replication reliability mechanism).
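The hash-based placement above can be sketched in Python. The hash function is not specified in the text, so SHA-256 reduced modulo N is an assumption here, as are the function names. Device indices are zero-based, so index 1 corresponds to "KV-SSD2".

```python
import hashlib


def primary_device(key: bytes, n: int) -> int:
    """Pick the primary device index by hashing the user key.

    Assumption: SHA-256 reduced modulo N; any stable hash would serve.
    """
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n


def circular_order(start: int, n: int) -> list[int]:
    """Indices of the primary device followed by the N-1 consecutive devices."""
    return [(start + i) % n for i in range(n)]


def placement(key: bytes, n: int) -> list[int]:
    """Device index for each of the k+r = N objects, in write order."""
    return circular_order(primary_device(key, n), n)


# With N = 6 and KV-SSD2 (index 1) as primary, writes wrap around to KV-SSD1.
order = circular_order(1, 6)
```

Each of the k+r objects lands on a different device, and a reader can recompute the same order from the key alone, without consulting a separate placement table.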

To summarize the reliability mechanism of Splitting using typical erasure coding, the virtual device management layer 120 may divide a relatively large KV object 570 into k equal-sized objects 550. The virtual device management layer 120 may then generate r parity objects 560 for the k objects 550 using an MDS code process. The virtual device management layer 120 may then hash the key corresponding to the KV value 570 to determine the primary KV storage device 530a in which the object is to be located. The virtual device management layer 120 may then write the k+r objects 550, 560, in circular fashion, to the primary KV storage device 530a and the N-1 consecutive KV storage devices 530, under the same user key name, which may include an appropriate MetaID field generated by the virtual device management layer 120.

Referring again to FIGS. 3 and 4, according to another embodiment, the virtual device management layer 120 may select a reliability mechanism of K-object (k,r,d) erasure coding, or multiple-object "Packing," using regenerative erasure coding (e.g., according to the flowchart 200 of FIG. 2). This reliability mechanism is similar to the previously described reliability mechanism of Packing using typical erasure coding in that the virtual device management layer 120 packs k objects into k KV storage devices. However, Packing using regenerative erasure coding uses (k,r,d) regeneration codes instead of typical (k,r) erasure codes. FIGS. 3 and 4 may therefore be referred to generally for the present embodiment.

Accordingly, Packing using regenerative erasure coding may be used when regeneration codes are suitable but dividing objects is not, and keeping the objects intact is more appropriate. Packing using regenerative erasure coding may be suitable for value sizes larger than those used for the previously described reliability mechanisms of Object Replication and of Splitting and Packing using typical erasure coding. Packing using regenerative erasure coding may be used when reading a plurality of subpackets of an object does not result in lower performance than reading the entire object. Packing using regenerative erasure coding may also be suitable when the underlying KV storage devices (e.g., the KV storage devices 130 of FIG. 1, or the KV storage devices 330 of FIG. 3) are regeneration-code-aware KV storage devices that can assist during recovery/reconstruction.

FIG. 6 is a block diagram illustrating a group of KV storage devices configured to store key-value data according to a reliability mechanism of single object (k,r,d) erasure coding, or "Splitting," using regenerative erasure coding, according to an embodiment of the present invention.

Referring to FIG. 6, the present reliability mechanism may cause the virtual device management layer to operate in a manner similar to Splitting using typical erasure coding, as shown in FIG. 4, except that (k,r,d) regeneration codes are used instead of typical (k,r) MDS erasure coding. As with Packing using regenerative erasure coding, the present reliability mechanism may be suitable when the underlying KV storage devices 630 are regeneration-code-aware KV storage devices that can assist during recovery/reconstruction.

Splitting using regenerative erasure coding may be suitable when the object 670 has a larger value size than the objects corresponding to the previously described reliability mechanisms, and when reading a plurality of subpackets 690 of the k splits 680 of the object 670 does not result in lower performance than reading the entire splits 680 (e.g., as is done with the reliability mechanism of Splitting using typical erasure coding).

The reliability mechanism of Splitting using regenerative erasure coding is a per-object (per-KV-pair) mechanism that may be suitable for objects/KV values having very large value sizes. Such objects can achieve suitable throughput when the object is divided into k equal-sized objects/splits 650, the splits 650 are further virtually divided into a number of subpackets 690 (e.g., four subpackets 690 per split 650 in the present embodiment), and reading a plurality of subpackets 690 from the object 670 yields better throughput than reading the entire object 670. The value size is supported by all of the underlying KV storage devices 630.

Similar to Splitting using typical erasure coding, as shown in FIG. 4, the virtual device management layer 120 in the present reliability mechanism may add r parity objects 660 using a systematic regeneration code, and may write the k splits 650 and the r parity objects 660 to the N KV storage devices 630 (k+r=N). However, each of the r parity objects 660 may be divided into a number of parity subpackets 692 (e.g., a number corresponding to the number of subpackets 690 per split/k object 650). Unlike Splitting using typical erasure coding, the underlying code in the present embodiment may be a (4,2,5) zigzag code.

To summarize the reliability mechanism of Splitting using regenerative erasure coding, the virtual device management layer 120 may divide a large KV value 670 into k equal-sized objects 650. The virtual device management layer 120 may then divide each of the k objects 650 into m equal-sized subpackets 690, where m is an integer. The virtual device management layer 120 may then generate r parity objects 660 for the k objects 650 using a regenerative coding process, and each of the r parity objects 660 may be divided into m equal-sized parity subpackets 692. The virtual device management layer 120 may then hash the key corresponding to the KV value 670 to determine the primary KV storage device 630a in which the object is to be located. The virtual device management layer 120 may then write the k+r objects 650, 660, each including m subpackets 690, 692, in circular fashion to the primary KV storage device 630a and the N-1 consecutive KV storage devices 630, under the same user key name, which may include an appropriate MetaID field generated by the virtual device management layer 120.
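The two-level layout summarized above (k equal-sized objects, each viewed as m subpackets) can be sketched in Python. The sketch covers only the data layout; the regeneration-code parity computation (e.g., a zigzag code) is not shown. The helper names are hypothetical, and zero-filling of uneven tails is an assumption of this example.

```python
def split_equal(data: bytes, parts: int) -> list[bytes]:
    """Divide data into `parts` equal-sized pieces (tail zero-filled, by assumption)."""
    size = -(-len(data) // parts)  # ceiling division
    padded = data.ljust(size * parts, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(parts)]


def split_with_subpackets(value: bytes, k: int, m: int) -> list[list[bytes]]:
    """Return k objects, each represented as a list of m subpackets."""
    return [split_equal(obj, m) for obj in split_equal(value, k)]


value = bytes(range(32))
# k = 4 splits with m = 4 subpackets per split, as in the embodiment described.
layout = split_with_subpackets(value, k=4, m=4)
```

Addressing data at subpacket granularity is what allows a repair to read only selected subpackets from each surviving device rather than whole objects.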

As described above, the virtual device management layer may select an appropriate reliability mechanism from a group of reliability mechanisms for storing data, based on one or more attributes of the data. The embodiments described herein thus provide an improvement in the field of memory storage, because each of the described reliability mechanisms can perform a single-key recovery procedure. When an entire memory device fails, the virtual device management layer of an embodiment of the present invention can recover all of the keys that were present on the failed memory device and copy them to a new memory device. The virtual device management layer can accomplish the recovery and copying of all the keys by iterating over all the keys present on the memory devices adjacent to the failed memory device in the reliability group, and performing a per-key recovery operation on the keys that the reliability mechanism determines to have been present on the failed memory device.
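The attribute-based selection can be illustrated with a small Python sketch that uses object size as the only attribute. The byte thresholds, the ordering of Object Replication below Packing, and the names `Mechanism` and `select_mechanism` are assumptions for illustration. The text states only that Splitting suits larger values than Object Replication and Packing, and that other attributes (throughput, read/write temperature, device coding capability) may also be considered.

```python
from enum import Enum


class Mechanism(Enum):
    REPLICATION = "object replication"
    PACKING = "packing"
    SPLITTING = "splitting"


# Hypothetical exclusive upper bounds on value size, in bytes.
THRESHOLDS = [
    (4096, Mechanism.REPLICATION),
    (1 << 20, Mechanism.PACKING),
]


def select_mechanism(value_size: int) -> Mechanism:
    """Return the first mechanism whose size threshold the value satisfies."""
    for upper_bound, mechanism in THRESHOLDS:
        if value_size < upper_bound:
            return mechanism
    return Mechanism.SPLITTING  # values larger than every threshold are split
```

For example, under these assumed thresholds a 100-byte value would be replicated, while a 10 MiB value would be split.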

The described embodiments further provide an improvement in the field of memory storage because very large KV pairs, having value sizes larger than those supported by the underlying reliability mechanisms (e.g., due to constraints of the underlying storage devices), are explicitly split into a plurality of KV pairs by the reliability manager, and the reliability mechanisms store the plurality of splits and split-number information along with the metadata stored in the values.

Although specific terms are employed in the embodiments described herein, they are to be interpreted in a generic and descriptive sense only, and not for purposes of limitation. In some instances, as would be apparent to one of ordinary skill in the art to which the present invention pertains, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless otherwise indicated. Accordingly, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the detailed description and the following claims, together with their functional equivalents.

Claims

A method of storing data in a key-value reliability system including N (where N is an integer) storage devices that are grouped into a reliability group as a single logical unit and managed by a virtual device management layer, the method comprising:
Determining whether the data satisfies a threshold corresponding to a reliability mechanism for storing the data;
Selecting the reliability mechanism if the threshold is satisfied; and
Storing the data in accordance with the selected reliability mechanism.

The method according to claim 1,
Wherein the threshold is based on one or more of an object size of the data, a throughput consideration of the data, a read/write temperature of the data, and an underlying erasure coding capability of the N storage devices.

The method according to claim 1,
Further comprising testing the data for the reliability mechanism using one or more Bloom filters or caches.

The method according to claim 1,
Further comprising inserting metadata, together with a key corresponding to the data, for recording the selected reliability mechanism, one or more checksums for each of the N storage devices storing the data, object sizes of values of the data stored in each of the N storage devices storing the data, and a location of parity group members of the N storage devices indicating in which of the N storage devices the data is stored.

The method according to claim 1,
Wherein the selected reliability mechanism comprises object replication,
Wherein the step of storing the data comprises:
Selecting a KV value;
Computing a hash of the key corresponding to the selected KV value;
Determining a subset of the N storage devices to store replicas of key objects corresponding to the KV value; and
Writing the updated values corresponding to the KV value to each storage device of the determined subset under the same user key name.

The method according to claim 1,
Wherein the selected reliability mechanism comprises packing,
Wherein the step of storing the data comprises:
Selecting k key objects stored in k (where k is an integer) storage devices among the N storage devices of the reliability group;
Retrieving k value objects corresponding to the k key objects;
Padding virtual zeroes onto the ends of the value objects other than the value object having the largest value size among the k value objects, so that the virtual value sizes of all of the k value objects become equal;
Generating r (where r is an integer) parity objects from the k key objects;
Writing the k key objects to the k storage devices; and
Writing the r parity objects to r storage devices among the N storage devices,
Wherein each of the r storage devices is distinct from the k storage devices, and k + r = N.

The method according to claim 6,
Wherein the selected reliability mechanism comprises packing using typical erasure coding, and
Wherein the N storage devices are configured with typical (k, r) maximum distance separable (MDS) erasure coding.

The method according to claim 6,
Wherein the selected reliability mechanism comprises packing using regenerative erasure coding,
Wherein the N storage devices comprise (k, r, d) regenerative erasure coding.

The method according to claim 1,
Wherein the selected reliability mechanism comprises splitting,
Wherein the step of storing the data comprises:
Selecting a KV value;
Dividing the KV value into k (where k is an integer) equal-sized objects;
Generating r (where r is an integer) parity objects from the k equal-sized objects;
Computing a hash of the key corresponding to the selected KV value;
Determining, based on the hash, a primary device among the N storage devices on which the KV value is to be located; and
Writing each of the k equal-sized objects and the r parity objects to the N storage devices in sequential order starting from the primary device, wherein k + r = N.

The method according to claim 9,
Wherein the selected reliability mechanism comprises splitting using typical erasure coding, and
Wherein the N storage devices are configured with typical (k, r) maximum distance separable (MDS) erasure coding.

The method according to claim 9,
Wherein the selected reliability mechanism comprises splitting using regenerative erasure coding,
Wherein the N storage devices comprise (k, r, d) regenerative erasure coding,
Wherein the step of storing the data further comprises:
Dividing each of the k equal-sized objects into m (where m is an integer) subpackets using the regenerative erasure coding; and
Dividing each of the r parity objects into m parity subpackets.

A data reliability system for storing data based on a selected reliability mechanism, the system comprising:
N (where N is an integer) storage devices configured as a virtual device using stateless data protection; and
A virtual device management layer configured to manage the N storage devices as the virtual device and to store data in selected ones of the N storage devices according to the selected reliability mechanism,
Wherein the virtual device management layer is configured to:
Determine whether the data satisfies a threshold corresponding to a reliability mechanism for storing the data;
Select the reliability mechanism if the threshold is satisfied; and
Store the data in accordance with the selected reliability mechanism.

The system according to claim 12,
Wherein the selected reliability mechanism comprises object replication,
Wherein the virtual device management layer is configured to:
Select a KV value;
Compute a hash of the key corresponding to the selected KV value;
Determine a subset of the N storage devices to store replicas of key objects corresponding to the KV value; and
Write the updated values corresponding to the KV value to each storage device of the determined subset under the same user key name.

The system according to claim 12,
Wherein the selected reliability mechanism comprises packing,
Wherein the virtual device management layer is configured to store the data by:
Selecting k key objects stored in k (where k is an integer) storage devices among the N storage devices;
Retrieving k value objects corresponding to the k key objects;
Padding virtual zeroes onto the ends of the value objects other than the value object having the largest value size among the k value objects, such that the virtual value sizes of all of the k value objects become equal;
Generating r (where r is an integer) parity objects from the k key objects;
Writing the k key objects to the k storage devices; and
Writing the r parity objects to r storage devices of the N storage devices,
Wherein each of the r storage devices is distinct from the k storage devices, and k + r = N.

The system according to claim 12,
Wherein the selected reliability mechanism comprises splitting,
Wherein the virtual device management layer is configured to:
Select a KV value;
Divide the KV value into k (where k is an integer) equal-sized objects;
Generate r (where r is an integer) parity objects from the k equal-sized objects;
Compute a hash of the key corresponding to the selected KV value;
Determine, based on the hash, a primary device among the N storage devices on which the KV value is to be located; and
Write the k equal-sized objects and the r parity objects to the N storage devices in sequential order starting from the primary device,
Wherein k + r = N.

The system according to claim 15,
Wherein the selected reliability mechanism comprises splitting using regenerative erasure coding,
Wherein the N storage devices comprise (k, r, d) regenerative erasure coding, and
Wherein the virtual device management layer is further configured to divide each of the k equal-sized objects into m (where m is an integer) subpackets using the regenerative erasure coding, and to divide each of the r parity objects into m parity subpackets.

A non-transitory computer-readable medium containing computer code that implements a method of storing data in a key-value reliability system, the key-value reliability system including N storage devices that are grouped into a reliability group as a single logical unit and managed by a virtual device management layer,
The method comprising:
Determining whether the data satisfies a threshold corresponding to a reliability mechanism for storing the data;
Selecting the reliability mechanism if the threshold is satisfied; and
Storing the data in accordance with the selected reliability mechanism.
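
The threshold check and selection steps above might look like the following sketch. The claim does not say what the thresholds measure, so the value-size limits used here (4 KiB and 64 KiB), the order in which mechanisms are tried, and the names `Mechanism` and `select_mechanism` are all hypothetical.

```python
from typing import Callable, NamedTuple, Optional

class Mechanism(NamedTuple):
    name: str
    meets_threshold: Callable[[bytes], bool]

# Hypothetical thresholds based on value size only.
MECHANISMS = [
    Mechanism("replication", lambda value: len(value) <= 4096),
    Mechanism("packing", lambda value: 4096 < len(value) <= 65536),
    Mechanism("splitting", lambda value: len(value) > 65536),
]

def select_mechanism(value: bytes) -> Optional[str]:
    """Return the first mechanism whose threshold the data satisfies."""
    for mechanism in MECHANISMS:
        if mechanism.meets_threshold(value):
            return mechanism.name
    return None

# Usage: the selected name would then drive the actual store operation.
choice = select_mechanism(b"x" * 5000)
```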

The non-transitory computer-readable medium of claim 17,
Wherein the selected reliability mechanism comprises object replication, and
Wherein the step of storing the data comprises:
Selecting a KV value;
Computing a hash of the key corresponding to the selected KV value;
Selecting a subset of the N storage devices to store replicas of key objects corresponding to the KV value; and
Writing the updated values corresponding to the KV value to each storage device of the selected subset under the same user key name.

The non-transitory computer-readable medium of claim 17,
Wherein the selected reliability mechanism comprises packing, and
Wherein the step of storing the data comprises:
Selecting k key objects stored in k (where k is an integer) storage devices among the N storage devices of the reliability group;
Retrieving k value objects corresponding to the k key objects;
Padding virtual zeroes onto the ends of the value objects that are smaller than the largest of the k value objects, such that the virtual value sizes of all of the k value objects become equal;
Generating r (where r is an integer) parity objects from the k key objects;
Writing the k key objects to the k storage devices; and
Writing the r parity objects to r storage devices among the N storage devices,
Wherein each of the r storage devices is distinct from the k storage devices, and k + r = N.

The non-transitory computer-readable medium of claim 17,
Wherein the selected reliability mechanism comprises splitting, and
Wherein the step of storing the data comprises:
Selecting a KV value;
Dividing the KV value into k (where k is an integer) objects of the same size;
Generating r (where r is an integer) parity objects from the k equal-sized objects;
Computing a hash of the key corresponding to the selected KV value;
Selecting, based on the hash, a primary device among the N storage devices on which the KV value is to be located; and
Writing the k equal-sized objects and the r parity objects to the N storage devices in sequential order starting from the primary device,
Wherein k + r = N.