RU2835373C9

RU2835373C9 - Method of arranging data in RAID arrays for balanced load distribution during array recovery

Info

Publication number: RU2835373C9
Application number: RU2024118335A
Authority: RU
Inventors: Анна Игоревна Васенина; Иван Максимович Левицкий; Дмитрий Сергеевич Смирнов
Original assignee: Общество С Ограниченной Ответственностью "Швачер"
Filing date: 2024-07-02
Publication date: 2025-12-15

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to a method of arranging data in a RAID array for balanced load distribution during array recovery. The method comprises the steps of: creating a new RAID array using a procedure for generating a stripe map; passing the number of free disks N, the length of the current stripe L and the length of the stripe map R to the input of the stripe map generation procedure and, based on these data, forming a stripe map consisting of R concatenated permutations of the set {1, …, N}; initializing a combination matrix M of size N×N, with all its elements set to 0, and initializing the current stripe with an empty list; generating a permutation in the stripe map by calling the generate permutation procedure with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N; to generate one permutation from the input parameters, initializing auxiliary structures, namely: the list of free disks is initialized with the numbers from 1 to N, and the list of disks in the current permutation is initialized with an empty list; iteratively selecting a disk that is contained in the list of free disks but not in the list of occupied disks in the current stripe, by searching for the minimum sum of the combination matrix elements corresponding to that disk and the occupied disks in the current stripe; adding the disk to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation; once the length of the list of occupied disks in the current stripe reaches the length of the current stripe L, updating the combination matrix and resetting the list of occupied disks in the current stripe to an empty list; and repeating the permutation generation procedure until the number of permutations reaches R.

EFFECT: faster recovery of a RAID array due to a data arrangement scheme which ensures uniform distribution of the read load across all disks during recovery of the array.

1 cl, 9 dwg

Description

TECHNICAL FIELD

The claimed technical solution relates generally to the field of computing, and in particular to a method of placing data in RAID arrays for balanced load distribution during array recovery.

BACKGROUND

One of the most important characteristics of a data storage system is the availability of the stored data, in other words the ability to work with the data continuously over a long period of time. With standard storage devices such as hard disk drives (HDDs) or solid-state drives (SSDs), there is a high probability of device failure and loss of access to the data. One universal way to increase the availability of data stored on such devices is to combine them into a RAID array with redundant data storage. Redundancy in a RAID array can be provided by storing full copies of the data (RAID-1, RAID-10) or by storing checksums for certain data fragments (RAID-5, RAID-6, RAID-50, RAID-60). The second option is considered in more detail below.

RAID arrays that use checksums divide all stored data into consecutive fragments called stripes. A stripe consists of data strips of equal size. To provide redundancy, checksum strips are added to the data strips (1 for RAID-5, 2 for RAID-6). Each strip is stored on some physical device, and no device is included in the same stripe twice. The more disks there are in a RAID array, the higher the probability that several disks fail at the same time, so the number of checksum disks is increased accordingly. To scale RAID-6 to a large number of disks, RAID-60 is used, in which the disks are divided into several groups. Each group forms the equivalent of a RAID-6 array, and the data is interleaved between the groups. That is, with two groups of disks, odd-numbered stripes are placed on the first group and even-numbered stripes on the second.

Recovery of a RAID array with checksums proceeds stripe by stripe. If for some reason a data strip cannot be read, all the information from the other strips of the stripe is read instead, including the checksum strips. Using all the information stored within the stripe and the checksum-based recovery algorithm, the data can be restored exactly as long as the number of damaged (missing) strips does not exceed the number of checksum strips.
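As an illustration of stripe-level recovery, the following minimal sketch uses the single-checksum case, assuming the checksum is a byte-wise XOR of the data strips (the usual choice for RAID-5). The function names and the sample bytes are hypothetical and do not come from the patent; RAID-6 needs a second, differently computed checksum that is not shown here.

```python
from functools import reduce


def xor_strips(strips):
    """Return the byte-wise XOR of equally sized strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def recover_strip(surviving_strips):
    """Rebuild the single missing strip of a stripe.

    Assumes exactly one strip is missing and that the survivors include
    the XOR checksum strip, so XOR-ing them yields the lost strip.
    """
    return xor_strips(surviving_strips)


# A stripe with three data strips and one XOR checksum strip (sample data).
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
checksum = xor_strips(data)

# Strip 1 becomes unreadable; read everything else in the stripe instead.
recovered = recover_strip([data[0], data[2], checksum])
```

Every surviving strip of the stripe has to be read to rebuild the missing one, which is why the placement of stripes across disks determines the read load during recovery.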

When a new disk is installed to replace a damaged one, a similar procedure is performed, except that the data is not only computed but also written to the new disk. This process is called RAID array recovery (rebuild). With RAID-60, only the disks in the same group as the failed disk are read during recovery; the disks of the other groups are not used, which can lead to prolonged overload of the disks being read and to further failures.

One solution to this problem is to use alternative data placement methods, such as the one proposed in the claimed solution.

The RAID array logic described above can be implemented in two ways.

The first way is to use a hardware RAID controller to manage the array. In this case the array is configured before the operating system boots, and the operating system does not see the underlying storage devices.

The second way is to write a device driver in the operating system kernel; such devices are called software-defined or virtual. In this case the disks are visible to the operating system kernel, and once the driver is loaded a new device is created that works with the storage devices assigned to it, forwarding requests to the underlying devices according to the specified logic.

Known from the prior art is patent US9841908B1, "Declustered array of storage devices with chunk groups and support for multiple erasure schemes", assigned to Western Digital Technologies Inc. and published on December 12, 2017. It describes a method for generating balanced incomplete block designs (BIBD), a method for generating partial balanced incomplete block designs (PBIBD), and a method for applying the generated block designs to create a stripe map (a "chunk group mapping table" in the terminology of that patent). BIBD generation is possible only for configurations described by the formula N = k², where N is the number of disks in the RAID array and k is the number of disks in a stripe. The generation uses a randomly generated permutation and successive rotational operations on matrices. PBIBD generation is used only for configurations for which a BIBD cannot be generated. That algorithm uses sequential pseudo-random generation of permutations, with the block design parameters evaluated at each step.

The disadvantages of the described method are:

1. The main generation algorithm is applicable to only some of the possible configurations.

2. The use of pseudo-random generation requires additional information to be stored on the disks.

3. The running time and output length of the PBIBD generation algorithm are unpredictable and depend on how fortunate the generated permutation is at each iteration.

4. The block design parameters evaluated in the patent, such as the number of blocks containing any given point and the number of blocks containing any two given points, assess the generated block design only globally, which can leave individual elements locally overloaded.

Also known from the prior art is patent EP2921960A2, "Method of, and apparatus for, accelerated data recovery in a storage system", assigned to Seagate Systems UK Ltd and published on December 23, 2015. It describes a data layout method based on two parameters, the width and the number of repetitions, which makes it possible to optimize RAID array recovery by using read-ahead from the underlying devices. The approach consists of three main steps. In the first step, the width and the number of repetitions are determined. Next, a matrix is formed according to the selected parameters, the number of disks, and the number of disks in a stripe. In the last step, the columns of the matrix are shuffled using a randomly generated permutation.

The main disadvantages of this approach are:

1. When recovery is optimized through read-ahead, some disks become overloaded with requests, which can slow down the processing of user requests and increase the probability of failure of the disks from which recovery is performed.

2. The use of a pseudo-random permutation at the end of generation requires additional information to be stored on the disks.

Unfortunately, hard drives, which remain the primary data storage medium today, are not as reliable as one would like, so protecting files well enough that data recovery is never needed is a pressing problem.

SUMMARY OF THE INVENTION

The disadvantages of the prior art are overcome, and advantages are provided, by a computer-implemented method of placing data in RAID arrays for balanced load distribution during array recovery.

The technical result achieved in solving this problem is a high level of data availability, reliable data storage, and faster recovery of a RAID array owing to a data layout that distributes the read load uniformly across all disks during array recovery.

In addition, the developed data layout achieves a maximum imbalance ratio of no more than 1.5 across all permissible configurations (up to 1024 disks). Stripes are built in a balanced way by forming them with a combination matrix, which records how many times each pair of disks has occurred in the same stripe. During data recovery, for example, reads are made from the disks that share a stripe with the disk being recovered.

Additional effects include RAM consumption of no more than 2 MB per RAID array (in the worst case of 1024 disks). While the RAID array is operating, only the stripe map is kept in RAM. With a fixed stripe map length of 1024, the map stores a number of elements equal to 1024 multiplied by the number of disks. For the maximum number of disks this is 1024 * 1024 elements. Each element occupies 16 bits, which is enough to store the numbers from 1 to 1024. The maximum memory consumption is therefore 1024 * 1024 * 16 = 16,777,216 bits = 2 MB.
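The memory estimate above can be reproduced with a short calculation. This is a minimal sketch: the function name and its default arguments are illustrative assumptions, and only the figures 1024 and 16 bits are taken from the text.

```python
def stripe_map_bytes(n_disks, map_length=1024, bits_per_entry=16):
    """Size in bytes of a stripe map holding map_length permutations.

    Each permutation contributes n_disks entries, and each entry is a
    disk number stored in bits_per_entry bits.
    """
    entries = map_length * n_disks
    return entries * bits_per_entry // 8


worst_case = stripe_map_bytes(1024)   # 1024 disks, the stated maximum
small_array = stripe_map_bytes(16)    # a hypothetical 16-disk array
```

For 1024 disks this gives 2,097,152 bytes, which matches the 2 MB stated above; smaller arrays need proportionally less.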

The specified technical result is achieved by carrying out a method of placing data in RAID arrays for balanced load distribution during array recovery, comprising the steps of:

- creating a new RAID array using the stripe map generation procedure (generate stripe map);

- passing the number of free disks N, the length of the current stripe L and the length of the stripe map R to the input of the stripe map generation procedure and, based on these data, forming a stripe map consisting of R concatenated permutations of the set {1, …, N};

- initializing the combination matrix M of size N×N, with all its elements set to 0, and initializing the current stripe with an empty list;

- generating a permutation in the stripe map by calling the permutation generation procedure (generate permutation) with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N;

- to generate one permutation from the input parameters, initializing the auxiliary structures, namely: the list of free disks is initialized with the numbers from 1 to N, and the list of disks in the current permutation is initialized with an empty list;

- iteratively selecting a disk that is contained in the list of free disks but not in the list of occupied disks in the current stripe, by searching for the minimum sum of the combination matrix elements corresponding to that disk and the occupied disks in the current stripe;

- adding the disk to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation;

- as soon as the length of the list of occupied disks in the current stripe reaches the length of the current stripe L, updating the combination matrix and resetting the list of occupied disks in the current stripe to an empty list;

- repeating the procedure of generating a permutation in the stripe map until the number of permutations reaches R.
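The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, ties between equally good candidates are broken by the lowest disk number (the text does not specify tie-breaking), and the matrix update increments the counter of every pair of disks in the completed stripe symmetrically.

```python
def pick_candidate(matrix, free, stripe):
    """Pick the free disk, not already in the stripe, with the smallest
    total co-occurrence count with the disks occupying the stripe."""
    best, best_cost = None, None
    for disk in free:
        if disk in stripe:
            continue
        cost = sum(matrix[disk - 1][other - 1] for other in stripe)
        if best is None or cost < best_cost:  # assumption: lowest number wins ties
            best, best_cost = disk, cost
    return best


def update_matrix(matrix, stripe):
    """Count one more shared stripe for every pair of disks in the stripe."""
    for a in stripe:
        for b in stripe:
            if a != b:
                matrix[a - 1][b - 1] += 1


def generate_permutation(matrix, stripe, stripe_len, n_disks):
    """Generate one permutation of 1..n_disks.

    `stripe` is the list of occupied disks in the current stripe; it is
    carried over between calls when n_disks is not a multiple of stripe_len.
    """
    free = list(range(1, n_disks + 1))
    permutation = []
    while free:
        disk = pick_candidate(matrix, free, stripe)
        free.remove(disk)
        stripe.append(disk)
        permutation.append(disk)
        if len(stripe) == stripe_len:
            update_matrix(matrix, stripe)
            stripe.clear()
    return permutation


def generate_stripe_map(n_disks, stripe_len, map_len):
    """Build a stripe map of map_len concatenated permutations of 1..n_disks."""
    if not 1 <= stripe_len <= n_disks:
        raise ValueError("stripe length must be between 1 and the number of disks")
    matrix = [[0] * n_disks for _ in range(n_disks)]
    stripe = []
    stripe_map = []
    for _ in range(map_len):
        stripe_map.extend(generate_permutation(matrix, stripe, stripe_len, n_disks))
    return stripe_map, matrix


stripe_map, matrix = generate_stripe_map(n_disks=6, stripe_len=3, map_len=4)
```

Because the procedure is deterministic under these assumptions, the same map can be regenerated from N, L and R alone, without storing a random seed or permutation.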

BRIEF DESCRIPTION OF DRAWINGS

The features and advantages of the present technical solution will become apparent from the detailed description below and the accompanying drawings.

Fig. 1 - illustrates a flowchart of the claimed method;

Fig. 2 - illustrates the states of the combination matrix after three permutations have been generated in the stripe map;

Fig. 3 - illustrates the requirements for the stripe map;

Fig. 4 - illustrates an example of an iteration in the generation of a permutation;

Fig. 5 - illustrates the use of the stripe map to compute the physical placement of stripes;

Fig. 6 - illustrates a flowchart of the permutation generation procedure;

Fig. 7 - illustrates a flowchart of the procedure for finding a candidate disk to add to the stripe;

Fig. 8 - illustrates a flowchart of the procedure for updating the combination matrix;

Fig. 9 - illustrates a general example of a computing device.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description sets out numerous implementation details intended to provide a clear understanding of the present invention. However, it will be apparent to a person skilled in the art how the present invention can be used both with and without these implementation details. In other instances, well-known methods, procedures, and components are not described in detail so as not to obscure the features of the present invention unnecessarily.

Furthermore, it will be clear from the description that the invention is not limited to the implementation presented. Numerous possible modifications, changes, variations, and substitutions that preserve the essence and form of the present invention will be apparent to persons skilled in the art.

The concepts and terms needed to understand this technical solution are described below.

RAID or RAID array (Redundant Array of Independent Disks) - a set of several non-volatile block storage devices, for example disks (SSD, HDD), combined into a single logical block device in such a way that the failure of one or more block devices in the RAID neither causes the array itself to fail nor leads to data loss.

Strip - a contiguous section of fixed size, for example 16 kilobytes, on an underlying storage device (a disk of the RAID array).

Stripe - a set of strips located on different underlying storage devices (disks of the RAID array) that together form a contiguous section of the virtual storage device (the RAID array). Each stripe contains a set of data and, optionally, checksums computed from the stripe's data. If checksums are stored, separate strips are allocated for them (one per distinct checksum). The stripe depth is the size of one strip within the stripe. The stripe width is the amount of data contained in each stripe.

Thus, if the stripe depth is 64 KB, the stripe width can be calculated by multiplying this value by the number of data strips in the stripe.

Imbalance ratio - a metric used to assess how evenly the load is distributed across the disks during RAID array recovery. It is computed for a fixed set of failed disks as the ratio of the number of I/O operations on the most loaded disk to the number on the least loaded disk during recovery of the RAID array. To assess the RAID array as a whole, the minimum, average, and maximum values of the imbalance ratio over all possible disk failures are used.
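To make the metric concrete, the sketch below computes the imbalance ratio for a single failed disk from a stripe map. It rests on simplifying assumptions that are not stated in the text: one read is counted per surviving strip of every stripe that contains the failed disk, only a single failure is modeled, and the ratio is reported as infinite when some surviving disk is not read at all. The function names and sample maps are hypothetical.

```python
import math


def rebuild_reads(stripe_map, n_disks, stripe_len, failed):
    """Count reads per surviving disk when rebuilding the failed disk."""
    reads = {d: 0 for d in range(1, n_disks + 1) if d != failed}
    for start in range(0, len(stripe_map), stripe_len):
        stripe = stripe_map[start:start + stripe_len]
        if failed in stripe:
            for disk in stripe:
                if disk != failed:
                    reads[disk] += 1
    return reads


def imbalance_ratio(stripe_map, n_disks, stripe_len, failed):
    """Ratio of reads on the most loaded to the least loaded surviving disk."""
    reads = rebuild_reads(stripe_map, n_disks, stripe_len, failed)
    lowest, highest = min(reads.values()), max(reads.values())
    return math.inf if lowest == 0 else highest / lowest


# Four disks, stripes of two strips; every pair of disks shares one stripe.
balanced = [1, 2, 3, 4, 1, 3, 2, 4, 1, 4, 2, 3]
# Fixed groups {1, 2} and {3, 4}, similar in spirit to RAID-60 grouping.
grouped = [1, 2, 3, 4] * 3

balanced_ratio = imbalance_ratio(balanced, n_disks=4, stripe_len=2, failed=1)
grouped_ratio = imbalance_ratio(grouped, n_disks=4, stripe_len=2, failed=1)
```

In the balanced map each surviving disk is read once, so the ratio is 1; in the grouped map only disk 2 is read, which is the overload scenario described for RAID-60.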

Stripe map - a structure that stores information about which underlying device (disk of the RAID array) holds each strip of each stripe of the RAID array. To save resources, maps are used whose length is much smaller than the number of strips in the RAID array; the map is then accessed by index modulo the length of the stripe map.
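One possible reading of the modulo access is sketched below. The indexing scheme is an assumption made for illustration (strip j of stripe i is looked up at position (i * L + j) modulo the map length, with the map stored as a flat list); the exact scheme used by the method is the subject of Fig. 5 and is not reproduced here.

```python
def disks_for_stripe(stripe_map, stripe_len, stripe_index):
    """Return the disks holding the strips of the given logical stripe.

    Assumes the map is a flat list whose length is a multiple of
    stripe_len, so wrapping around never splits a stripe.
    """
    start = (stripe_index * stripe_len) % len(stripe_map)
    return [stripe_map[(start + i) % len(stripe_map)] for i in range(stripe_len)]


sample_map = [1, 2, 3, 4, 1, 3, 2, 4]    # hypothetical map: 4 disks, 2 strips per stripe
first = disks_for_stripe(sample_map, 2, 0)
wrapped = disks_for_stripe(sample_map, 2, 5)   # beyond the map length, wraps around
```

With this scheme the map repeats every len(stripe_map) / L stripes, so a short map describes the layout of an arbitrarily large array.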

Combination matrix - a square table of numbers whose side length equals the number of disks in the RAID array. The number in the i-th column and j-th row of the table indicates how many times disk i and disk j have occurred in the same stripe according to the stripe map.
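For a finished stripe map, the combination matrix defined above can be computed directly. The sketch below is illustrative: the function name and the sample map are hypothetical, the map is assumed to be a flat list split into consecutive stripes of equal length, and the diagonal is left at zero.

```python
from itertools import combinations


def combination_matrix(stripe_map, n_disks, stripe_len):
    """Count, for every pair of disks, the stripes in which both occur."""
    matrix = [[0] * n_disks for _ in range(n_disks)]
    for start in range(0, len(stripe_map), stripe_len):
        stripe = stripe_map[start:start + stripe_len]
        for a, b in combinations(stripe, 2):
            matrix[a - 1][b - 1] += 1
            matrix[b - 1][a - 1] += 1   # the relation is symmetric
    return matrix


# Stripes (1, 2), (3, 4), (1, 3), (2, 4) on four disks.
pairs = combination_matrix([1, 2, 3, 4, 1, 3, 2, 4], n_disks=4, stripe_len=2)
```

Row i of the matrix is also the per-disk read count when disk i alone fails, which is why evening out its entries evens out the recovery load.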

Данное техническое решение может быть реализовано на компьютере, в виде автоматизированной информационной системы (АИС), распределенной компьютерной системы, или машиночитаемого носителя, содержащего инструкции для выполнения вышеупомянутого способа размещения данных в RAID-массивах для сбалансированного распределения нагрузки во время восстановления массива с помощью вычислительных средств (например, процессора).This technical solution can be implemented on a computer, in the form of an automated information system (AIS), a distributed computer system, or a machine-readable medium containing instructions for performing the above-mentioned method of placing data in RAID arrays for balanced load distribution during array recovery using computing means (for example, a processor).

Fig. 1 shows a flowchart of the claimed method (100) of placing data in RAID arrays for balanced load distribution during array recovery.

At the first step (101), a new RAID array is created using the stripe map generation procedure (generate stripe map).

At step (102), the number of free disks N, the length of the current stripe L, and the length of the stripe map R are passed as input to the stripe map generation procedure, and based on these data a stripe map is formed that consists of R concatenated permutations of the set {1, …, N}.

The stripe map is an array of length k*N, where N is the number of disks and k is a fixed coefficient, equal to 1024 in this case.

The stripe map must satisfy the following requirements:

The stripe map can be viewed as a concatenation of k permutations of the set {1,...,N}, which guarantees uniform use of the disks, since each disk n then occurs exactly k times.

The stripe map can be viewed as a concatenation of (k*N)/M permutations of the set {1,...,M}, where M is the stripe length (in this requirement M denotes the stripe length, called L in the procedure inputs, and not the combination matrix), which guarantees a correctly formed stripe (without repetitions).

Fig. 3 shows an example of the requirements for the stripe map, where N is the number of disks, M is the stripe length, d_i is a disk number from the set {1, …, N}, and i is an index in the stripe map from the set {1, …, k*N}.
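The two requirements above can be checked mechanically. The following Python sketch is one possible check, under the assumption that the second requirement means "every consecutive group of M entries contains no repeated disk"; the function name `is_valid_stripe_map` is hypothetical.

```python
def is_valid_stripe_map(stripe_map, n_disks, stripe_len):
    """Check both stripe map requirements for a list of disk numbers."""
    # The map must split evenly into permutations and into stripes.
    if len(stripe_map) % n_disks or len(stripe_map) % stripe_len:
        return False

    # Requirement 1: each block of N entries is a permutation of {1..N}.
    all_disks = set(range(1, n_disks + 1))
    for start in range(0, len(stripe_map), n_disks):
        if set(stripe_map[start:start + n_disks]) != all_disks:
            return False

    # Requirement 2 (assumed reading): no disk repeats within a stripe.
    for start in range(0, len(stripe_map), stripe_len):
        stripe = stripe_map[start:start + stripe_len]
        if len(set(stripe)) != len(stripe):
            return False
    return True


print(is_valid_stripe_map([1, 2, 3, 1, 2, 3], 3, 2))  # stripes: 12, 31, 23
print(is_valid_stripe_map([1, 2, 3, 3, 1, 2], 3, 2))  # second stripe is 3, 3
```

The second call fails even though both blocks are valid permutations, which shows why the two requirements have to be enforced together.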

At step (103), the combination matrix M of size N×N, where N is the number of disks, is initialized by assigning the value 0 to all its elements, and the current stripe is initialized with an empty list.

Initializing the combination matrix M means assigning the value 0 to all its elements. The current stripe is initialized with an empty list. "Current stripe" is the name of an auxiliary structure of list type whose elements are disk numbers. At different moments the list may contain from 0 to L elements, where L is the stripe length parameter passed as input. The current stripe list is filled during permutation generation. As soon as its length reaches L, the list is reset. The list itself is not stored; it serves only as an auxiliary structure for generating the permutation.

For example, the element M[i][j] of the combination matrix records how many times disk i and disk j have occurred in the same stripe. Since the relation of being in the same stripe is symmetric, the combination matrix M is symmetric about its main diagonal. To optimize memory usage, only the upper triangular part of M needs to be stored. For simplicity, the description of the algorithm uses the full combination matrix M, in which M[i][j]=M[j][i] for all i, j from the set {1,...,N}.
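A minimal Python sketch of the upper-triangular storage mentioned above is given below. The class name `PairCounts`, its methods, and the row-major packing of the triangle are assumptions chosen for illustration; the source does not specify a layout.

```python
class PairCounts:
    """Symmetric pair counter that stores only the upper triangle."""

    def __init__(self, n_disks):
        self.n = n_disks
        # N*(N+1)/2 cells instead of N*N, diagonal included.
        self.cells = [0] * (n_disks * (n_disks + 1) // 2)

    def _index(self, i, j):
        # Disk numbers are 1-based; order them so that i <= j.
        if i > j:
            i, j = j, i
        i, j = i - 1, j - 1
        # Row i starts after rows 0..i-1, which hold n, n-1, ... cells.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def get(self, i, j):
        return self.cells[self._index(i, j)]

    def add(self, i, j, delta=1):
        self.cells[self._index(i, j)] += delta


counts = PairCounts(3)
counts.add(1, 3)
print(counts.get(3, 1), counts.get(1, 2), len(counts.cells))
```

With this layout a single `add(i, j)` covers both M[i][j] and M[j][i] of the full matrix.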

The value of the imbalance coefficient depends linearly on the ratio of the minimum to the maximum element of the combination matrix M (excluding the elements on the main diagonal).
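The text does not give the exact formula of the imbalance coefficient, so the Python sketch below computes only the quantity it is said to depend on: the ratio of the minimum to the maximum off-diagonal element. The function name `min_max_ratio` and the value returned for an all-zero matrix are assumptions.

```python
def min_max_ratio(matrix):
    """Ratio of the smallest to the largest off-diagonal element."""
    n = len(matrix)
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    largest = max(off_diagonal)
    # Assumption: an empty (all-zero) matrix is treated as perfectly balanced.
    if largest == 0:
        return 1.0
    return min(off_diagonal) / largest


balanced = [[3, 2, 2], [2, 3, 2], [2, 2, 3]]
skewed = [[2, 1, 1], [1, 1, 0], [1, 0, 1]]
print(min_max_ratio(balanced), min_max_ratio(skewed))
```

A ratio close to 1 means that all disk pairs share stripes about equally often; a ratio close to 0 means that some pair almost never shares a stripe.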

The data placement method presented above builds the stripe map while trying to balance the elements of the combination matrix: it directly uses the current data on how balanced the disk combinations in stripes are and tries to find the best combination on that basis.

The combination matrix and the stripe map reside in RAM. The combination matrix is a temporary object for which memory is allocated only while the stripe map is being filled. The stripe map, by contrast, is a persistent object that stays in RAM permanently. It is the stripe map that determines which disk holds a given block of information.

At step (104), a permutation in the stripe map is generated (Fig. 6) by calling the permutation generation procedure (generate permutation) with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N.

The stripe map consists of K concatenated permutations. A permutation is an arbitrary ordered arrangement of all elements of the set of disks without repetitions. For example, the permutations of the set {1,2,3} include 1, 2, 3; 3, 1, 2; and others.

This generation method was chosen because concatenating permutations guarantees that all disks occur in the stripe map with the same frequency. This in turn guarantees uniform use of the space on all disks.

Passing these input parameters makes it possible to generate a permutation that, taking into account the previously generated permutations, yields a good imbalance coefficient for the disks, which is also an objective of this solution. The most important parameters are the combination matrix M and the list of disks in the current stripe, since the new disk is selected on their basis. The new disk is then added to the stripe and to the permutation.

At step (105), to generate one permutation from the input parameters, the auxiliary structures are initialized: the list of free disks is initialized with the numbers from 1 to N, and the list of disks in the current permutation is initialized with an empty list.

At step (106), a disk that is contained in the list of free disks but not in the list of occupied disks in the current stripe is selected iteratively, by searching for the minimum sum of the combination matrix elements that correspond to the candidate disk and the occupied disks in the current stripe. This minimum sum is computed during permutation generation. Each step of permutation generation adds one disk, and the disk is chosen by the minimum of the sums in the combination matrix. The elements to be summed are chosen as follows: the row number is always the number of the candidate disk, and the column number changes in a loop over all elements of the current stripe array.

The disk selection procedure (Fig. 7) receives as input the combination matrix M, the list of disks in the current stripe S, and a non-empty list of available disks A. First, the auxiliary variables are initialized: the number of the disk with the smallest local sum is set to the first element of the list of available disks, and the minimum of the local sums is set to the maximum value of type INT. Next, the procedure iterates over all elements of list A. At the beginning of each iteration the local sum is set to zero. The procedure then iterates over all elements of the list of disks in the current stripe S and adds to the local sum the value of the combination matrix M in the cell (current element of list A, current element of list S). After the pass over list S, the local sum is known. If it is less than the current minimum of the local sums, the disk number and the new minimum are stored. The procedure returns the disk from list A with the smallest local sum.
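The selection procedure described above can be sketched in Python as follows. This is an illustrative reading, not the authors' code: the function name `select_disk` is hypothetical, disk numbers are assumed to be 1-based indexes into a zero-based matrix, and list A is assumed to be already filtered so that it contains no disk from S.

```python
import sys


def select_disk(matrix, stripe, available):
    """Pick the available disk that has shared stripes least often
    with the disks already placed in the current stripe."""
    best_disk = available[0]   # first element of A, as in the text
    best_sum = sys.maxsize     # stands in for the maximum INT value

    for candidate in available:
        local_sum = 0
        for placed in stripe:
            # Row: candidate disk; column: disk already in the stripe.
            local_sum += matrix[candidate - 1][placed - 1]
        if local_sum < best_sum:   # strict '<' keeps the first of equal sums
            best_sum = local_sum
            best_disk = candidate
    return best_disk


# Disks 1 and 2 have already shared one stripe; disk 3 has shared none.
m = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 0]]
print(select_disk(m, stripe=[1], available=[2, 3]))
```

With an empty current stripe every local sum is zero, so the first available disk is returned.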

At step (107), the disk is added to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation.

At step (108), as soon as the length of the list of occupied disks in the current stripe reaches the length of the current stripe L, the combination matrix is updated and the list of occupied disks in the current stripe is reset to an empty list. The permutation generation procedure is repeated until the number of permutations in the stripe map reaches R.

The combination matrix update procedure (Fig. 8) receives as input the combination matrix and the current stripe. It iterates over all possible values of the tuple (d1, d2), where d1 and d2 are disk numbers from list S, and increments M[d1][d2] by 1. After all tuples have been processed, the procedure returns the updated combination matrix.
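A short Python sketch of this update step is shown below. The function name `update_combination_matrix` is hypothetical, and the sketch assumes that "all possible values of the tuple" means every ordered pair, including d1 = d2, so diagonal elements are incremented as well.

```python
from itertools import product


def update_combination_matrix(matrix, stripe):
    """Increment M[d1][d2] for every ordered pair of disks in the stripe."""
    for d1, d2 in product(stripe, repeat=2):
        # Disk numbers are 1-based, list indexes are 0-based.
        matrix[d1 - 1][d2 - 1] += 1
    return matrix


m = [[0] * 3 for _ in range(3)]
update_combination_matrix(m, [1, 3])
for row in m:
    print(row)
```

Because both (d1, d2) and (d2, d1) are visited, the matrix stays symmetric after every update.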

After all the above steps of the loop have been executed, the stripe map is returned; it remains in RAM for the entire time the RAID array is in operation.
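Steps (102) to (108) can be combined into one compact Python sketch. It is an interpretation of the description under several assumptions: the function name `generate_stripe_map` is hypothetical, the current stripe is carried over from one permutation to the next when N is not a multiple of L, ties are broken in favour of the lowest-positioned free disk, and L must not exceed N.

```python
def generate_stripe_map(n_disks, stripe_len, n_perms):
    """Build a stripe map of n_perms concatenated permutations of 1..N."""
    if stripe_len > n_disks:
        raise ValueError("stripe length must not exceed the number of disks")

    matrix = [[0] * n_disks for _ in range(n_disks)]  # step 103
    stripe = []                                       # current stripe
    stripe_map = []

    for _ in range(n_perms):                          # step 104
        free = list(range(1, n_disks + 1))            # step 105
        while free:
            # Step 106: free disks that are not yet in the current stripe.
            candidates = [d for d in free if d not in stripe]
            disk = min(
                candidates,
                key=lambda c: sum(matrix[c - 1][d - 1] for d in stripe),
            )
            # Step 107: place the disk in the stripe and the permutation.
            free.remove(disk)
            stripe.append(disk)
            stripe_map.append(disk)
            # Step 108: a full stripe updates the matrix and is reset.
            if len(stripe) == stripe_len:
                for d1 in stripe:
                    for d2 in stripe:
                        matrix[d1 - 1][d2 - 1] += 1
                stripe = []
    return stripe_map


print(generate_stripe_map(n_disks=4, stripe_len=2, n_perms=3))
```

For 4 disks, stripes of length 2 and 3 permutations, the six resulting stripes cover each of the six possible disk pairs exactly once, which is the balancing effect the method aims for.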

As follows from the above, the claimed solution provides a high level of data storage reliability and increases the speed of RAID array recovery through a data layout that distributes the read load evenly across all disks during data recovery.

Fig. 9 shows a general example of a computing device (900), which may be, for example, a computer, a server, a laptop, a smartphone, an SoC (System-on-a-Chip), etc. The device (900) may be used for the full or partial implementation of the claimed method (100).

In general, the device (900) contains the following components: one or more processors (901), at least one random access memory (902), a persistent data storage means (903), input/output interfaces (904), including relay outputs for connection to belt conveyor motion controllers, I/O means (905), and network interaction means (906).

The processor (901) of the device performs the basic computing operations required for the operation of the device (900) or of one or more of its components. The processor (901) executes the necessary machine-readable instructions contained in the RAM (902).

The memory (902) is typically implemented as RAM and contains the program logic that provides the required functionality. The data storage means (903) may be implemented as HDDs, SSDs, a RAID array, network storage, flash memory, optical storage media (CD, DVD, MD, Blu-ray discs), etc. The means (903) provides long-term storage of various kinds of information, for example magnetogram recordings, request processing history (logs), user identifiers, camera data, images, etc.

The interfaces (904) are standard means for connecting to and working with computing devices. The interfaces (904) may be, for example, relay connections, USB, RS232/422/485 or others, RJ45, LPT, UART, COM, HDMI, PS/2, Lightning, FireWire, etc., for operation over, among others, Modbus protocols and Profibus, Profinet, or other types of networks. The choice of interfaces (904) depends on the specific design of the device (900), which may be a computing unit (computing module), for example based on a CPU (one or more processors) or a microcontroller, a personal computer, a mainframe, a server cluster, a thin client, a smartphone, a laptop, etc., and on the third-party devices to be connected.

The data I/O means (905) may include a keyboard, joystick, display (touch display), projector, touchpad, mouse, trackball, light pen, speakers, microphone, etc.

The network interaction means (906) are selected from devices that provide network reception and transmission of data, for example an Ethernet card, a WLAN/Wi-Fi module, a Bluetooth module, a BLE module, an NFC module, IrDA, an RFID module, a GSM modem, etc. The means (906) provide data exchange over a wired or wireless data transmission channel, for example WAN, PAN, LAN, Intranet, Internet, WLAN, WMAN, or GSM, a quantum (fiber-optic) data transmission channel, satellite communication, etc. The components of the device (900) are, as a rule, connected via a common data bus.

A program is a sequence of instructions intended for execution by the control unit of a computer or by an instruction processing unit.

These application materials present a preferred embodiment of the claimed technical solution, which should not be construed as limiting other, particular embodiments that do not go beyond the scope of the legal protection sought and that are obvious to specialists in the relevant field of technology.

Claims

A computer-implemented method for placing data in RAID arrays for balanced load distribution during array recovery, comprising the steps of:

- creating a new RAID array using the generate stripe map procedure;

- passing the number of free disks N, the length of the current stripe L, and the length of the stripe map R as input to the stripe map generation procedure and, based on the received data, forming a stripe map consisting of R concatenated permutations of the set {1, …, N};

- initializing the combination matrix M of size N×N by assigning the value 0 to all its elements, and initializing the current stripe with an empty list;

- generating a permutation in the stripe map by calling the generate permutation procedure with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N;

- to generate one permutation based on the input parameters, initializing the auxiliary structures, namely: initializing the list of free disks with the numbers from 1 to N and initializing the list of disks in the current permutation with an empty list;

- iteratively selecting a disk that is contained in the list of free disks but is not contained in the list of occupied disks in the current stripe, by searching for the minimum sum of the elements of the combination matrix corresponding to the disk and the occupied disks in the current stripe;

- adding the disk to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation;

- as soon as the length of the list of occupied disks in the current stripe has reached the length of the current stripe L, updating the combination matrix and assigning an empty list to the list of occupied disks in the current stripe;

- repeating the procedure of generating a permutation in the stripe map until the number of permutations reaches R.