RU2835373C1 - Method of arranging data in RAID arrays for balanced load distribution during array recovery

Info

Publication number: RU2835373C1
Application number: RU2024118335A
Authority: RU
Inventors: Анна Игоревна Васенина; Иван Максимович Левицкий; Дмитрий Сергеевич Смирнов
Original assignee: Общество С Ограниченной Ответственностью "Швачер"
Filing date: 2024-07-02
Publication date: 2025-02-25

Abstract

FIELD: physics.

SUBSTANCE: invention relates to a method of arranging data in a RAID array for balanced load distribution during array recovery. The method comprises the steps of: creating a new RAID array using a procedure for generating a stripe placement map; passing the number of free disks N, the length of the current stripe L and the length of the stripe placement map R to the input of the stripe placement map generation procedure and, based on the obtained data, forming a stripe placement map consisting of R concatenated permutations of the set {1, …, N}; initializing a combination matrix M of size N×N by assigning all its elements the value 0, and initializing the current stripe with an empty list; generating a permutation in the stripe placement map by calling the generate permutation procedure with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N; to generate one permutation based on the input parameters, initializing auxiliary structures, namely: the list of free disks is initialized with the numbers from 1 to N, and the list of disks in the current permutation is initialized with an empty list; iteratively selecting a disk that is contained in the list of free disks but not in the list of occupied disks in the current stripe, by searching for the minimum sum of the elements of the combination matrix corresponding to that disk and the occupied disks in the current stripe; adding the disk to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation; once the length of the list of occupied disks in the current stripe has reached the length of the current stripe L, updating the combination matrix and resetting the list of occupied disks in the current stripe to an empty list; repeating the procedure for generating a permutation in the stripe placement map until the number of permutations reaches R.

EFFECT: faster recovery of a RAID array due to a data arrangement scheme which ensures uniform distribution of the read load across all disks during recovery of the array.

1 cl, 9 dwg

Description

TECHNICAL FIELD

The claimed technical solution relates generally to the field of computing, and in particular to a method of placing data in RAID arrays for balanced load distribution during array recovery.

BACKGROUND

One of the most important characteristics of a data storage system is the availability of the stored data, in other words, the ability to work with the data continuously over a long period. With standard storage devices such as hard disk drives (HDD) or solid-state drives (SSD), there is a high probability that a device fails and access to the data is lost. One universal way to increase the availability of data stored on such devices is to combine them into a RAID array with redundant data storage. Redundancy in a RAID array can be provided by storing full copies of the data (RAID-1, RAID-10) or by storing checksums for certain data fragments (RAID-5, RAID-6, RAID-50, RAID-60). The second option is considered in more detail below.

RAID arrays that use checksums divide all stored data into consecutive fragments called stripes. A stripe consists of data strips of equal size. To provide redundancy, checksum strips are added to the data strips (1 for RAID-5, 2 for RAID-6). Each strip is stored on a physical device, and no device appears in the same stripe twice. The more disks there are in a RAID array, the higher the probability that several disks fail at the same time, so the number of checksum disks is increased. To scale RAID-6 to a large number of disks, RAID-60 is used, in which the disks are divided into several groups. Each group forms something like a RAID-6 array, and the data is alternated between the groups. That is, with 2 groups of disks, the odd stripes lie on the first group of disks and the even stripes on the second.

Recovery of a RAID array with checksums is performed stripe by stripe. If for some reason a data strip cannot be read, all the information from the other strips of the stripe is read instead, including the checksum strips. Using all the information stored in the stripe and the checksum recovery algorithm, the data can be recovered exactly as long as the number of damaged (missing) strips does not exceed the number of checksum strips.
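As an illustration of stripe-level recovery, the following minimal Python sketch rebuilds one missing strip from the surviving strips of a stripe. It assumes a single XOR parity strip, as commonly used in RAID-5; the text above does not specify the checksum algorithm, and the helper name `xor_strips` and the sample bytes are hypothetical.

```python
from functools import reduce
from operator import xor

def xor_strips(strips):
    """XOR equally sized strips together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*strips))

# Three data strips of one stripe (illustrative contents).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]

# Assumption: the single checksum strip is the XOR of all data strips.
parity = xor_strips(data)

# Suppose the strip at index 1 cannot be read: read every other strip
# of the stripe, including the checksum strip, and XOR them together.
survivors = [data[0], data[2], parity]
recovered = xor_strips(survivors)
```

With one checksum strip, exactly one missing strip per stripe can be rebuilt this way; tolerating two missing strips, as in RAID-6, requires a second, independent checksum.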

When a new disk is installed to replace a damaged one, a similar procedure is performed, except that the data is not only computed but also written to the new disk. This process is called RAID array recovery. With RAID-60, rebuilding a disk reads only the disks that were in the same group as the failed disk; the disks of the other groups are not used, which can lead to prolonged overload of the disks being read and to further failures.

One solution to this problem is to use alternative data placement methods, as proposed in the claimed solution.

The RAID array logic described above can be implemented in two ways.

The first way is to use a hardware RAID controller to manage the array. In this case, the configuration is performed before the operating system is loaded, and the operating system does not see the underlying storage devices.

The second way is to write a device driver in the operating system kernel; such devices are called software-defined or virtual. In this case, the disks are visible to the operating system kernel, and after the driver is loaded a new device is created that works with the storage devices passed to it, redirecting requests to the underlying devices according to the specified logic.

The prior art includes patent US9841908B1 "Declustered array of storage devices with chunk groups and support for multiple erasure schemes", patent holder Western Digital Technologies Inc, published on 12.12.2017. This solution describes a method for generating balanced incomplete block designs (BIBD), a method for generating partially balanced incomplete block designs (PBIBD), and a method for using the generated block designs to create a stripe placement map (a chunk group mapping table in the terminology of that patent). BIBD generation is possible only for configurations described by the formula N = k², where N is the number of disks in the RAID array and k is the number of disks in a stripe. The generation uses a randomly generated permutation and successive rotational operations on matrices. PBIBD generation is used only for those configurations for which a BIBD cannot be generated. The algorithm uses sequential pseudo-random generation of permutations, evaluating the block design parameters at each step.

The disadvantages of the described method are:

1. The main generation algorithm is applicable only to some of the possible configurations.

2. The use of pseudo-random generation requires storing additional information on the disks.

3. The running time and output length of the PBIBD generation algorithm are unpredictable and depend on how successfully the permutation is generated at each iteration.

4. The block design parameters evaluated in the patent, such as the number of blocks containing any given point and the number of blocks containing any two given points, assess the generated block design only globally, which can leave locally overloaded elements.

In addition, the prior art includes patent EP2921960A2 "Method of, and apparatus for, accelerated data recovery in a storage system", patent holder Seagate Systems UK Ltd, published on 23.12.2015. This solution describes a data layout method based on two parameters, width and number of repetitions, which optimizes RAID array recovery through read-ahead from the underlying devices. The approach consists of three main steps. In the first step, the width and number-of-repetitions parameters are determined. Next, a matrix is formed according to the selected parameters, the number of disks and the number of disks in a stripe. In the last step, the columns of the matrix are shuffled using a randomly generated permutation.

The main disadvantages of this approach are:

1. When recovery is optimized through read-ahead, some disks become overloaded with requests, which can slow down the processing of user requests and increase the probability of failure of the disks from which recovery is performed.

2. The use of a pseudo-random permutation at the end of generation requires storing additional information on the disks.

Unfortunately, hard drives, which are today's main data storage medium, are not as reliable as one would like, and the problem of protecting files so that data recovery is never needed remains acute.

SUMMARY OF THE INVENTION

The disadvantages of the prior art are overcome and advantages are achieved by providing a computer-implemented method for placing data in RAID arrays for balanced load distribution during array recovery.

The technical result achieved by solving this problem is a high level of data availability, reliable data storage and faster RAID array recovery, owing to a data placement scheme that ensures uniform distribution of the read load across all disks during array recovery.

In addition, the developed data placement scheme achieves a maximum imbalance ratio of no more than 1.5 across all permissible configurations (up to 1024 disks). Stripes are built in a balanced way by forming them with a combination matrix, which records how many times each pair of disks has appeared in the same stripe. For example, during data recovery, reads are issued to the disks that share a stripe with the disk being recovered.

Additional effects include RAM consumption of no more than 2 MB per RAID array (worst case, 1024 disks). While the RAID array is operating, only the stripe placement map is kept in RAM. With a fixed stripe placement map length of 1024, the map stores a number of elements equal to 1024 multiplied by the number of disks. For the maximum number of disks, this is 1024 × 1024 elements. Each element is 16 bits, which is enough to store numbers from 1 to 1024. The maximum memory consumption is therefore 1024 × 1024 × 16 = 16,777,216 bits = 2,097,152 bytes = 2 MB.
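The arithmetic above can be checked with a few lines of Python. The constant names are illustrative; the values are those given in the text, with 1 MB taken as 1024 × 1024 bytes.

```python
MAP_LENGTH = 1024        # fixed stripe placement map length (permutations)
MAX_DISKS = 1024         # worst-case number of disks in the array
BITS_PER_ELEMENT = 16    # size of one stored disk number

total_bits = MAP_LENGTH * MAX_DISKS * BITS_PER_ELEMENT
total_bytes = total_bits // 8
total_mb = total_bytes / (1024 * 1024)

# Minimum number of bits needed to represent the largest disk number, 1024.
min_bits_for_disk_number = (1024).bit_length()
```

Here `total_mb` evaluates to 2.0, and `min_bits_for_disk_number` is 11, so a 16-bit element is sufficient to hold any disk number in the stated range.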

The specified technical result is achieved by implementing a method for placing data in RAID arrays for balanced load distribution during array recovery, comprising the steps of:

- creating a new RAID array using the stripe placement map generation procedure (generate stripe map);

- passing the number of free disks N, the length of the current stripe L and the length of the stripe placement map R to the input of the stripe placement map generation procedure and, based on the data received, forming a stripe placement map consisting of R concatenated permutations of the set {1, …, N};

- initializing the combination matrix M of size N×N by assigning all its elements the value 0, and initializing the current stripe with an empty list;

- generating a permutation in the stripe placement map by calling the permutation generation procedure (generate permutation) with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N;

- to generate one permutation based on the input parameters, initializing the auxiliary structures, namely: the list of free disks is initialized with the numbers from 1 to N, and the list of disks in the current permutation is initialized with an empty list;

- iteratively selecting a disk that is contained in the list of free disks but not in the list of occupied disks in the current stripe, by searching for the minimum sum of the elements of the combination matrix corresponding to that disk and the occupied disks in the current stripe;

- adding the disk to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation;

- as soon as the length of the list of occupied disks in the current stripe reaches the length of the current stripe L, updating the combination matrix and resetting the list of occupied disks in the current stripe to an empty list;

- repeating the procedure for generating a permutation in the stripe placement map until the number of permutations reaches R.
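The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the tie-breaking rule (the lowest disk number wins when sums are equal) and the exact form of the matrix update (each pair of disks in a completed stripe is incremented by one) are choices made for this sketch, and the flowcharts of Figs. 6-8 may define these details differently.

```python
def pick_candidate(matrix, free, occupied):
    """Pick a free disk, not yet in the current stripe, whose summed
    co-occurrence count with the occupied disks is minimal.
    Assumption: ties are broken by the lowest disk number."""
    candidates = [d for d in free if d not in occupied]
    return min(
        candidates,
        key=lambda d: (sum(matrix[d - 1][o - 1] for o in occupied), d),
    )

def update_matrix(matrix, stripe):
    """Assumption: every ordered pair of distinct disks in the finished
    stripe is incremented by one, which keeps the matrix symmetric."""
    for i in stripe:
        for j in stripe:
            if i != j:
                matrix[i - 1][j - 1] += 1

def generate_permutation(matrix, occupied, stripe_len, n_disks):
    """Generate one permutation of {1, ..., N}. `occupied` is shared
    between calls, so a stripe may span two adjacent permutations."""
    free = list(range(1, n_disks + 1))
    permutation = []
    while free:
        disk = pick_candidate(matrix, free, occupied)
        free.remove(disk)
        occupied.append(disk)
        permutation.append(disk)
        if len(occupied) == stripe_len:
            update_matrix(matrix, occupied)
            occupied.clear()
    return permutation

def generate_stripe_map(n_disks, stripe_len, n_perms):
    """Concatenate R = n_perms permutations into a stripe placement map."""
    matrix = [[0] * n_disks for _ in range(n_disks)]
    occupied = []
    stripe_map = []
    for _ in range(n_perms):
        stripe_map.extend(
            generate_permutation(matrix, occupied, stripe_len, n_disks)
        )
    return stripe_map

stripe_map = generate_stripe_map(n_disks=5, stripe_len=3, n_perms=3)
```

Because the occupied list persists across calls, a disk already placed in an unfinished stripe is excluded from the first picks of the next permutation, so no stripe contains the same disk twice as long as L does not exceed N.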

BRIEF DESCRIPTION OF THE DRAWINGS

The features and advantages of the present technical solution will become apparent from the detailed description below and the accompanying drawings.

Fig. 1 - illustrates a block diagram of the claimed method;

Fig. 2 - illustrates the states of the combination matrix after three permutations have been generated in the stripe placement map;

Fig. 3 - illustrates the requirements for the stripe placement map;

Fig. 4 - illustrates an example of an iteration in permutation generation;

Fig. 5 - illustrates the use of the stripe placement map to compute the physical placement of stripes;

Fig. 6 - illustrates a block diagram of the permutation generation procedure;

Fig. 7 - illustrates a block diagram of the procedure for finding a candidate to add to a stripe;

Fig. 8 - illustrates a block diagram of the procedure for updating the combination matrix;

Fig. 9 - illustrates a general example of a computing device.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description of an embodiment of the invention sets out numerous implementation details intended to provide a clear understanding of the present invention. However, it will be apparent to a person skilled in the art how the present invention can be used both with and without these implementation details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the features of the present invention unnecessarily.

In addition, it will be clear from the description that the invention is not limited to the embodiment given. Numerous possible modifications, changes, variations and substitutions that preserve the essence and form of the present invention will be apparent to persons skilled in the art.

The concepts and terms needed to understand this technical solution are described below.

RAID or RAID array (Redundant Array of Independent Disks) - a set of several non-volatile block storage devices, for example disks (SSD, HDD), combined into a single logical block device in such a way that the failure of one or more block devices in the RAID neither causes the array itself to fail nor leads to data loss.

Strip - a contiguous section of a base storage device (a disk of the RAID array) of fixed size, for example 16 kilobytes.

Stripe - a set of strips located on different base storage devices (disks of the RAID array) that together form a contiguous section of the virtual storage device (the RAID array). Each stripe contains a set of data and, optionally, checksums computed from that data. If checksums are stored, separate strips are allocated for them (one per distinct checksum). The stripe depth is the size of one strip in the stripe. The stripe width is the amount of data contained in each stripe.

For example, if the stripe depth is 64 KB, the stripe width is obtained by multiplying this value by the number of data strips in the stripe.
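The relationship between stripe depth and stripe width can be written as a short Python function. The function name and the sample configuration (10 strips per stripe, 2 of them checksum strips, as in a RAID-6 style stripe) are assumptions made for illustration only.

```python
def stripe_width_kb(stripe_depth_kb, strips_in_stripe, checksum_strips):
    """Stripe width = stripe depth x number of data strips in the stripe."""
    data_strips = strips_in_stripe - checksum_strips
    return stripe_depth_kb * data_strips

# 64 KB strips, 10 strips per stripe, 2 of which hold checksums.
width = stripe_width_kb(stripe_depth_kb=64, strips_in_stripe=10, checksum_strips=2)
```

For this sample configuration the stripe holds 8 data strips, so `width` is 512 KB.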

Imbalance ratio - a metric used to evaluate how evenly the load is distributed across disks during RAID array recovery. For a fixed set of failed disk numbers, it is computed as the ratio of the number of I/O operations on the most loaded disk to the number on the least loaded disk during recovery of the RAID array. To evaluate the RAID array as a whole, the minimum, average and maximum values of the imbalance ratio over all possible disk failures are used.
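A minimal Python sketch of this metric for a single failed disk is given below. It assumes that rebuilding a stripe costs exactly one read on every surviving disk of that stripe; the function name and the sample layouts are hypothetical, and a disk that receives no reads at all is reported as an infinite ratio.

```python
def imbalance_ratio(stripes, failed_disk, n_disks):
    """Ratio of reads on the most loaded to the least loaded surviving
    disk when `failed_disk` is rebuilt (assumption: one read per
    surviving strip of every stripe that contained the failed disk)."""
    reads = {d: 0 for d in range(1, n_disks + 1) if d != failed_disk}
    for stripe in stripes:
        if failed_disk in stripe:
            for disk in stripe:
                if disk != failed_disk:
                    reads[disk] += 1
    lowest, highest = min(reads.values()), max(reads.values())
    return float("inf") if lowest == 0 else highest / lowest

# Layout in which disk 1 shares stripes evenly with disks 2, 3 and 4.
balanced = [[1, 2, 3], [4, 1, 2], [3, 4, 1], [2, 3, 4]]
# Grouped layout: disk 1 only ever shares a stripe with disk 2.
grouped = [[1, 2], [3, 4], [1, 2], [3, 4]]

ratio_balanced = imbalance_ratio(balanced, failed_disk=1, n_disks=4)
ratio_grouped = imbalance_ratio(grouped, failed_disk=1, n_disks=4)
```

In the grouped layout, disks 3 and 4 take no part in rebuilding disk 1, which mirrors the RAID-60 behavior described in the background section.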

Stripe placement map (stripe map) - a structure that stores information about which base device (disk of the RAID array) holds each strip of each stripe of the RAID array. To save resources, maps are used whose length is much smaller than the number of strips in the RAID array; the stripe placement map is then accessed by index modulo the length of the map.
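The modulo access can be sketched in Python as follows. The flat list representation, the function name `disk_for_strip` and the sample map are assumptions for illustration; the sketch returns only the disk number and does not compute the offset on that disk, which the text above does not define.

```python
def disk_for_strip(stripe_map, stripe_len, stripe_index, strip_in_stripe):
    """Return the disk holding a given strip of a given stripe.
    The global strip number wraps around the map by taking it modulo
    the map length."""
    global_strip = stripe_index * stripe_len + strip_in_stripe
    return stripe_map[global_strip % len(stripe_map)]

# Hypothetical map for 4 disks and stripes of 2 strips:
# two concatenated permutations of {1, 2, 3, 4}.
sample_map = [1, 2, 3, 4, 1, 3, 2, 4]

first = disk_for_strip(sample_map, stripe_len=2, stripe_index=2, strip_in_stripe=1)
wrapped = disk_for_strip(sample_map, stripe_len=2, stripe_index=4, strip_in_stripe=1)
```

Stripe 4 lies beyond the end of the 8-element map, so its strips are resolved from the beginning of the map again. In this sample the map length is a multiple of the stripe length, so wrapped stripes coincide with the stripes stored in the map.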

Combination matrix - a square table of numbers whose side length equals the number of disks in the RAID array. The number in column i and row j of the table indicates how many times disk i and disk j have appeared in the same stripe according to the stripe placement map.
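The definition can be illustrated by computing a combination matrix from an existing stripe placement map. This Python sketch assumes the map is a flat list of disk numbers cut into consecutive stripes of equal length; the function name and the sample map are hypothetical.

```python
from itertools import combinations

def combination_matrix(stripe_map, stripe_len, n_disks):
    """Count, for every pair of disks, how many stripes contain both."""
    matrix = [[0] * n_disks for _ in range(n_disks)]
    # Walk the map one complete stripe at a time.
    for start in range(0, len(stripe_map) - stripe_len + 1, stripe_len):
        stripe = stripe_map[start:start + stripe_len]
        for i, j in combinations(stripe, 2):
            matrix[i - 1][j - 1] += 1
            matrix[j - 1][i - 1] += 1  # the matrix is symmetric
    return matrix

# Stripes of the sample map: [1, 2], [3, 4], [1, 3], [2, 4].
sample_map = [1, 2, 3, 4, 1, 3, 2, 4]
matrix = combination_matrix(sample_map, stripe_len=2, n_disks=4)
```

In this sample, disks 1 and 4 never share a stripe, so the corresponding entries stay at 0; a selection rule based on minimum sums would favor pairing them next.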

Данное техническое решение может быть реализовано на компьютере, в виде автоматизированной информационной системы (АИС), распределенной компьютерной системы, или машиночитаемого носителя, содержащего инструкции для выполнения вышеупомянутого способа размещения данных в RAID-массивах для сбалансированного распределения нагрузки во время восстановления массива с помощью вычислительных средств (например, процессора).This technical solution can be implemented on a computer, in the form of an automated information system (AIS), a distributed computer system, or a machine-readable medium containing instructions for performing the above-mentioned method of placing data in RAID arrays for balanced load distribution during array recovery using computing means (for example, a processor).

На Фиг. 1 представлена блок-схема выполнения заявленного способа (100) размещения данных в RAID-массивах для сбалансированного распределения нагрузки во время восстановления массива.Fig. 1 shows a block diagram of the implementation of the claimed method (100) of placing data in RAID arrays for balanced load distribution during array recovery.

At the first step (101), a new RAID array is created using the stripe map generation procedure (generate stripe map).

At step (102), the number of free disks N, the length of the current stripe L, and the length of the stripe map R are passed to the input of the stripe map generation procedure, and on the basis of these data a stripe map consisting of R concatenated permutations of the set {1, …, N} is formed.

The stripe map is an array of length k*N, where N is the number of disks and k is a fixed coefficient, equal to 1024 in our case.

The stripe map must satisfy the following requirements:

The stripe map can be viewed as a concatenation of k permutations of the set {1,...,N}, which guarantees uniform disk usage, since each disk n then occurs exactly k times.

The stripe map can be viewed as a concatenation of (k*N)/M permutations of the set {1,...,M}, where M is the stripe length, which guarantees that every stripe is well formed (contains no repeated disks).

Fig. 3 shows an example of the requirements for the stripe map, where N is the number of disks, M is the stripe length, d_i is a disk number from the set {1, …, N}, and i is an index in the stripe map from the set {1, …, k*N}.
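The two requirements can be checked mechanically. The sketch below is one possible reading, not the patented procedure: it treats the second requirement as "every consecutive group of M map entries contains no repeated disk". The function name `is_valid_stripe_map` and its parameter names are hypothetical.

```python
def is_valid_stripe_map(stripe_map, n_disks, stripe_len):
    """Check both stripe map requirements on a flat list of disk numbers."""
    if len(stripe_map) % n_disks or len(stripe_map) % stripe_len:
        return False
    all_disks = set(range(1, n_disks + 1))
    # Requirement 1: each block of n_disks entries is a permutation of {1..N}.
    permutations_ok = all(
        set(stripe_map[i:i + n_disks]) == all_disks
        for i in range(0, len(stripe_map), n_disks)
    )
    # Requirement 2 (as read here): no disk repeats inside a stripe.
    stripes_ok = all(
        len(set(stripe_map[i:i + stripe_len])) == stripe_len
        for i in range(0, len(stripe_map), stripe_len)
    )
    return permutations_ok and stripes_ok


print(is_valid_stripe_map([1, 2, 3, 4, 3, 4, 1, 2], n_disks=4, stripe_len=2))
print(is_valid_stripe_map([1, 2, 3, 3, 1, 2], n_disks=3, stripe_len=2))
```

The second call fails the check because the stripe formed by entries 3 and 4 of the map contains disk 3 twice, even though each block of three entries is a valid permutation.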

At step (103), the combination matrix M of size N×N, where N is the number of disks, is initialized by assigning the value 0 to all its elements, and the current stripe is initialized with an empty list.

Initializing the combination matrix M means assigning the value 0 to all its elements. The current stripe is initialized with an empty list. "Current stripe" is the name of an auxiliary structure of list type whose elements are disk numbers. At different moments the list may contain from 0 to L elements, where L is the stripe length parameter passed as input. The "current stripe" list is filled while a permutation is being generated. As soon as its length reaches L, the list is reset. The list itself is not persisted; it serves only as an auxiliary structure for generating the permutation.

For example, the element M[i][j] of the combination matrix records how many times disk i and disk j occurred in the same stripe. Since the relation of being in the same stripe is symmetric, the combination matrix M is symmetric about its main diagonal. To reduce memory usage, only the upper triangular part of the combination matrix M need be stored. For simplicity of presentation, the description of the algorithm uses the full combination matrix M, in which M[i][j]=M[j][i] for all i, j from the set {1,...,N}.
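The memory optimization mentioned above could look roughly like this. The class `TriangularCounts` and its flat row-major layout are assumptions made for illustration; the description does not specify how the upper triangle is stored.

```python
class TriangularCounts:
    """Symmetric N x N counter stored as its upper triangle (diagonal included)."""

    def __init__(self, n_disks):
        self.n = n_disks
        self.cells = [0] * (n_disks * (n_disks + 1) // 2)

    def _index(self, i, j):
        # Disks are numbered from 1; order the pair so that a <= b.
        a, b = sorted((i - 1, j - 1))
        # Row a starts after the a shorter-by-one rows that precede it.
        return a * self.n - a * (a - 1) // 2 + (b - a)

    def get(self, i, j):
        return self.cells[self._index(i, j)]

    def increment(self, i, j):
        self.cells[self._index(i, j)] += 1


counts = TriangularCounts(4)
counts.increment(3, 1)
print(counts.get(1, 3), counts.get(3, 1))
```

With this layout N(N+1)/2 cells are stored instead of N*N, and a pair of disks is incremented once rather than once per ordering.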

The value of the imbalance ratio depends linearly on the ratio of the minimum and maximum elements of the combination matrix M, excluding the elements on the main diagonal.
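The quantities this statement refers to can be read off the matrix directly. The sketch below assumes, purely for convenience, an (N+1)×(N+1) list of lists with row and column 0 unused so that disk numbers 1..N index it directly; the helper name `offdiag_min_max` is hypothetical.

```python
def offdiag_min_max(M):
    """Return (min, max) over off-diagonal cells of a 1-indexed square matrix."""
    n = len(M) - 1  # row/column 0 is padding (an assumption of this sketch)
    values = [
        M[i][j]
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if i != j
    ]
    return min(values), max(values)


M = [
    [0, 0, 0, 0],
    [0, 4, 2, 1],
    [0, 2, 4, 3],
    [0, 1, 3, 4],
]
print(offdiag_min_max(M))
```

The closer the two returned values are to each other, the more evenly disk pairs share stripes.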

The data placement method presented above builds the stripe map while trying to balance the elements of the combination matrix: it directly uses the current data on how balanced the disk combinations in stripes are and tries to find the best combination on that basis.

The combination matrix and the stripe map reside in RAM. The combination matrix is a temporary object: memory for it is allocated only while the stripe map is being filled. The stripe map, by contrast, is a persistent object that stays in RAM permanently. It is the stripe map that determines on which disk a given block of information is located.

At step (104), a permutation in the stripe map is generated (Fig. 6) by calling the permutation generation procedure (generate permutation) with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, and the number of disks in the RAID array N.

The stripe map consists of K concatenated permutations. A permutation is an arbitrary ordered arrangement of all elements of the set of disks without repetitions. For example, 1, 2, 3 and 3, 1, 2 are permutations of the set {1,2,3}.

This generation method was chosen because concatenating permutations guarantees that all disks occur in the stripe map with the same frequency. This in turn guarantees uniform use of the space on all disks.

Passing these input parameters makes it possible to generate a permutation that, together with the previously generated permutations, yields a good imbalance ratio for the disks, which is also an objective of this solution. The most important parameters are the combination matrix M and the list of disks in the current stripe: the new disk is selected on the basis of these two. The new disk is then added to the stripe and to the permutation.

At step (105), to generate one permutation from the input parameters, the auxiliary structures are initialized: the list of free disks is initialized with the numbers from 1 to N, and the list of disks in the current permutation is initialized with an empty list.

At step (106), a disk is selected iteratively. The disk must be in the list of free disks but not in the list of occupied disks in the current stripe, and it is found by searching for the minimum sum of the combination matrix elements that correspond to the candidate disk and the occupied disks in the current stripe. These sums are computed during permutation generation. Each step of permutation generation adds one disk, chosen as the one with the minimum sum. The summed elements are selected as follows: the row number is always the number of the candidate disk, and the column number varies in a loop over all elements of the current stripe list.

The disk selection procedure (Fig. 7) receives as input the combination matrix M, the list of disks in the current stripe S, and a non-empty list of available disks A. First, the auxiliary variables are initialized: the number of the disk with the smallest local sum, which is set to the first element of the list of available disks, and the minimum of the local sums, which is set to the maximum value of the INT type. The procedure then iterates over all elements of list A. At the start of each iteration the local sum is set to zero. The procedure goes through all elements of the list S and adds to the local sum the value of the combination matrix M in the cell (current element of list A, current element of list S). After the pass over list S, the local sum is complete. If it is smaller than the current minimum of the local sums, the disk number is remembered (and the minimum is updated). The procedure returns the disk from list A with the smallest local sum.
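The selection procedure described above can be sketched as follows. The sketch assumes the matrix is stored as an (N+1)×(N+1) list of lists with row and column 0 unused, so that disk numbers 1..N index it directly; that padding and the name `select_disk` are illustrative choices, not part of the description. `sys.maxsize` stands in for the maximum INT value.

```python
import sys


def select_disk(M, S, A):
    """Return the disk from A whose co-occurrence sum with disks in S is minimal.

    M: combination matrix, 1-indexed via an unused row/column 0 (assumption).
    S: disks already placed in the current stripe.
    A: non-empty list of candidate disks.
    """
    best_disk = A[0]
    min_sum = sys.maxsize
    for candidate in A:
        local_sum = 0
        for used in S:
            local_sum += M[candidate][used]
        # Strict comparison: on a tie the earliest candidate in A is kept.
        if local_sum < min_sum:
            min_sum = local_sum
            best_disk = candidate
    return best_disk


M = [
    [0, 0, 0, 0, 0],
    [0, 2, 2, 1, 0],
    [0, 2, 2, 0, 1],
    [0, 1, 0, 2, 1],
    [0, 0, 1, 1, 2],
]
print(select_disk(M, S=[1], A=[2, 3, 4]))
```

In the example, disk 4 has never shared a stripe with disk 1, so it is preferred over disks 2 and 3.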

At step (107), the disk is added to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation.

At step (108), as soon as the length of the list of occupied disks in the current stripe reaches the length of the current stripe L, the combination matrix is updated and the list of occupied disks in the current stripe is reset to an empty list. The permutation generation procedure in the stripe map is repeated until the number of permutations reaches R.
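Steps (102) to (108) can be put together in a compact sketch. It is an illustration under stated assumptions rather than the definitive implementation: the matrix is padded with an unused row and column 0, ties are broken by list order, L is assumed not to exceed N, and all function names are hypothetical.

```python
import sys
from itertools import product


def select_disk(M, stripe, candidates):
    """Pick the candidate with the smallest co-occurrence sum (ties: first)."""
    best, min_sum = candidates[0], sys.maxsize
    for disk in candidates:
        local_sum = sum(M[disk][used] for used in stripe)
        if local_sum < min_sum:
            best, min_sum = disk, local_sum
    return best


def generate_permutation(M, stripe, L, N):
    """Steps (105)-(108): build one permutation of {1..N}, updating M and stripe."""
    free = list(range(1, N + 1))
    permutation = []
    while free:
        candidates = [d for d in free if d not in stripe]
        disk = select_disk(M, stripe, candidates)
        free.remove(disk)
        stripe.append(disk)
        permutation.append(disk)
        if len(stripe) == L:
            # Stripe complete: count every ordered pair, then reset the list.
            for d1, d2 in product(stripe, repeat=2):
                M[d1][d2] += 1
            stripe.clear()
    return permutation


def generate_stripe_map(N, L, R):
    """Steps (102)-(104): concatenate R permutations into one stripe map."""
    M = [[0] * (N + 1) for _ in range(N + 1)]  # row/column 0 unused
    stripe = []  # carried across permutation boundaries
    stripe_map = []
    for _ in range(R):
        stripe_map.extend(generate_permutation(M, stripe, L, N))
    return stripe_map


stripe_map = generate_stripe_map(N=5, L=3, R=3)
print(stripe_map)
```

Because the current stripe list is shared between consecutive calls, a stripe may span two permutations when L does not divide N, and the filter on `candidates` keeps such a stripe free of repeated disks.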

The combination matrix update procedure (Fig. 8) receives the combination matrix and the current stripe as input. It iterates over all possible values of the tuple (d1, d2), where d1 and d2 are disk numbers from the list S, and increases M[d1][d2] by 1. After all tuples have been processed, the procedure returns the updated combination matrix.
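A minimal sketch of this update follows, assuming the matrix is a list of lists padded with an unused row and column 0 so that disk numbers index it directly (the padding and the name `update_matrix` are illustrative). "All possible values of the tuple" is read here as all ordered pairs, including d1 = d2, which is why diagonal cells also grow.

```python
from itertools import product


def update_matrix(M, S):
    """Increment M[d1][d2] for every ordered pair of disks in stripe S."""
    for d1, d2 in product(S, repeat=2):
        M[d1][d2] += 1
    return M


N = 4
M = [[0] * (N + 1) for _ in range(N + 1)]  # row/column 0 unused
update_matrix(M, [1, 3])
update_matrix(M, [3, 4])
for row in M[1:]:
    print(row[1:])
```

After the two updates the matrix stays symmetric, and the diagonal cell of disk 3 equals the number of stripes that contain it.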

After all the above steps of the loop have been executed, the stripe map is returned; it remains in RAM for the entire time the RAID array is in operation.

As follows from the above, the claimed solution provides a high level of data storage reliability and faster RAID array recovery thanks to a data layout scheme that distributes the read load evenly across all disks during data recovery.

Fig. 9 shows a general example of a computing device (900), which may be, for example, a computer, a server, a laptop, a smartphone, a SoC (System-on-a-Chip), and the like. The device (900) may be used for full or partial implementation of the claimed method (100).

In general, the device (900) comprises such components as: one or more processors (901), at least one random access memory (902), a persistent data storage means (903), input/output interfaces (904), including relay outputs for connection to belt conveyor motion controllers, an I/O means (905), and network interaction means (906).

The processor (901) of the device performs the basic computing operations required for the operation of the device (900) or of one or more of its components. The processor (901) executes the necessary machine-readable commands contained in the RAM (902).

The memory (902) is typically implemented as RAM and contains the program logic that provides the required functionality. The data storage means (903) may be implemented as HDDs, SSDs, a RAID array, network storage, flash memory, optical storage media (CD, DVD, MD, Blu-ray discs), and the like. The means (903) provides long-term storage of various kinds of information, for example, recorded magnetograms, request processing history (logs), user identifiers, camera data, images, and the like.

The interfaces (904) are standard means for connecting to and working with computing devices. The interfaces (904) may be, for example, relay connections, USB, RS232/422/485 or others, RJ45, LPT, UART, COM, HDMI, PS/2, Lightning, FireWire, and the like, for operation over, among others, Modbus protocols and Profibus, Profinet, or other types of networks. The choice of interfaces (904) depends on the specific design of the device (900), which may be a computing unit (computing module), for example one based on a CPU (one or more processors), a microcontroller, and the like, a personal computer, a mainframe, a server cluster, a thin client, a smartphone, a laptop, and the like, as well as on the third-party devices to be connected.

The following may be used as data I/O means (905): a keyboard, a joystick, a display (touch display), a projector, a touchpad, a mouse, a trackball, a light pen, speakers, a microphone, and the like.

The network interaction means (906) are selected from devices that provide network reception and transmission of data, for example, an Ethernet card, a WLAN/Wi-Fi module, a Bluetooth module, a BLE module, an NFC module, IrDA, an RFID module, a GSM modem, and the like. The means (906) provides data exchange over a wired or wireless data channel, for example, WAN, PAN, LAN, Intranet, Internet, WLAN, WMAN or GSM, a quantum (fiber optic) data channel, satellite communication, and the like. The components of the device (900) are typically connected via a common data bus.

Program - a sequence of instructions intended for execution by the control unit of a computer or by a command processing device.

These application materials present a preferred embodiment of the claimed technical solution, which should not be construed as limiting other, particular embodiments that do not go beyond the scope of the legal protection sought and that are obvious to those skilled in the relevant art.

Claims

A computer-implemented method for placing data in RAID arrays for balanced load distribution during array recovery, comprising the steps of:

- creating a new RAID array using the generate stripe map procedure;

- passing the number of free disks N, the length of the current stripe L and the length of the stripe map R to the input of the generate stripe map procedure and, based on the data received, forming a stripe map consisting of R concatenated permutations of the set {1, …, N};

- initializing the combination matrix M of size N×N by assigning the value 0 to all its elements, and initializing the current stripe with an empty list;

- generating a permutation in the stripe map by calling the generate permutation procedure with the following input parameters: the current value of the combination matrix M, the list of occupied disks in the current stripe, the length of the current stripe L, the number of disks in the RAID array N;

- to generate one permutation based on the input parameters, initializing the auxiliary structures, namely: initializing the list of free disks with the numbers from 1 to N, and initializing the list of disks in the current permutation with an empty list;

- iteratively selecting a disk that is contained in the list of free disks but is not contained in the list of occupied disks in the current stripe, by searching for the minimum sum of the elements of the combination matrix corresponding to the disk and the occupied disks in the current stripe;

- adding the disk to the list of occupied disks in the current stripe and to the end of the list of disks in the current permutation;

- as soon as the length of the list of occupied disks in the current stripe reaches the length of the current stripe L, updating the combination matrix and assigning an empty list to the list of occupied disks in the current stripe;

- repeating the procedure of generating a permutation in the stripe map until the number of permutations reaches R.