RU2297662C2 - Method for high-speed lock management for instant copy in n-way shared data storage systems

Info

Publication number: RU2297662C2
Application number: RU2005120632/09A
Authority: RU
Inventors: Carlos Francisco Fuente (GB); William James Scales (GB)
Original assignee: International Business Machines Corporation
Priority date: 2002-11-29
Filing date: 2003-08-14
Publication date: 2007-04-20
Also published as: US20060095682A1; DE60326632D1; AU2003253003A1; KR100745878B1; TWI226549B; PL376860A1; RU2005120632A; TW200413930A; JP2006508459A; GB0227825D0; CN100550894C; CN1703891A; WO2004051473A3; KR20050083858A; EP1566041A2; WO2004051473A2; ATE425623T1; AU2003253003A8; EP1566041B1

Abstract

FIELD: computing devices for storage management, in particular external storage controllers.

SUBSTANCE: a storage control device for operation in a network of storage controllers comprises a storage controller that is an owner node capable of controlling the locking of a region of data during I/O operations; a messaging component for passing at least one message among a lock request, a lock grant, a lock release request, and a lock release signal; and an I/O component. Any image of the region of data obtained by instant copy is kept coherent with the region of data itself. The I/O component can cache a previous positive confirmation that the region of data remains coherent with any such image and can perform I/O operations on the basis of that previous positive confirmation. A method describes the operation of the device. A computer program product is embodied on a machine-readable medium and contains a recorded program that carries out the operations of the method.

EFFECT: expanded functional capabilities.

3 cl, 3 dwg

Description

Field of the invention

The present invention relates to computing devices for storage management, in particular to external storage controllers with advanced functions in n-way shared storage systems that provide an instant copy function.

Background of the invention

In the field of computer storage systems, demand is growing for so-called "advanced functions". These functions go beyond the simple I/O control provided by conventional storage controller systems. Advanced functions are well known and rely on the management of metadata, which is used to keep state information about the real, or "user", data stored in the system. Advanced functions make it possible to perform various operations quickly on virtual images, or "snapshots", of data while the real data remains available to user applications. One well-known advanced function of this kind is instant, or instantaneous, copying of data.

At the highest level, instant copy is a function that creates a secondary image of some data. Such a function is sometimes called a point-in-time (checkpoint) copy, or T0 copy. The contents of the secondary image are initially identical to those of the primary image. The secondary image is created "instantly". In practice this means that the secondary image is created in far less time than it would take to make a true, separate physical copy, and that creating it does not undesirably disrupt the programs that use the data being copied.

The secondary copy created in this way can be used for various purposes, including backup, system testing, and data mining. Meanwhile the primary copy continues to be used by the original application for its original purpose. By contrast, making a backup without instant copy requires first shutting the application down, taking the backup, and only then restarting the application. It is becoming increasingly difficult to find time windows in which an application can be shut down without harming the business. The cost of taking backups is therefore rising. For businesses, the ability to take backups with the instant copy function without stopping their operations has significant and growing value.

Implementations of instant copy create the illusion that a second image of the data exists. They do so by redirecting read requests addressed to the secondary image (hereinafter, the target) to the original image (hereinafter, the source), unless that region of data has been written to. When a write is to be made to a region of data (on either the source or the target), the illusion that the source and the target each hold their own copy of the data is preserved as follows: the write command is suspended and, before the write takes place, the region about to be overwritten is read from the source and the data read is written to the target; only then, and only if all of these steps succeed, is the suspended write released. Subsequent writes to the same region need not be suspended, because the target already has its own copy of the data. This copy-on-write technique is well known and is used in many environments.
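The copy-on-write behavior described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `InstantCopy`, the constant `REGION_SIZE`, and the in-memory byte arrays standing in for disks are all assumptions made for illustration.

```python
REGION_SIZE = 4  # hypothetical region size in bytes, kept tiny for illustration


class InstantCopy:
    """Toy model of a source/target pair kept consistent by copy-on-write."""

    def __init__(self, source: bytearray):
        self.source = source
        self.target = bytearray(len(source))  # holds no real data at first
        self.copied = set()                   # regions already copied to the target

    def _region(self, offset: int) -> int:
        return offset // REGION_SIZE

    def _copy_region(self, region: int) -> None:
        start = region * REGION_SIZE
        end = start + REGION_SIZE
        self.target[start:end] = self.source[start:end]
        self.copied.add(region)

    def read_target(self, offset: int) -> int:
        # Reads of the target are redirected to the source until the region is copied.
        if self._region(offset) in self.copied:
            return self.target[offset]
        return self.source[offset]

    def write(self, disk: str, offset: int, value: int) -> None:
        region = self._region(offset)
        if region not in self.copied:
            # The write is held back until the old data has reached the target.
            self._copy_region(region)
        disk_data = self.source if disk == "source" else self.target
        disk_data[offset] = value
```

In this sketch a second write to the same region skips the copy step, which mirrors the statement that later writes need not be suspended.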

All implementations of instant copy rely on a data structure that governs the decisions described above, namely whether read requests received by the target are directed to the source or to the target, and whether a write must be suspended so that a copy-on-write can take place. In essence, this data structure tracks which regions, or fragments, of data have been copied from the source to the target and which have not.

Maintaining this data structure (hereinafter, the metadata) is the key to implementing the algorithm that underlies instant copy.
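One common way to hold such metadata is a bitmap with one bit per region. The text does not prescribe any particular layout, so the following Python sketch is only an assumption; the class `CopyMetadata` and its method names are hypothetical.

```python
class CopyMetadata:
    """One bit per region: 1 means the region was copied from source to target."""

    def __init__(self, region_count: int):
        self.region_count = region_count
        self.bits = 0  # a Python int used as an arbitrarily long bitmap

    def _check(self, region: int) -> None:
        if not 0 <= region < self.region_count:
            raise IndexError(f"region {region} out of range")

    def is_copied(self, region: int) -> bool:
        self._check(region)
        return bool((self.bits >> region) & 1)

    def mark_copied(self, region: int) -> None:
        self._check(region)
        self.bits |= 1 << region

    def route_target_read(self, region: int) -> str:
        # Decision 1: where is a read addressed to the target served from?
        return "target" if self.is_copied(region) else "source"

    def write_must_wait_for_copy(self, region: int) -> bool:
        # Decision 2: must a write be suspended for a copy-on-write?
        return not self.is_copied(region)
```

Both decisions named in the text reduce to a single bit test here, which is why keeping this structure correct and available matters so much to the algorithm.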

Instant copy is relatively simple to implement in a single processor complex (possibly with symmetric multiprocessing), as is often found in modern storage controllers. With somewhat more effort, fault-tolerant instant copy can be implemented, in which two (at least two) processor complexes have access to a copy of the metadata. If the first complex fails, the second can be used to continue operation without loss of access to the target.

However, the I/O performance of a single processor complex is limited. Even as the performance of a single processor complex increases, whether measured in I/O operations per second or in throughput (Mbit/s), it has a finite limit, which ultimately constrains the performance of the system when running applications. This limit exists in many applications of instant copy technology, but storage controllers are the clearest example. A typical storage controller has a single processor complex (or possibly a redundant pair of processor complexes), which sets the performance limit of that controller.

Additional storage controllers can be installed. But such separate storage controllers do not share access to the metadata and therefore do not cooperate in managing an instant copy image. The storage space becomes fragmented, and the scope of the instant copy function is confined to a single controller system. The source disk and the target disk must both be managed by the same storage controller. The disk space of one such storage controller may be full while another controller still has free space, yet the source disk and the target disk cannot be separated by moving the target disk under the control of the other controller. (This is particularly unfortunate for a new instant copy, where moving the target would be cheap because no physical data is yet associated with it.)

Besides limiting the performance achievable for a source-target pair, the limited functionality of a single-controller storage system further complicates storage administration.

Current storage management systems generally make no attempt to solve this problem. They implement instant copy methods that can use only a single controller, and their capability is therefore limited by the performance of that controller.

A simple way to let several controllers share the instant copy function is to designate one controller as the owner of the metadata and have all other controllers forward every read and write request to it. Using the algorithm described above, the owning controller processes the I/O requests as though they had come directly from its own attached host servers, and returns each completed I/O request to the controller that forwarded it.

The main drawback of such a system, and the reason it is not widely used, is that the cost of forwarding every I/O request is so high that it can as much as double the resource consumption of the system as a whole and therefore roughly halve system performance.

It is known, for example in the field of distributed parallel database systems, to use a distributed lock management structure that employs a two-phase locking protocol to lock data so that all copies of the data are kept coherent. However, two-phase locking is typically time-consuming and degrades system performance because of the messages that must be exchanged. The two-phase locking protocol known from the prior art is, in its unmodified form, impractical at lower levels of the hardware and software stack, such as storage area networks with distributed storage controllers, where the effect of lock-control message traffic on network performance is even more significant than at the database management level.

It is therefore desirable to obtain the benefits of distributed lock management in computing environments that implement instant copy, while keeping the computing resources spent on passing lock messages to a minimum.

Summary of the invention

To solve the above problem, the present invention provides a storage control device for operation in a network of storage controllers, the network including a storage controller that is an owner node capable of controlling the locking of a region of data during I/O operations, and a messaging component for passing at least one message among a lock request, a lock grant, a lock release request, and a lock release signal. The device comprises an I/O component that performs I/O operations on data owned by any owner node, provided that the component complies with the locking protocols controlled by that owner node. Any image of the region of data obtained by instant copy is kept coherent with the region of data itself, and the I/O component can cache a previous positive confirmation that the region of data has remained coherent with any such image and can perform I/O operations on the basis of that previous positive confirmation.

Preferably, the I/O component can discard a cached positive confirmation of coherence and subsequently request the lock again.

A limited cache can be managed by selectively discarding cached positive confirmations of coherence in this way.

It is also preferred that a positive confirmation of coherence for one region of data is also a positive confirmation of coherence for a further region of data contiguous with it.
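A bounded cache of confirmations might look like the Python sketch below. The text only says that confirmations can be selectively discarded; the least-recently-used policy, the class `ConfirmationCache`, and the idea that one confirmation arrives as an inclusive range of contiguous regions are assumptions made for illustration.

```python
from collections import OrderedDict


class ConfirmationCache:
    """Bounded store of regions with a cached confirmation (LRU is an assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._regions = OrderedDict()  # region number -> True, oldest entry first

    def record(self, first_region: int, last_region: int) -> None:
        """Store one confirmation covering first_region..last_region inclusive."""
        for region in range(first_region, last_region + 1):
            self._regions[region] = True
            self._regions.move_to_end(region)
            if len(self._regions) > self.capacity:
                self._regions.popitem(last=False)  # discard the oldest entry

    def confirmed(self, region: int) -> bool:
        if region in self._regions:
            self._regions.move_to_end(region)  # mark as recently used
            return True
        return False  # not cached: the caller has to request the lock again
```

Discarding an entry is always safe in this model, because a missing entry only costs one extra lock request rather than a wrong answer.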

The present invention also provides a method of storage control in a network of storage controllers, the network including a storage controller that is an owner node capable of controlling the locking of a region of data during I/O operations, and a messaging component for passing at least one message among a lock request, a lock grant, a lock release request, and a lock release signal. The method comprises performing I/O operations on data owned by any owner node, subject to compliance with the locking protocols controlled by that owner node; maintaining an image of a region of data, obtained by instant copy of that region, coherent with the region itself; and caching a previous positive confirmation that the region of data remains coherent with any such image, the I/O operations being performed on the basis of that previous positive confirmation.

In the method of the invention, it is preferable to discard a cached positive confirmation of coherence and subsequently request the lock again.

In that case, a limited cache can be managed by selectively discarding cached positive confirmations of coherence.

The invention further provides a computer program product tangibly embodied on a machine-readable medium and containing a program which, when loaded into a computer system and executed there, controls a storage control device in a network of storage controllers, the network including a controller that is an owner node controlling the locking of a region of data during I/O operations, and a messaging component for passing at least one message among a lock request, a lock grant, a lock release request, and a lock release signal. The program causes the storage control device to perform the following actions: performing I/O operations on data owned by any owner node, subject to compliance with the locking protocols controlled by that owner node; maintaining an image of a region of data, obtained by instant copy of that region, coherent with the region itself; and caching a previous positive confirmation that the region of data remains coherent with any such image, the I/O operations being performed on the basis of that previous positive confirmation.

A preferred embodiment of the present invention uses a two-phase lock messaging scheme to coordinate several storage controllers (or nodes) in an n-node (n-way) system. The messages coordinate the activity of the nodes, but each node remains responsible for performing its own I/O operations. Each node has a caching facility in which it stores the results of its previous lock requests. This allows the node to reuse some of those results and to avoid asking about the state of a region of data when that state is already positively established by results held in the cache.
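The saving in messages can be sketched in Python as follows. The sketch assumes that the reusable result is an "already copied" answer from the owner, on the reasoning that a region, once copied, stays copied. That interpretation, the class `ClientNode`, and the `ask_owner` callback are illustrative assumptions, not part of the text.

```python
from typing import Callable, Dict


class ClientNode:
    """Client that remembers 'already copied' answers to avoid repeat queries."""

    def __init__(self, ask_owner: Callable[[int], bool]):
        # ask_owner is a hypothetical stand-in for one message round trip;
        # it returns True when the owner reports the region as already copied.
        self._ask_owner = ask_owner
        self._confirmed: Dict[int, bool] = {}
        self.messages_sent = 0

    def region_is_copied(self, region: int) -> bool:
        if self._confirmed.get(region):
            return True  # cached result reused, no message to the owner
        self.messages_sent += 1
        copied = self._ask_owner(region)
        if copied:
            # Only this answer is cached: a "not copied" state may change later.
            self._confirmed[region] = True
        return copied
```

With this arrangement, repeated I/O to a region that has already been copied costs one message in total, while I/O to a region that has not been copied still consults the owner every time.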

Brief description of the drawings

A preferred embodiment of the present invention is described below, by way of example, with reference to the accompanying drawings, in which:

Fig. 1 is a flow diagram illustrating an embodiment of a two-phase locking scheme that uses lock messages to control the coherence of a region of data and its instant copy image;

Fig. 2 shows the components of a system according to a preferred embodiment of the invention;

Fig. 3 shows additional operations performed in the preferred embodiment of the invention.

Detailed description of the preferred embodiment

For a better understanding of the preferred embodiment of the present invention, consider first the exchange of two-phase locking messages used to coordinate several storage controllers (or nodes) in a storage system with n nodes (an n-way system).

As an example, consider an n-node system that implements instant copy of a region of data. Assume that every node has access to the storage managed by the group of n cooperating nodes. At step 102, one of the nodes is designated the owner of the metadata covering all I/O relationships for a given region of data. The remaining nodes are designated clients. In addition, in the presently most preferred embodiment, one of the client nodes is designated the backup owner and maintains a copy of the metadata, so that the metadata remains available if the owner node fails.

Consider the handling of an I/O request that arrives at step 104 from a host system (a stand-alone computer, such as a server or workstation) at a particular client node C. The host I/O request is assumed to be either a read or a write to the target disk, or possibly a write to the source disk. At step 106, client node C begins processing by suspending the I/O request. Then, at step 108, client node C sends the owner node O a REQ message asking whether the relevant fragment of the region of data has been copied.
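The client side of steps 104 to 108 can be sketched in Python as below. Only the message name REQ and the step numbers come from the text; the classes `HostIO`, `Req`, and `Client`, their fields, and the `outbox` list standing in for the messaging component are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class HostIO:
    op: str      # "read" or "write"
    disk: str    # "source" or "target"
    region: int  # region of data the request touches


@dataclass(frozen=True)
class Req:
    client_id: str
    region: int


class Client:
    """Client node C: suspends host I/O and asks the owner about the region."""

    def __init__(self, client_id: str):
        self.client_id = client_id
        self.suspended = deque()  # I/O requests waiting for the owner's reply
        self.outbox = []          # messages queued for the owner node

    def on_host_io(self, io: HostIO) -> None:
        # Step 104: the host request has arrived at this client.
        self.suspended.append(io)                           # step 106: suspend it
        self.outbox.append(Req(self.client_id, io.region))  # step 108: send REQ
```

The request stays in `suspended` until a reply arrives; what the client does with that reply is outside this sketch.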

On receiving the REQ message, owner node O checks its own metadata structures. If owner node O finds that the data region has already been copied, at step 110 it replies with a NACK (negative acknowledgment) message. If owner node O finds that the region has not yet been copied, it places a lock record against the corresponding metadata for that region in its own metadata structures and, at step 112, replies to the client node with a GNT message (lock granted). The lock record is needed to ensure consistency between the request just received and granted and any later requests that might affect the same metadata while processing continues at client node C. The way the lock record is maintained and the consistency constraints that apply are the same as if the I/O request had been received locally by node O, which is well known to those skilled in the art.
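The owner's decision at steps 110 and 112 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `OwnerMetadata`, `handle_req`, and `Reply` are hypothetical, and a request that meets an existing lock simply gets `None` here, because the description says only that such requests wait.

```python
from enum import Enum


class Reply(Enum):
    NACK = "NACK"  # region already copied, no lock needed
    GNT = "GNT"    # lock granted, the client may perform the copy


class OwnerMetadata:
    """Hypothetical owner-side metadata for one instant copy relationship."""

    def __init__(self, region_count):
        self.copied = [False] * region_count  # per-region "already copied" flag
        self.locked_by = {}                   # region -> client holding the lock

    def handle_req(self, client, region):
        """Answer a REQ with NACK (step 110) or GNT (step 112).

        Returns None when another request already holds the lock; queueing
        of waiting requests is an assumption left out of this sketch.
        """
        if self.copied[region]:
            return Reply.NACK
        if region in self.locked_by:
            return None
        self.locked_by[region] = client  # place the lock record
        return Reply.GNT


owner = OwnerMetadata(region_count=4)
owner.copied[2] = True
first = owner.handle_req("C1", 0)    # uncopied and unlocked
second = owner.handle_req("C2", 0)   # uncopied but already locked
third = owner.handle_req("C1", 2)    # already copied
```

In this sketch only the first request for region 0 obtains the lock, and the request for the already copied region 2 is refused with NACK.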

On receiving the NACK message, client node C resumes the suspended original I/O request at step 114.

On receiving the GNT message, client node C continues at step 116 by performing the one or more data transfers required by the instant copy algorithm. For a read from the target, this means performing the read against the source disk. Some time later, at step 118, client node C completes the read and, at step 120, sends node O a UNL (unlock) message at the same time as it reports completion of the original I/O request to the host system that issued it.

On receiving the UNL message, owner node O removes the lock record from its metadata table at step 122, which lets any other I/O requests that were held up by the lock resume. In the most preferred embodiment, at step 124 owner node O sends client node C a UNLD (unlocked) message, allowing client node C to reuse the resources associated with the original request. The instant copy algorithm itself does not require this, however.

For a write operation (to the target or to the source), client node C must perform the copy-on-write at step 127. Once all the copy-on-write steps are complete, and while the original write I/O request is still suspended, client node C sends node O a UNLC message (an unlock request for a copied region) at step 126.

On receiving the UNLC message, owner node O marks the region as copied in its metadata at step 128, removes the lock record at step 130, answers all pending requests at step 132 by informing them that the region has been copied, and then sends client node C a UNLD message at step 134.

On receiving the UNLD message, client node C releases the suspended write operation at step 136. The write completes some time later, and at step 138 client node C reports completion of the write to the host system.
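The write path of steps 108 to 138 can be traced with a small single-threaded model. This is a sketch under stated assumptions: the class `Owner`, the function `client_write`, and the `trace` list are hypothetical names, messages are modelled as direct method calls, and contention between clients for the same lock is not modelled.

```python
class Owner:
    """Hypothetical owner node O holding the metadata for a set of regions."""

    def __init__(self, region_count):
        self.copied = [False] * region_count
        self.locks = set()

    def req(self, region):
        # Steps 110/112: NACK if already copied, otherwise lock and grant.
        if self.copied[region]:
            return "NACK"
        self.locks.add(region)
        return "GNT"

    def unl(self, region):
        # Steps 122/124 (read path): drop the lock record, region stays uncopied.
        self.locks.discard(region)
        return "UNLD"

    def unlc(self, region):
        # Steps 128-134: mark copied, drop the lock record, acknowledge.
        self.copied[region] = True
        self.locks.discard(region)
        return "UNLD"


def client_write(owner, region, trace):
    """Write path as seen by client node C; appends each exchange to trace."""
    reply = owner.req(region)                       # step 108
    trace.append(("REQ", reply))
    if reply == "GNT":
        # Step 127: the copy-on-write data transfer would happen here.
        trace.append(("UNLC", owner.unlc(region)))  # step 126
    # Steps 136/138: release the suspended write and report completion.
    return "write complete"


owner = Owner(region_count=2)
trace = []
client_write(owner, 0, trace)   # first write triggers the copy-on-write
client_write(owner, 0, trace)   # second write finds the region already copied
```

After both calls the trace holds REQ/GNT, UNLC/UNLD, and REQ/NACK in that order, which shows that even a write to an already copied region still costs one round trip to the owner in the unmodified protocol.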

Recovery paths are required to handle disk I/O errors, messaging system failures, and node failures, but the requirements for such paths and their implementation are well known in the art.

The operations above are described for a single I/O request and from the point of view of a single client node C. The scheme nevertheless clearly continues to work when many I/O requests arrive from many client nodes, with owner node O handling each of them using the algorithm described above.

Figure 2 shows the apparatus of the invention in its currently preferred embodiment, implemented in a network of memory controllers that includes an owner node 202, a client node 204, an I/O component, a portion of metadata 206 relating to data 208 managed by the network of memory controllers, a copy 209 of data 208, and communication means. The apparatus has an ownership assignment component 210, which assigns ownership of the metadata to node 202, and a lock management component 212, which controls locking at the level of metadata 206 during I/O operations in order to keep the data coherent with any copy 209. The apparatus also includes a messaging component 214, located at owner node 202, which exchanges one or more of the following messages between client node 204 and owner node 202: a request for the state of the metadata, a lock grant, an unlock request, and an unlock signal. Client node 204 performs I/O on data whose metadata is owned by owner node 202, provided that client node 204 follows the metadata-level locking protocols controlled by that owner node 202.

The system and method described above provide distributed lock management in an n-node network of shared-memory controllers, but they carry a significant messaging overhead. In systems with relatively few controllers or relatively little activity this overhead matters little. The same cannot be said of modern storage systems, such as large storage networks with many controllers and very high I/O activity. Under those conditions it is worthwhile to eliminate unnecessary message exchanges.

To improve the operation of the data processing system, in the most preferred embodiment of the invention each client node may therefore retain information recording the last response it received from the owner node. In particular (see Figure 3, which supplements the flow chart of Figure 1), client node C is permitted at step 308 to cache the fact that it received a NACK message after step 114 of Figure 1, or that it sent a UNLC message and had it acknowledged by a UNLD message at step 126 and after step 134 of Figure 1.

On receiving a host I/O request at step 302, analogous to step 104 of Figure 1, client node C now uses the modified lock management algorithm described below.

First, at step 303, client node C checks whether its cached data indicates that the affected data region has already been copied. If so, at step 304 client node C proceeds with the I/O request without sending any protocol messages to node O.

If the cache holds no such indication, the protocol described above is used unchanged: client node C performs step 106 and the subsequent steps of Figure 1. If a NACK message or a UNLC/UNLD pair is received at step 306, the cached information is updated at step 308. Once step 303 finds that information in the cache, subsequent I/O requests that affect the same region can proceed at step 304 without any protocol messages.
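The saving in messages from steps 303 to 308 can be illustrated with a minimal sketch. The names `Owner`, `CachingClient`, `known_copied`, and the `messages` counter are hypothetical, only the write path is modelled, and messages are represented as direct method calls.

```python
class Owner:
    """Hypothetical owner node that counts the protocol messages it receives."""

    def __init__(self, region_count):
        self.copied = [False] * region_count
        self.messages = 0

    def req(self, region):
        self.messages += 1
        return "NACK" if self.copied[region] else "GNT"

    def unlc(self, region):
        self.messages += 1
        self.copied[region] = True
        return "UNLD"


class CachingClient:
    """Hypothetical client node C with a cache of regions known to be copied."""

    def __init__(self, owner):
        self.owner = owner
        self.known_copied = set()

    def write(self, region):
        if region in self.known_copied:      # step 303: cache lookup
            return "fast path"               # step 304: no protocol messages
        reply = self.owner.req(region)       # steps 106-108: original protocol
        if reply == "GNT":
            # The copy-on-write would run here before UNLC is sent.
            self.owner.unlc(region)
        # Step 308: a NACK or a UNLC/UNLD pair both prove the region is copied.
        self.known_copied.add(region)
        return "slow path"


owner = Owner(region_count=8)
client = CachingClient(owner)
results = [client.write(3) for _ in range(3)]
```

Only the first of the three writes reaches the owner, with one REQ and one UNLC. A read that ends with UNL would not update the cache, because the region remains uncopied in that case.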

The technique underlying the currently most preferred embodiment is sometimes described as a "pessimistic cache". The client node need not hold a complete, up-to-date copy of the owner node's metadata: the client may believe that a region still needs to be copied and be corrected by a NACK message from the owner telling it that no copy is required. The client node must never, however, believe that a region has been copied when the owner node knows that it has not.

For the system to operate correctly when lock information is cached in accordance with the most preferred embodiment, the behavior of the client node must change in several ways. First, whenever an instant copy relationship is started (step 300a), the cache must be initialized at step 301 to indicate that every region needs to be copied. This can be arranged in various ways, the simplest being a message from the owner node to the client node. Second, whenever the client node might have missed a message indicating that the cache was reinitialized (step 300b), perhaps because of a power disturbance, the client node must assume the worst case and, at step 301, reinitialize its cache or revalidate its contents.
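One way to picture steps 300a, 300b, and 301 is shown below. The description does not say how a client detects that it may have missed a reinitialization message; the generation counter used here is purely an assumption for illustration, as are the names `ClientCache`, `initialize`, and `on_owner_contact`.

```python
class ClientCache:
    """Hypothetical client-side cache of "region already copied" facts."""

    def __init__(self):
        self.generation = None      # assumed tag for the current relationship
        self.known_copied = set()

    def initialize(self, generation):
        # Step 301: an empty set means "assume every region needs copying".
        self.generation = generation
        self.known_copied = set()

    def on_owner_contact(self, owner_generation):
        # Step 300b: if a start message may have been missed, assume the worst.
        if owner_generation != self.generation:
            self.initialize(owner_generation)

    def is_copied(self, region):
        return region in self.known_copied


cache = ClientCache()
cache.initialize(generation=1)               # step 300a: relationship started
cache.known_copied.add(5)
cache.on_owner_contact(owner_generation=1)   # nothing missed, cache kept
still_cached = cache.is_copied(5)
cache.on_owner_contact(owner_generation=2)   # a restart was missed
after_reset = cache.is_copied(5)
```

Emptying the cache is always safe under the pessimistic rule: the client merely falls back to asking the owner, which costs messages but never skips a required copy.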

Further additions and modifications that will be apparent to those skilled in the art may be made to the method of the invention. For example, cached information may be discarded, since it can always be recovered from the owner node, which holds the only truly up-to-date copy. A client node can therefore set aside less space for caching than would be needed to hold all the metadata for all nodes in the network. The client nodes can then rely on locality of access in the I/O operations they process to go on benefiting from caching the information carried in lock messages.
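A bounded client cache of this kind might look like the sketch below. The description does not prescribe an eviction policy; least-recently-used eviction is chosen here only because it fits the reliance on locality of access, and `BoundedCopiedCache` and its methods are hypothetical names.

```python
from collections import OrderedDict


class BoundedCopiedCache:
    """Hypothetical fixed-size cache of regions known to be copied."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()   # region -> True, oldest entry first

    def record_copied(self, region):
        self._entries[region] = True
        self._entries.move_to_end(region)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # discard least recently used

    def is_copied(self, region):
        if region in self._entries:
            self._entries.move_to_end(region)   # refresh on access
            return True
        return False   # unknown: the client must ask the owner again


cache = BoundedCopiedCache(capacity=2)
cache.record_copied(10)
cache.record_copied(11)
cache.is_copied(10)        # touch region 10 so that 11 becomes the oldest
cache.record_copied(12)    # evicts region 11
```

A discarded entry only makes the client forget that a region was copied, so the next request for that region goes through the full REQ exchange and receives a NACK.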

In another embodiment with extended functionality, a NACK message (and likewise a GNT or UNLD message) may carry information beyond that relating directly to the region being processed by the REQ/GNT/UNLC/UNLD messages. Owner nodes may also send client nodes information about neighboring regions that have likewise been cleaned, that is, already copied.
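As an illustration of such an extended reply, the sketch below has the owner report the whole contiguous run of copied regions around the requested one. The payload format (a half-open range) and the function names `nack_with_neighbors` and `apply_nack` are assumptions, not something the description specifies.

```python
def nack_with_neighbors(copied, region):
    """Return the contiguous run of copied regions around `region`.

    `copied` is the owner's per-region flag list. The result is a half-open
    (start, end) range that a NACK could carry. Assumes copied[region] is True.
    """
    start = region
    while start > 0 and copied[start - 1]:
        start -= 1
    end = region + 1
    while end < len(copied) and copied[end]:
        end += 1
    return start, end


def apply_nack(known_copied, start, end):
    """Client side: record every region named in the NACK payload."""
    known_copied.update(range(start, end))


owner_copied = [False, True, True, True, False, True]
payload = nack_with_neighbors(owner_copied, 2)
client_known = set()
apply_nack(client_known, *payload)
```

Because the owner reports only regions it has itself marked as copied, the client's cache stays pessimistic while later requests for regions 1 and 3 avoid a round trip.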

The method described above will typically be carried out in software running on one or more processors (not shown), and the software may be provided as a computer program element carried on any suitable data carrier (also not shown), such as a magnetic or optical disk. Likewise, the channels for transmitting data may include storage media of all kinds as well as signal-carrying media such as wired or wireless signal channels.

The present invention may suitably be embodied as a software product for use with a computer system. Such an implementation may comprise a series of computer-readable instructions either fixed on a tangible medium, such as a computer-readable medium (for example a diskette, CD-ROM, ROM, or hard disk), or transmitted to a computer system via a modem or other interface device, either over a tangible medium, including but not limited to optical or analog communication lines, or intangibly using wireless techniques, including but not limited to microwave, infrared, or other transmission techniques. The series of computer-readable instructions embodies all or part of the functionality described above.

Those skilled in the art will appreciate that such computer-readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any storage technology, present or future, including but not limited to semiconductor, magnetic, or optical technology, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave technology. Such a computer program product may be distributed as removable media with accompanying printed or electronic documentation, for example as packaged software supplied without disclosure of its internal structure; preloaded on a computer system, for example on a system ROM or fixed disk; or distributed from a server or electronic bulletin board over a network, for example the Internet or the World Wide Web.

It will be apparent to those skilled in the art that various modifications may be made to the embodiment of the present invention described above.

Claims

1. A memory management device for operation in a network of memory controllers that includes a memory controller acting as an owner node, capable of controlling the locking of a data region during I/O operations, and a messaging component that provides for the transmission of at least one of a lock request, a lock grant, an unlock request, and an unlock signal, the device comprising an input/output component for performing I/O operations on data owned by any owner node, provided that this component complies with the locking protocols controlled by that owner node, wherein any image of said data region obtained by instant copying of it is kept coherent with the data region itself, and the input/output component is able to cache a previous direct confirmation that the data region has been left coherent with any such image and to perform I/O operations on the basis of that previous direct confirmation.

2. The device according to claim 1, in which the input-output component can discard the direct confirmation of coherence placed in the cache memory and subsequently again request a lock.

3. The device according to claim 2, in which a limited portion of the cache memory is managed by selectively discarding cached direct confirmations of coherence.

4. The device according to claim 1, in which a direct confirmation of coherence with respect to one data area is also a direct confirmation of coherence with respect to another adjacent data area.

5. A method of managing memory in a network of memory controllers that includes a memory controller acting as an owner node, capable of controlling the locking of a data region during I/O operations, and a messaging component that provides for the transmission of at least one of a lock request, a lock grant, an unlock request, and an unlock signal, the method comprising:

performing I/O operations on data owned by any owner node, provided that the locking protocols controlled by that owner node are complied with;

keeping any image of the data region obtained by instant copying of it coherent with the data region itself; and

caching a previous direct confirmation that said data region has been left coherent with any such image, the I/O operations being performed on the basis of that previous direct confirmation.

6. The method according to claim 5, in which the direct confirmation of the coherence placed in the cache is discarded, followed by a repeated lock request.

7. The method according to claim 6, in which a limited portion of the cache memory is managed by selectively discarding cached direct confirmations of coherence.

8. The method according to claim 5, in which a direct confirmation of coherence with respect to one data area is also a direct confirmation of coherence with respect to another adjacent data area.

9. A computer software product tangibly embodied on a computer-readable medium and containing a program that, when loaded into a computer system and executed there, controls a memory management device in a network of memory controllers that includes a controller acting as the data owner, which controls the locking of a data region during I/O operations, and a messaging component that provides for the transmission of at least one of a lock request, a lock grant, an unlock request, and an unlock signal, the program causing the memory management device to carry out the following:

performing I/O operations on data owned by any owner node, provided that the locking protocols controlled by that owner node are complied with;

keeping any image of the data region obtained by instant copying of it coherent with the data region itself; and

caching a previous direct confirmation that said data region has been left coherent with any such image, the I/O operations being performed on the basis of that previous direct confirmation.