TWI468930B - Performing a data write on a storage device - Google Patents

Publication number
TWI468930B
Authority
TW
Taiwan
Prior art keywords
storage device
transaction
data
device driver
snapshot copy
Prior art date
Application number
TW98133282A
Other languages
Chinese (zh)
Other versions
TW201027325A (en)
Inventor
C Mcallister
Lucy Raw
Bruce Smith
Gordon Hutchison
Original Assignee
IBM
Priority date
Filing date
Publication date
Application filed by IBM
Publication of TW201027325A
Application granted
Publication of TWI468930B

Description

Performing a data write on a storage device

The present invention relates to a method and a system for performing a data write on a storage device. In particular, one embodiment of the invention provides a mechanism that allows a storage subsystem to participate in transactional rollbacks.

In the event of any hardware failure, the way data is stored within a large organization is of fundamental importance to the reliability of that data and to the ability to recover it. When large amounts of data need to be stored reliably and securely, a storage area network (SAN) is a commonly adopted architecture. This technology allows a network to be created in which servers attach to remote computer storage devices, such as disk arrays, and those devices appear to the operating system as if they were locally attached. Such networks commonly include a large amount of redundancy, both in the data storage itself and in the hardware connections between individual components.

Various methods exist for creating data redundancy. For example, a function such as the snapshot copy function enables an administrator to make a point-in-time, full-disk copy of data, with the copy immediately available for read or write access. The snapshot copy can be used with the standard backup tools available in the working environment to create copies on tape. A snapshot copy creates a copy of a source disk on a target disk. This copy is called a point-in-time copy.

When a snapshot copy is initiated, a relationship is established between the source disk and the target disk. This relationship is a "mapping" of the source disk to the target disk. The mapping allows a point-in-time copy of the source disk to be copied to the associated target disk. The relationship exists between the pair of disks from the time the snapshot copy operation is initiated until the storage unit has copied all data from the source disk to the target disk, or until the relationship is deleted.

When the data is physically copied, a background process copies the data from the source disk to the target disk. The time taken to complete the background copy depends on the following criteria: the amount of data being copied, the number of background copy processes that are running, and any other activity that is taking place.

Within storage, a user can create a snapshot copy to obtain a point-in-time backup of certain storage disks. If the user subsequently has a problem with the storage, they can reverse the snapshot copy to restore the saved version of the data. The direction of a snapshot copy relationship can be reversed: the disk previously defined as the target becomes the source, and the disk previously defined as the source becomes the target. The data that has changed is copied to the disk previously defined as the source.

If an administrator wishes to restore the source disk (disk A) to the point in time at which the snapshot copy was originally made, the administrator can reverse the snapshot copy relationship. In effect, this undoes the snapshot copy operation, so that it appears as though the operation never took place. The background copy process of the snapshot copy operation must complete before the roles of disk A as source and disk B as target can be reversed.

There are certain circumstances in which the original snapshot copy relationship needs to be reversed. For example, a snapshot copy relationship has been created between source disk A and target disk B, and data loss then occurs on source disk A. The snapshot copy relationship can then be reversed so that disk B is copied to disk A.

Unfortunately, this way of storing data has a number of disadvantages. For example, using the snapshot copy function restores data directly to the point in time at which the snapshot copy was made, but that is not always the right point in time when a data restore is being attempted. Likewise, when copies are made as a function of clock time, as a background task, rather than according to what is executing in the system, the point in time may not be useful for data recovery. Even continuous data protection is not automatic and has no notion of when a backup should be made. In addition, because many running systems are interleaved and must all be made cross-consistent, the copies tend to be large and to span many sets of disks. This imposes a heavy processing and storage burden. The system must be stopped and flushed as a whole. In most snapshot copy scenarios, applications are stopped and the device driver flushes cached data before the snapshot copy is made. This flushes all application data to the storage device in order to build a complete image, covering the disks being snapshot copied and possibly also cached data for other disks and for applications unrelated to the snapshot copy.

According to a first aspect of the present invention, there is provided a method of performing a data write on a storage device, comprising: instructing a device driver for the storage device to perform a data write to the storage device; registering the device driver as a transaction participant with a transaction coordinator; performing a snapshot copy (flashcopy) of the storage device; performing the data write on the storage device; and performing a two-phase commit between the device driver and the transaction coordinator.

According to a second aspect of the present invention, there is provided a system for performing a data write on a storage device, comprising: a file system arranged to instruct a device driver for the storage device to perform a data write to the storage device; a transaction coordinator arranged to register the device driver as a transaction participant; a storage device; and a device driver for the storage device, arranged to perform a snapshot copy of the storage device, to perform the data write on the storage device, and to perform a two-phase commit with the transaction coordinator.

According to a third aspect of the present invention, there is provided a computer program product on a computer-readable medium for operating a device driver for a storage device, the computer program product comprising instructions for: receiving an instruction to perform a data write to the storage device; registering the device driver as a transaction participant with a transaction coordinator; performing a snapshot copy of the storage device; performing the data write on the storage device; and performing a two-phase commit with the transaction coordinator.

Owing to the invention, it is possible to provide a method of maintaining the integrity of data on a storage device that is connected to a transactional system, such as a storage area network. The invention provides an effective and efficient way of restoring data if a transaction (which includes a data write) fails, without an excessive processing or storage burden.

In one embodiment, the invention is a mechanism that allows the multipath/device driver software for controller disks to become, for designated storage disks, a transaction participant under the coordination of middleware such as DB2 or WebSphere software. The invention allows a network storage disk to become a transactional file service, in which a transaction rollback restores all data written to the storage system to its state at the start of the transaction. The device driver can participate in the transaction. If the device driver is notified that a transaction has started on a particular thread, the driver can record all accesses to storage disks that are updated on that thread. The driver can notify the storage controller of the transaction, which establishes a timestamped, reversible snapshot copy before the first data write is performed.
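The per-thread bookkeeping described above could look roughly like the following. This is only a sketch under assumptions: the class `AccessRecorder` and its methods `begin`, `record_write`, and `end` are hypothetical names chosen for illustration, and the patent text does not specify how a real driver would be notified or how it would store this information.

```python
import threading


class AccessRecorder:
    """Hypothetical helper: records which disks each thread writes to
    while a transaction is open on that thread."""

    def __init__(self):
        self._open = {}                      # thread id -> set of disk names
        self._lock = threading.Lock()

    def begin(self):
        # Called when the driver is told a transaction started on this thread.
        with self._lock:
            self._open[threading.get_ident()] = set()

    def record_write(self, disk):
        # Note the disk only if the calling thread has an open transaction.
        with self._lock:
            touched = self._open.get(threading.get_ident())
            if touched is not None:
                touched.add(disk)

    def end(self):
        # Return the disks touched by this thread's transaction.
        with self._lock:
            return sorted(self._open.pop(threading.get_ident(), set()))


recorder = AccessRecorder()
recorder.record_write("disk-A")      # ignored: no transaction open yet
recorder.begin()
recorder.record_write("disk-B")
recorder.record_write("disk-C")
recorder.record_write("disk-B")      # repeated writes are recorded once
touched = recorder.end()
```

The recorded set tells the driver which disks would need a reversible snapshot copy for this transaction and which can be left alone.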

At prepare time, it is desirable to flush any cache to disk, but this need not be done for all data at once as a full flush, as is the case today; it can be done only for the blocks updated as part of the transaction. The system does not have to be stopped. The invention allows only the data associated with the transaction, that is, the data to be written to the disk being snapshot copied, to be flushed, while data not associated with the transaction can remain in cache. This is because, in operation, the device driver and the application server can continue to accept new work for other transactions. Since the cached data of each transaction can be flushed independently, there is no need to flush all data, and therefore no need to wait for all units of work to stop. The transaction protocol can be carried in the data traffic metadata that is sent between the driver and the storage controller. Appropriate configuration options in the file system allow the user to set which disks they require to be transactionally coordinated.
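A transaction-scoped flush of this kind can be illustrated with a small write-back cache that tags each dirty block with the transaction that wrote it. This is a minimal sketch, not the patented implementation: `TransactionalCache`, the transaction identifiers, and the use of a dictionary to stand in for the disk are all assumptions made for the example.

```python
from collections import defaultdict


class TransactionalCache:
    """Illustrative write-back cache that tracks dirty blocks per transaction."""

    def __init__(self, disk):
        self.disk = disk                      # dict: block number -> bytes
        self.dirty = defaultdict(dict)        # txn id -> {block number: data}

    def write(self, txn_id, block, data):
        # Buffer the write in cache, tagged with the owning transaction.
        self.dirty[txn_id][block] = data

    def flush(self, txn_id):
        # Flush only the blocks updated by this transaction; data cached
        # for other transactions stays in cache.
        blocks = self.dirty.pop(txn_id, {})
        self.disk.update(blocks)
        return sorted(blocks)


disk = {}
cache = TransactionalCache(disk)
cache.write("tx1", 7, b"order-1001")
cache.write("tx2", 9, b"order-1002")
flushed = cache.flush("tx1")          # prepare time for tx1 only
```

After the flush, block 7 is on disk while the write belonging to `tx2` is still cached, so `tx2` can carry on without being stopped.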

The snapshot copy can be incremental, so that only data changed since the last snapshot copy was triggered is copied across (or back). This is the most efficient form of snapshot copy operation. No background copy process is needed, because only the grains written to by the application (during the transaction) have to be copied back; the rest of the data set, which a background copy process would otherwise transfer to the destination, does not.
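The grain-level behaviour can be sketched as a copy-on-write snapshot that preserves a grain only the first time it is overwritten and copies back only those grains on reversal. The grain size, the class name `GrainSnapshot`, and the use of a `bytearray` as the disk are assumptions for illustration; a real storage controller would work on much larger grains and on physical media.

```python
GRAIN_SIZE = 4  # bytes per grain; an arbitrary value chosen for illustration


class GrainSnapshot:
    """Illustrative copy-on-write snapshot that saves only overwritten grains."""

    def __init__(self, source):
        self.source = source          # bytearray standing in for the disk
        self.saved = {}               # grain index -> original bytes

    def write(self, offset, data):
        # Before overwriting, preserve the original content of each
        # touched grain, once per grain.
        first = offset // GRAIN_SIZE
        last = (offset + len(data) - 1) // GRAIN_SIZE
        for g in range(first, last + 1):
            if g not in self.saved:
                start = g * GRAIN_SIZE
                self.saved[g] = bytes(self.source[start:start + GRAIN_SIZE])
        self.source[offset:offset + len(data)] = data

    def reverse(self):
        # Copy back only the grains changed since the snapshot was taken.
        for g, original in self.saved.items():
            start = g * GRAIN_SIZE
            self.source[start:start + len(original)] = original
        restored = sorted(self.saved)
        self.saved.clear()
        return restored


disk = bytearray(b"AAAABBBBCCCCDDDD")
snap = GrainSnapshot(disk)
snap.write(5, b"xyz")          # touches grain 1 only
changed = bytes(disk)
restored = snap.reverse()      # rollback copies back one grain
```

Only one of the four grains is ever copied, in either direction, which is the saving the incremental form is meant to provide.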

Preferably, the method further comprises receiving an instruction to perform a rollback and reversing the data write according to the snapshot copy. On any rollback, the device driver can switch the direction of the snapshot copy to restore the data set to the contents it had at the start of the transaction. On receiving an instruction to commit the transaction, the device driver can discard the snapshot copy. The snapshot copy can be discarded once the transaction has ended.

Advantageously, the step of performing the data write on the storage device immediately follows the initiation of the snapshot copy of the storage device. This ensures that the snapshot copy of the storage device is taken at the point in time immediately before the actual write to the storage disk. In any future rollback, the snapshot copy can then be used to restore the data stored on the storage device to its original state just before the data write began.

Ideally, the step of registering the device driver as a transaction participant with the transaction coordinator is carried out by the device driver. In the preferred embodiment, the device driver registers itself with the transaction coordinator. This simplifies the process of ensuring that the device driver is registered as a transaction participant for the correct transaction.

Figure 1 shows schematically a system for performing a data write. A file system 10 communicates with a device driver 12, which is specific to a storage device 14. The device driver 12 also communicates with a transaction coordinator 16. The file system 10 and the transaction coordinator 16 are both software components, located on one or more servers that form part of a storage area network (SAN) that includes the storage device 14. In embodiments of the invention, a large number of servers and storage devices are interconnected to form the overall network. The device driver 12 can be a purely software component, or can comprise a software component and a physical layer.

The software components of the file system 10 and the transaction coordinator 16 have application programming interfaces to external applications, which are likewise running within the storage area network. For example, the network may manage a commercial website that an organization uses to receive orders for goods, with the storage device 14 storing the orders that customers place through the website. In this case, an application running in the network provides a user interface for receiving orders through the website and then takes the necessary actions, such as creating an order to be stored by the storage device 14. The application interacts with the file system 10 to carry out the task of writing data to the storage device 14.

The transaction coordinator 16 is a software component that ensures that any action taken within the network meets the requirements of transaction processing. Transaction processing is designed to keep a computer system, such as the network under discussion, in a known, consistent state by ensuring that any group of operations carried out on the system is either completed in full or cancelled in full. Each unit of work in the network is managed through the transaction coordinator 16, which ensures the consistency of each unit of work. Transaction processing guards against hardware and software errors that would leave a transaction only partially completed, with the network in an unknown, inconsistent state. If the network (or any component connected to it) fails partway through a transaction, the transaction coordinator 16 guarantees that the operations in any uncommitted (not completely processed) transaction are cancelled.

Figure 2 shows the system of Figure 1 after a unit of work has been triggered within the network that requires data to be written to the storage device 14. As discussed above, this is the result of an application running in the network taking some action, for example an end user placing an order on a website managed by the enterprise system that the network supports. As a result, the user's order needs to be stored on the storage device 14, which stores order-related information.

The first action (1) is that the file system 10 instructs the device driver 12 for the storage device 14 to perform a data write to the storage device 14. In response, the second action (2) registers the device driver 12 as a transaction participant with the transaction coordinator 16. The figure shows action (2) being initiated by the device driver 12 itself, but the registration of the device driver 12 with the transaction coordinator 16 can in practice also be carried out by another component of the system, for example the file system 10.

The purpose of this registration is to make the device driver 12 a participant in the transaction processing system, with all of the requirements that this entails. Having initiated the action, the device driver 12 is now part of a system in which the data write must be validated by the transaction coordinator 16, and the data write forms part of a larger transaction in the transaction-processing sense.

Figure 3 shows the next stage of the data write process: once registered with the transaction coordinator 16, the device driver 12 performs a snapshot copy of the storage device 14. This is shown as action (3) in Figure 3, and causes the disk of the storage device 14 to be copied to a new storage location 18, which may be a different piece of hardware within the overall network or simply a new logical location. The essence of the snapshot copy function is to make a copy of the data stored on the storage device 14 at a specific point in time. The snapshot copy function starts a background task that copies the data stored on the storage device 14 to the new storage location 18, for example but not limited to a new disk, at a predetermined rate according to the available bandwidth. In addition, any change to data on the storage device 14, or any request for data at the new storage location 18, causes the data concerned to be copied automatically from the storage device 14 to the new storage location 18.

Before performing any write to the storage device 14, the device driver 12 initiates the snapshot copy of the contents of the storage device 14. Indeed, it is advantageous for the device driver 12 to proceed directly to the data write without any intervening action, to ensure that the copy held at the new storage location 18 is a copy of the data as it stood before the new data write was performed. In effect, the device driver 12 prepares a copy of the data stored on the storage device 14 in readiness for any failure in the transaction of which this particular data write forms part.

Figure 4 shows the final two actions of the data write process: action (4), performing the data write on the storage device 14, and action (5), performing a two-phase commit between the device driver 12 and the transaction coordinator 16. The first of these, action (4), is a conventional data write from the device driver 12 to the storage device 14. Under the control of the snapshot copy function, this write triggers the data about to be overwritten on the storage device 14 to be copied to the new storage location 18 of Figure 3, if the background task has not already copied it.

The second of these, action (5), performing the two-phase commit between the device driver 12 and the transaction coordinator 16, is a requirement that follows from registering the device driver 12 as a transaction participant with the transaction coordinator 16, as detailed for action (2) of Figure 2 above. The action is shown as two-way communication between the two components 12 and 16. Every component acting within the current unit of work, which includes the data write to the storage device 14, is required to take part in the two-phase commit.

The two-phase commit protocol is a distributed algorithm that ensures that all components in a distributed network agree to commit a transaction before the transaction is completed. The protocol results in either all components committing the transaction or all components aborting it. The two phases of the algorithm are, first, the commit-request phase, in which the transaction coordinator 16 prepares the participating components, and second, the commit phase, in which the transaction coordinator 16 completes the transaction.
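The two phases can be sketched in a few lines. This is a simplified, in-process illustration under assumptions: the `Participant` and `Coordinator` classes and the `can_commit` flag are hypothetical, and a real coordinator would also handle timeouts, logging, and recovery, none of which are shown here.

```python
class Participant:
    """Hypothetical participant; `can_commit` simulates its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes (True) or no (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


class Coordinator:
    def __init__(self):
        self.participants = []

    def register(self, participant):
        self.participants.append(participant)

    def run(self):
        # Phase 1 (commit request): collect a vote from every participant.
        votes = [p.prepare() for p in self.participants]
        # Phase 2: commit only if all voted yes, otherwise roll back everyone.
        if all(votes):
            for p in self.participants:
                p.commit()
            return "committed"
        for p in self.participants:
            p.rollback()
        return "rolled back"


ok = Coordinator()
ok.register(Participant("device driver"))
ok.register(Participant("database"))
outcome_ok = ok.run()

bad = Coordinator()
driver = Participant("device driver")
bad.register(driver)
bad.register(Participant("database", can_commit=False))
outcome_bad = bad.run()
```

In the second run the device driver itself voted yes, but it is still rolled back because another participant voted no, which is the all-or-nothing property the protocol provides.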

The effect of the two-phase commit on the device driver 12 is that the device driver 12 must send either an agreement message or an abort message to the transaction coordinator 16, whether or not the data write of action (4) has completed. In addition, the device driver 12 must wait for a commit or rollback message from the transaction coordinator 16 to complete the two-phase commit process. If the device driver 12 is instructed to roll back the data write, it must access the snapshot copy of the original data that was overwritten in order to perform the rollback correctly.

The process shown in Figures 2 to 4 is summarized in the flowchart of Figure 5. The method of performing a data write on the storage device 14 comprises, first, step S1, instructing the device driver 12 for the storage device 14 to perform a data write to the storage device 14. Once this has been done, step S2 registers the device driver 12 as a transaction participant with the transaction coordinator 16. Once the device driver 12 has been registered, the next step, S3, performs a snapshot copy of the storage device 14. Immediately after the snapshot copy has been initiated, step S4 performs the data write on the storage device 14, and step S5 performs the two-phase commit between the device driver 12 and the transaction coordinator 16.

The two possible outcomes of the two-phase commit process are shown as two mutually exclusive steps, S6 and S7. The first possibility, step S6, comprises receiving an instruction to perform a rollback and reversing the data write according to the snapshot copy. The second possibility, step S7, comprises receiving an instruction to commit the transaction and discarding the snapshot copy. As described above with reference to Figures 2 to 4, the principal advantage of steps S1 to S5 is that in the event of a rollback, whether caused by the device driver 12 voting no in the two-phase commit or by the transaction coordinator 16 instructing a rollback (after a different participant has voted no), the snapshot copy can be used to reconstruct the data on the storage device 14. The snapshot copy causes a point-in-time copy of the data stored on the device 14 to be copied to the new storage location 18, and it can be reversed to effectively undo the data write. If the decision is to commit the transaction containing the current unit of work, the snapshot copy can be discarded.
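Steps S1 to S7 can be traced end to end in a short sketch. Everything here is an assumption made for illustration: the class names, the dictionary standing in for the storage device, and in particular the full copy taken by `dict(self.device)`, which replaces the copy-on-write snapshot copy of a real storage controller with the simplest thing that behaves the same way from the outside.

```python
class DeviceDriver:
    """Illustrative driver for one storage device (a dict of blocks)."""

    def __init__(self, device):
        self.device = device
        self.snapshot = None

    def write(self, coordinator, block, data):
        coordinator.register(self)                 # S2: register as participant
        if self.snapshot is None:
            self.snapshot = dict(self.device)      # S3: point-in-time copy
        self.device[block] = data                  # S4: the data write

    def prepare(self):
        return True                                # S5, phase 1: vote yes

    def commit(self):
        self.snapshot = None                       # S7: discard the snapshot

    def rollback(self):
        if self.snapshot is not None:
            self.device.clear()                    # S6: reverse the write
            self.device.update(self.snapshot)
            self.snapshot = None


class Coordinator:
    def __init__(self):
        self.participants = []

    def register(self, participant):
        if participant not in self.participants:
            self.participants.append(participant)

    def complete(self):
        # S5: two-phase commit across all registered participants.
        if all(p.prepare() for p in self.participants):
            for p in self.participants:
                p.commit()
            return True
        for p in self.participants:
            p.rollback()
        return False


class FailingParticipant:
    """Stands in for another participant that votes no."""

    def prepare(self):
        return False

    def commit(self):
        pass

    def rollback(self):
        pass


# Rollback path: another participant votes no, so the write is reversed.
device = {0: b"old order"}
driver = DeviceDriver(device)
tx = Coordinator()
driver.write(tx, 0, b"new order")   # S1: the file system instructs the driver
tx.register(FailingParticipant())
committed = tx.complete()

# Commit path: the write stays and the snapshot is discarded.
device2 = {0: b"old order"}
driver2 = DeviceDriver(device2)
tx2 = Coordinator()
driver2.write(tx2, 0, b"new order")
committed2 = tx2.complete()
```

In the first run the device ends up with its original contents even though the driver itself voted yes; in the second the new data remains and no snapshot is retained.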

10 ... File system

12 ... Device driver

14 ... Storage device

16 ... Transaction coordinator

18 ... New storage location

Figure 1 is a schematic diagram of a system for performing a data write on a storage device.

Figures 2 to 4 are further schematic diagrams of the system of Figure 1, showing the data flow in the system.

Figure 5 is a flowchart of a method of performing a data write on a storage device.

Claims (11)

1. A method of performing a data write on a storage device, comprising: instructing a device driver for the storage device to perform a data write to the storage device; registering the device driver as a transaction participant with a transaction coordinator; performing a snapshot copy (flashcopy) of the storage device; performing the data write on the storage device, a point-in-time copy from a source disk to a target disk being produced according to a snapshot copy relationship mapping; and performing a two-phase commit between the device driver and the transaction coordinator; wherein the transaction coordinator issues a prepare message to a plurality of registered transaction participants in a first phase, and issues a commit message to the transaction participants in a second phase.
2. The method of claim 1, further comprising receiving an instruction to perform a rollback, and reversing the data write according to the snapshot copy.
3. The method of claim 1, further comprising receiving an instruction to commit the transaction, and discarding the snapshot copy.
4. The method of claim 1, 2 or 3, wherein the step of performing the data write on the storage device immediately follows the initiation of the snapshot copy of the storage device.
5. The method of claim 1, 2 or 3, wherein the step of registering the device driver as the transaction participant with the transaction coordinator is carried out by the device driver.
6. A system for performing a data write on a storage device, comprising: a file system arranged to instruct a device driver for the storage device to perform a data write to the storage device; a transaction coordinator arranged to register the device driver as a transaction participant; a storage device; and a device driver for the storage device, arranged to perform a snapshot copy of the storage device, a point-in-time copy from a source disk to a target disk being produced according to a snapshot copy relationship mapping, to perform the data write on the storage device, and to perform a two-phase commit with the transaction coordinator; wherein the transaction coordinator issues a prepare message to a plurality of registered transaction participants in a first phase, and issues a commit message to the transaction participants in a second phase.
7. The system of claim 6, wherein the device driver is further arranged to receive an instruction to perform a rollback, and to reverse the data write according to the snapshot copy.
8. The system of claim 6, wherein the device driver is further arranged to receive an instruction to commit the transaction, and to discard the snapshot copy.
9. The system of claim 6, 7 or 8, wherein the device driver is further arranged to perform the data write on the storage device after initiating the snapshot copy.
10. The system of claim 6, 7 or 8, wherein the device driver is further arranged to register the device driver as the transaction participant with the transaction coordinator.
11. A computer program comprising computer program code which, when loaded into a computer system and executed, causes the computer system to perform all the steps of the method of any one of claims 1 to 5.
TW98133282A 2008-10-30 2009-09-30 Performing a data write on a storage device TWI468930B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08167942 2008-10-30

Publications (2)

Publication Number Publication Date
TW201027325A TW201027325A (en) 2010-07-16
TWI468930B TWI468930B (en) 2015-01-11

Family

ID=44853155

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98133282A TWI468930B (en) 2008-10-30 2009-09-30 Performing a data write on a storage device

Country Status (1)

Country Link
TW (1) TWI468930B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6397227B1 (en) * 1999-07-06 2002-05-28 Compaq Computer Corporation Database management system and method for updating specified tuple fields upon transaction rollback
TW507148B (en) * 2001-05-18 2002-10-21 Mitac Int Corp Verification method for replying work sheets in a collaboration transaction system
US6769074B2 (en) * 2000-05-25 2004-07-27 Lumigent Technologies, Inc. System and method for transaction-selective rollback reconstruction of database objects
TWI234095B (en) * 2001-08-08 2005-06-11 E Ten Information Sysems Co Lt Transmission-type electronic device and its system with real-time patching function for stock transaction data
US20070072163A1 (en) * 2005-09-09 2007-03-29 Microsoft Corporation Transaction consistency and problematic states
TW200826069A (en) * 2006-12-07 2008-06-16 Inventec Corp Method for automatically adjusting the COW(copy on write) disk space of the snapshot device

Also Published As

Publication number Publication date
TW201027325A (en) 2010-07-16

Similar Documents

Publication Publication Date Title
JP5689507B2 (en) Method, system, and computer program for performing data writing on a storage device
US10114581B1 (en) Creating a virtual access point in time on an object based journal replication
US9336094B1 (en) Scaleout replication of an application
US8954645B2 (en) Storage writes in a mirrored virtual machine system
US8464101B1 (en) CAS command network replication
US8788772B2 (en) Maintaining mirror and storage system copies of volumes at multiple remote sites
JP4791051B2 (en) Method, system, and computer program for system architecture for any number of backup components
US7577867B2 (en) Cross tagging to data for consistent recovery
US8214612B1 (en) Ensuring consistency of replicated volumes
US9535801B1 (en) Xcopy in journal based replication
US8898409B1 (en) Journal-based replication without journal loss
US8954796B1 (en) Recovery of a logical unit in a consistency group while replicating other logical units in the consistency group
US10152267B1 (en) Replication data pull
US8028192B1 (en) Method and system for rapid failback of a computer system in a disaster recovery environment
US20080140963A1 (en) Methods and systems for storage system generation and use of differential block lists using copy-on-write snapshots
US20080209145A1 (en) Techniques for asynchronous data replication
US7359927B1 (en) Method for performing periodic replication of data on a remote storage system
JP2008225616A (en) Storage system, remote copy system and data restoration method
US10776211B1 (en) Methods, systems, and apparatuses to update point in time journal using map reduce to create a highly parallel update
WO2019107232A1 (en) Data backup system, relay site storage, data backup method, and control program for relay site storage
JP5292350B2 (en) Message queue management system, lock server, message queue management method, and message queue management program
JP2009123175A (en) Storage system, storage device, and data update method
TWI468930B (en) Performing a data write on a storage device
US11656947B2 (en) Data set recovery from a point-in-time logical corruption protection copy
CN110413370B (en) Method and system for implementing data backup of virtual machine depending on original equipment mapping disk

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees