WO2017162174A1 - Storage system - Google Patents

Storage system

Info

Publication number
WO2017162174A1
WO2017162174A1 (PCT/CN2017/077751, CN2017077751W)
Authority
WO
WIPO (PCT)
Prior art keywords
storage
cache
data
storage medium
node
Prior art date
Application number
PCT/CN2017/077751
Other languages
English (en)
French (fr)
Inventor
王东临
金友兵
Original Assignee
北京书生国际信息技术有限公司
书生云公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京书生国际信息技术有限公司, 书生云公司
Publication of WO2017162174A1
Priority to US16/139,712 (US10782898B2)
Priority to US16/378,076 (US20190235777A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • H04L67/568Storing data temporarily at an intermediate stage, e.g. caching

Definitions

  • the present invention relates to the field of data storage technologies, and in particular, to a storage system.
  • the cache reduces the system load and improves the data transmission rate.
  • the cache area is usually integrated on each storage node of the cluster server, that is, the cache read and write operations are performed on each of the cluster servers.
  • Each server temporarily puts the commonly used data in its own built-in cache, and then transfers the data in the cache to the persistent storage medium in the storage pool for permanent storage when the system is idle.
  • Since the cache has the property that its contents disappear when power is lost, placing it in the server host brings unpredictable risks to the storage system. Once any host in the cluster fails, the cached data stored in that host is lost, which seriously affects the reliability and stability of the entire storage system.
  • the embodiment of the present invention provides a storage system to avoid loss of cached data when a server fails.
  • An embodiment of the invention provides a storage system, including:
  • a storage network; at least two storage nodes connected to the storage network;
  • at least one storage device connected to the storage network, each storage device including at least one storage medium;
  • the storage medium includes at least one high speed storage medium and at least one persistent storage medium.
  • the storage network is configured such that each storage node can access all storage media without using other storage nodes;
  • all or part of one or more of the at least one high speed storage medium constitutes a cache area;
  • when a storage node writes data, the data is first written into the cache area, and then the data in the cache area is written to the persistent storage medium by the same or a different storage node.
  • a cache area composed of high-speed storage media is set up in a global storage pool independently of each host of the cluster server, so that even if a storage node in the cluster fails, the cached data that node has written to the high-speed storage media is not lost, which greatly enhances the reliability and stability of the storage system.
  • FIG. 1 is a block diagram showing the architecture of a storage system in accordance with an embodiment of the present invention.
  • the storage system includes a storage network; at least two storage nodes connected to the storage network, where a storage node is a software module that provides storage services rather than a hardware server containing storage media in the usual sense; and storage devices, also connected to the storage network; each storage device includes at least one high speed storage medium and at least one persistent storage medium.
  • the storage network is configured such that each storage node can access all storage media without resorting to other storage nodes.
  • all or part of one or more of the at least one high speed storage medium constitutes a cache area; when a storage node writes data, the data is first written into the cache area, and then the same or a different storage node writes the data from the cache area to the persistent storage medium.
  • each storage node corresponds to one or more computing nodes, and each storage node and its corresponding computing node are located in the same server, and the physical server is connected to the storage device through the storage switching device.
  • aggregating the computing node and the storage node in the same server reduces, in terms of the overall structure of the storage system, the number of physical devices required and thus the cost. At the same time, the compute node can access the storage resources it wishes to access locally.
  • the storage node records the location of the persistent storage medium to which the data should be written in the cache area while writing the data to the cache area; the same or a different storage node subsequently writes the data in the cache area to the persistent storage medium in accordance with the location of the persistent storage medium to which the data should ultimately be written. After the data in the cache area has been written to the persistent storage medium, the corresponding data is cleared from the cache area promptly to free space for new data to be cached.
  • the location of the persistent storage medium to which each piece of data should ultimately be written is not constrained by the high-speed storage medium on which the data is currently cached.
  • some data may be cached in the high speed storage medium of the storage device 1, but the location of the persistent storage medium that should eventually be written is located in the storage device 2.
  • the cache area is divided into at least two cache units, each cache unit comprising one or more high speed storage media, or including some or all of one or more high speed storage media.
  • the high speed storage media included in each cache unit are located in the same or different storage devices.
  • a given cache unit may include two complete high-speed storage media, or portions of two high-speed storage media, or part of one high-speed storage medium together with another complete high-speed storage medium.
  • each cache unit may be constructed by redundant storage of all or part of at least two high speed storage media on at least two storage devices.
  • each storage node is responsible for managing zero or more cache units. That is, some storage nodes may not manage any cache unit at all and instead be responsible for copying the data in the cache units to the persistent storage media. For example, suppose a system has 9 storage nodes, where storage nodes 1-8 are responsible for writing data into their corresponding cache units, and storage node 9 is used only to write the data in the cache units to the corresponding persistent medium (as mentioned above, the corresponding address on the persistent medium is also recorded with the cached data). With this arrangement, some storage nodes are relieved of this burden and can perform other operations. In addition, a storage node dedicated to writing cached data to the persistent medium can flush the cached data during idle time, which greatly improves the transfer efficiency of the cached data.
  • in one embodiment, each storage node can read and write only the cache units it manages. Since write operations from multiple storage nodes to one high-speed storage medium at the same time are prone to conflicts while read operations do not conflict with each other, in another embodiment each storage node can write data to be cached only into the cache units it manages but can read all cache units managed by itself and by other storage nodes; that is, a storage node's write operations on cache units are local, while its read operations can be global.
  • when a storage node is detected to have failed, some or all of the other storage nodes may be configured such that these storage nodes take over the cache units previously managed by the failed storage node.
  • for example, one storage node may take over all cache units managed by the failed storage node, or at least two other storage nodes may take them over, each of which takes over part of the cache units managed by the failed storage node.
  • the storage system provided by the embodiment of the present invention may further include a storage control node, connected to the storage network, for determining the cache units managed by each storage node, or a storage allocation module disposed in the storage node, for determining the cache units managed by that storage node.
  • when the set of cache units managed by a storage node changes, the list of cache units managed by each storage node that is maintained by the storage control node or the storage allocation module also changes correspondingly; in other words, the cache units managed by each storage node are modified by modifying the list of cache units managed by each storage node maintained by the storage control node or the storage allocation module.
  • when data is written into the cache area, in addition to the data itself and the location of the persistent storage medium to which the data should be written, the length of the data to be written is also required; together, these are called a cached data block.
  • the head pointer and the tail pointer are respectively recorded in the fixed position of the cache unit, and the head pointer and the tail pointer initially point to the beginning position of the blank area in the cache unit.
  • when cached data is written, the head pointer is advanced by the total length of the written cached data block, pointing to the next blank area.
  • when the cache is cleaned, the length of the current cached data block and the location of the persistent storage medium to which the data should be written are read from the position pointed to by the tail pointer, and the cached data of that length is written to the persistent medium at the specified location.
  • the tail pointer is then increased by the length of the cleaned cache data block, pointing to the next block of cached data, freeing up the space of the currently cleared cached data.
  • when the value of the head pointer or the tail pointer exceeds the length of the available cache, the pointer wraps around accordingly (i.e., the length of the available cache is subtracted so that the pointer returns to the front portion of the cache unit); the length of the available cache is the length of the cache unit minus the space occupied by the head and tail pointers.
  • when writing cached data, if the remaining space of the cache unit is smaller than the cached data block (i.e., advancing the head pointer by the length of the block would catch up with the tail pointer), existing cached data is cleaned until there is enough space to write the block; if the available cache of the entire cache unit is smaller than the cached data block that needs to be written, the data is written directly to the persistent storage medium without caching; when the cache is cleaned, if the tail pointer equals the head pointer, the cache is empty and there is currently no cached data that needs to be cleaned.
  • all cache areas used by the storage nodes are located in the global cache area, not in the memory of the physical server where a storage node is located or on any other storage medium.
  • the cached data written to the global cache area can be shared by all storage nodes.
  • the work of writing cached data to the persistent storage medium can be done by each storage node itself, or one or more fixed storage nodes can be designated to be specifically responsible for it as needed; such an implementation can improve the balance of load between different storage nodes.
  • the storage node is configured to write data to be cached into any one (or a specified) high-speed storage medium in the global cache pool, while the same or another storage node writes the cached data in the global cache pool, block by block, to the persistent storage media specified in the global cache pool.
  • the application runs on the server where the storage node is located, such as the compute node.
  • each storage node temporarily stores the data commonly used by the application on the high-speed storage medium. In this way, the application can read and write data directly from the high-speed storage medium at runtime, thereby improving the running speed and performance of the application.
  • the storage device includes, but is not limited to, a JBOD; the high speed storage medium may include, but is not limited to, SSD, SRAM, NVRAM, DRAM, or other forms
  • the persistent storage medium may include, but is not limited to, a hard disk, a flash memory, an SSD, NVMe or other forms
  • access interfaces for high speed storage media and persistent storage media may include, but are not limited to, SAS interfaces, SATA interfaces, PCI/e interfaces, DIMM interfaces, NVMe interfaces, SCSI interfaces, AHCI interfaces.
  • the storage network includes at least two switching devices; each storage node can be connected to any one of the storage devices through any one of the storage switching devices, thereby being connected to the high speed storage media and/or persistent storage media.
  • when any storage switching device, or a storage channel connected to a storage switching device, fails, the storage node can read and write data on the storage devices through other storage switching devices; this design further enhances the reliability of the storage system's data transmission.
  • the storage switching device may be a SAS switch or a PCI/e switch.
  • the storage channel may be a SAS (Serial Attached SCSI) channel or a PCI/e channel.
  • taking the SAS (Serial Attached SCSI) channel as an example, a solution based on SAS switching has the advantages of high performance, large bandwidth, and a large number of disks in a single device.
  • when the SAS fabric is combined with a host bus adapter (HBA) or the SAS interface on the server motherboard, the storage it provides can easily be accessed by multiple connected servers at the same time.
  • the cache area formed by the high-speed storage media is set up in the global storage pool independently of the hosts of the cluster server. In this manner, if a storage node in the cluster fails, the cached data written by that storage node to the high-speed storage media is not lost, which greatly enhances the reliability and stability of the storage system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An embodiment of the present invention provides a storage system to avoid loss of cached data when a server fails. The storage system includes: a storage network; at least two storage nodes connected to the storage network; at least one storage device connected to the storage network, each storage device including at least one storage medium; and the storage medium includes at least one high-speed storage medium and at least one persistent storage medium; where the storage network is configured such that each storage node can access all storage media without resorting to other storage nodes; all or part of one or more of the high-speed storage media constitutes a cache area; and when writing data, a storage node first writes the data into the cache area, and then the same or a different storage node writes the data from the cache area to the persistent storage medium.

Description

Storage System
Technical Field
The present invention relates to the field of data storage technology, and in particular to a storage system.
Background Art
As the scale of computer applications keeps growing, the demand for storage space grows with it. Correspondingly, consolidating the storage resources of multiple devices (for example, the storage media of disk groups) into one storage pool that provides storage services to a cluster of servers has become the mainstream approach. A cache, serving as a temporary data exchange area, reduces the system load and increases the data transfer rate. In a traditional storage system, the cache area is usually integrated on each storage node of the cluster, that is, cache read and write operations are carried out in each host of the cluster. Each server temporarily places frequently used data in its own built-in cache and, when the system is idle, transfers the data in the cache to a persistent storage medium in the storage pool for permanent storage. Because a cache loses its contents once power is lost, placing it in the server host brings unpredictable risks to the storage system. Once any host in the cluster fails, the cached data held in that host is lost, which seriously affects the reliability and stability of the entire storage system.
Summary of the Invention
In view of this, embodiments of the present invention provide a storage system to avoid loss of cached data when a server fails.
An embodiment of the present invention provides a storage system, including:
a storage network;
at least two storage nodes connected to the storage network;
at least one storage device connected to the storage network, each storage device including at least one storage medium; and
the storage medium including at least one high-speed storage medium and at least one persistent storage medium,
where the storage network is configured such that each storage node can access all storage media without resorting to other storage nodes;
all or part of one or more of the at least one high-speed storage medium constitutes a cache area; and
when writing data, a storage node first writes the data into the cache area, and then the same or a different storage node writes the data from the cache area to the persistent storage medium.
In the storage system provided by the embodiments of the present invention, a cache area composed of high-speed storage media is set up in a global storage pool independently of the individual hosts of the cluster. In this way, even if a storage node in the cluster fails, the cached data that node has written to the high-speed storage media is not lost, which greatly enhances the reliability and stability of the storage system.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the architecture of a storage system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
FIG. 1 shows the architecture of a storage system according to an embodiment of the present invention. As shown in FIG. 1, the storage system includes a storage network; at least two storage nodes connected to the storage network, where a storage node is a software module that provides storage services rather than a hardware server containing storage media in the usual sense; and storage devices, likewise connected to the storage network, each storage device including at least one high-speed storage medium and at least one persistent storage medium. The storage network is configured such that each storage node can access all storage media without resorting to other storage nodes. All or part of one or more of the at least one high-speed storage medium constitutes a cache area; when a storage node writes data, the data is first written into the cache area, and then the same or a different storage node writes the data from the cache area to the persistent storage medium.
In another embodiment of the present invention, each storage node corresponds to one or more compute nodes, and each storage node and its corresponding compute nodes are located in the same server; this physical server is connected to the storage devices through a storage switching device. In this embodiment, the compute nodes and the storage node are aggregated in the same server, which, in terms of the overall structure of the storage system, reduces the number of physical devices required and lowers cost. At the same time, a compute node can access the storage resources it wishes to access locally.
In an embodiment of the present invention, while writing data into the cache area, the storage node also records in the cache area the location on the persistent storage medium to which the data should ultimately be written; subsequently, the same or a different storage node writes the data from the cache area to the persistent storage medium according to that recorded location. After the data in the cache area has been written to the persistent storage medium, the corresponding data is cleared from the cache area promptly, freeing space for new data to be cached.
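For illustration only, the following Python sketch shows this two-phase write path at a high level: the node receiving a write records the payload together with its final persistent location in the shared cache area, and the same or a different node later flushes each cached block to that recorded location and clears it. The CacheEntry structure, the in-memory list standing in for the global cache area, and the write_to_persistent callback are assumptions made for this example, not elements defined by the embodiments.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class CacheEntry:
        device_id: int      # persistent storage device the data must finally land on
        offset: int         # location on that device, recorded together with the data
        payload: bytes

    shared_cache: List[CacheEntry] = []   # stands in for the cache area on high-speed media

    def cache_write(device_id: int, offset: int, payload: bytes) -> None:
        # Phase 1: executed by the storage node that received the write request.
        shared_cache.append(CacheEntry(device_id, offset, payload))

    def flush(write_to_persistent: Callable[[int, int, bytes], None]) -> None:
        # Phase 2: executed later, possibly by a different storage node (e.g. when idle).
        while shared_cache:
            entry = shared_cache.pop(0)   # oldest cached block first
            write_to_persistent(entry.device_id, entry.offset, entry.payload)
            # the entry is cleared from the cache once it has been persisted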
In an embodiment of the present invention, the location on the persistent storage medium to which a piece of data should ultimately be written is not constrained by the high-speed storage medium on which that data is currently cached. For example, a piece of data may be cached in the high-speed storage medium of storage device 1 while the persistent storage location to which it should ultimately be written is in storage device 2.
In an embodiment of the present invention, the cache area is divided into at least two cache units, each cache unit including one or more high-speed storage media, or part or all of one or more high-speed storage media. Furthermore, the high-speed storage media included in each cache unit may be located in the same storage device or in different storage devices.
For example, a cache unit may include two complete high-speed storage media, or portions of two high-speed storage media, or part of one high-speed storage medium together with another complete high-speed storage medium.
In an embodiment of the present invention, each cache unit may be constructed, with redundant storage, from all or part of at least two high-speed storage media located on at least two storage devices.
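As a purely illustrative sketch of such a redundantly stored cache unit, the following Python class mirrors every cached block onto two high-speed media assumed to sit on two different storage devices; the medium interface (write/read at an offset) is an assumption made for the example and is not prescribed by the embodiments.

    class MirroredCacheUnit:
        # medium_a and medium_b are assumed to expose write(offset, data) and
        # read(offset, length), and to reside on two different storage devices.
        def __init__(self, medium_a, medium_b):
            self.replicas = (medium_a, medium_b)

        def write(self, offset: int, data: bytes) -> None:
            # mirror the cached block onto both media so that losing one
            # storage device does not lose the cached data
            for medium in self.replicas:
                medium.write(offset, data)

        def read(self, offset: int, length: int) -> bytes:
            # read from the first replica that is still reachable
            for medium in self.replicas:
                try:
                    return medium.read(offset, length)
                except IOError:
                    continue
            raise IOError("both replicas of the cache unit are unreachable")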
In an embodiment of the present invention, each storage node is responsible for managing zero or more cache units. That is, some storage nodes may not manage any cache unit at all and instead be responsible for copying the data in cache units to the persistent storage media. For example, suppose a system has nine storage nodes, where storage nodes 1 to 8 are responsible for writing data into their corresponding cache units, while storage node 9 is used only to write the data in the cache units to the corresponding persistent media (as described above, the address on the corresponding persistent medium is also recorded together with the cached data). With this arrangement, some storage nodes are relieved of this burden and can perform other operations. In addition, a storage node dedicated to writing cached data to the persistent media can flush the cached data during idle time, which greatly improves the transfer efficiency of the cached data.
In an embodiment of the present invention, each storage node can read and write only the cache units it manages. Since write operations from multiple storage nodes to the same high-speed storage medium at the same time are prone to conflicts while read operations do not conflict with one another, in another embodiment each storage node can write data to be cached only into the cache units it manages but can read all cache units managed by itself and by other storage nodes; that is, a storage node's write operations on cache units are local, while its read operations can be global.
In an embodiment of the present invention, when a storage node is detected to have failed, some or all of the other storage nodes can be configured so that they take over the cache units previously managed by the failed storage node. For example, one storage node may take over all cache units managed by the failed storage node, or at least two other storage nodes may take them over, with each of these storage nodes taking over part of the cache units managed by the failed storage node.
Specifically, the storage system provided by the embodiments of the present invention may further include a storage control node connected to the storage network and used to determine the cache units managed by each storage node; alternatively, a storage allocation module may be provided in a storage node to determine the cache units managed by that storage node. When the set of cache units managed by a storage node changes, the list of cache units managed by each storage node that is maintained by the storage control node or the storage allocation module changes correspondingly; in other words, the cache units managed by each storage node are modified by modifying the per-node cache unit list maintained by the storage control node or the storage allocation module.
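The following Python sketch, offered only as an illustration, shows one way a storage control node or storage allocation module might maintain the per-node list of cache units and reassign the cache units of a failed node to the surviving nodes; the class and method names are assumptions for the example, and spreading the orphaned cache units round-robin is just one of the takeover policies described above (assigning them all to a single surviving node would equally match the embodiments).

    from collections import defaultdict

    class StorageControlNode:
        def __init__(self):
            # storage node id -> set of cache units that node currently manages
            self.units_of = defaultdict(set)

        def assign(self, node: str, unit: str) -> None:
            self.units_of[node].add(unit)

        def handle_failure(self, failed: str) -> None:
            orphaned = sorted(self.units_of.pop(failed, set()))
            survivors = sorted(self.units_of)
            if not survivors:
                raise RuntimeError("no surviving storage node can take over")
            # spread the failed node's cache units across the survivors
            for i, unit in enumerate(orphaned):
                self.units_of[survivors[i % len(survivors)]].add(unit)

    ctrl = StorageControlNode()
    ctrl.assign("node1", "cache-unit-A")
    ctrl.assign("node1", "cache-unit-B")
    ctrl.assign("node2", "cache-unit-C")
    ctrl.handle_failure("node1")   # node2 now also manages cache-unit-A and cache-unit-B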
In an embodiment of the present invention, when data is written into the cache area, in addition to the data itself and the location on the persistent storage medium to which the data should be written, the length of the data is also written; together, these three pieces of information are called a cached data block.
In an embodiment of the present invention, writing data into the cache area may proceed as follows. A head pointer and a tail pointer are recorded at fixed positions in the cache unit; initially both point to the start of the blank region in the cache unit. When cached data is written, the head pointer advances by the total length of the written cached data block, so that it points to the next blank region. When the cache is cleaned, the length of the current cached data block and the location on the persistent storage medium to which the data should be written are read from the position pointed to by the tail pointer, the cached data of that length is written to the persistent medium at the specified location, and the tail pointer then advances by the length of the cleaned cached data block, pointing to the next cached data block and releasing the space of the data just cleaned. When the value of the head pointer or the tail pointer exceeds the length of the available cache, the pointer wraps around (that is, the length of the available cache is subtracted, returning the pointer to the front portion of the cache unit); the length of the available cache is the length of the cache unit minus the space occupied by the head and tail pointers. When writing cached data, if the remaining space of the cache unit is smaller than the cached data block (that is, advancing the head pointer by the length of the block would catch up with the tail pointer), existing cached data is cleaned until there is enough space to write the block; if the available cache of the entire cache unit is smaller than the cached data block to be written, the data is written directly to the persistent storage medium without being cached. When cleaning the cache, if the tail pointer equals the head pointer, the cache is empty and there is currently no cached data to clean.
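To make the pointer arithmetic above concrete, here is a minimal, illustrative Python sketch of such a ring-buffer cache unit. It is not the implementation of the embodiments: the block layout (a small header holding the payload length and the destination offset on the persistent medium), the header size, and the persist callback are assumptions made for the example.

    import struct

    class CacheUnit:
        # block header: payload length (4 bytes) + destination on persistent medium (8 bytes)
        HDR = struct.Struct("<IQ")

        def __init__(self, size, persist):
            self.size = size          # usable cache length (head/tail pointer space excluded)
            self.buf = bytearray(size)
            self.head = 0             # next write position
            self.tail = 0             # next clean position
            self.used = 0             # bytes currently occupied by cached blocks
            self.persist = persist    # persist(dest, payload): write to the persistent medium

        def write(self, dest, payload):
            total = self.HDR.size + len(payload)
            if total > self.size:     # whole cache too small: bypass it and write through
                self.persist(dest, payload)
                return
            while self.size - self.used < total:
                self.clean_one()      # make room by flushing the oldest blocks
            self._copy_in(self.head, self.HDR.pack(len(payload), dest) + payload)
            self.head = (self.head + total) % self.size   # advance the head pointer, wrapping
            self.used += total

        def clean_one(self):
            if self.used == 0:        # tail == head: the cache is empty, nothing to clean
                return
            length, dest = self.HDR.unpack(self._copy_out(self.tail, self.HDR.size))
            payload = self._copy_out(self.tail + self.HDR.size, length)
            self.persist(dest, payload)                    # flush to the recorded destination
            total = self.HDR.size + length
            self.tail = (self.tail + total) % self.size    # advance the tail pointer, wrapping
            self.used -= total

        # helpers that copy through the wrap-around boundary
        def _copy_in(self, pos, data):
            pos %= self.size
            first = min(len(data), self.size - pos)
            self.buf[pos:pos + first] = data[:first]
            self.buf[0:len(data) - first] = data[first:]

        def _copy_out(self, pos, n):
            pos %= self.size
            first = min(n, self.size - pos)
            return bytes(self.buf[pos:pos + first]) + bytes(self.buf[0:n - first])

For example, a unit created as CacheUnit(4096, persist=write_to_persistent) caches calls of the form unit.write(dest, payload) and flushes them with unit.clean_one() during idle time, where write_to_persistent is a hypothetical function that writes a payload to the given location on a persistent medium, possibly invoked from a different storage node than the one that cached the data.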
In the storage system provided by the embodiments of the present invention, all cache areas used by the storage nodes are located in the global cache area rather than in the memory of the physical server where a storage node resides or on any other storage medium. Cached data written to the global cache area can be shared by all storage nodes. In this case, the work of writing cached data to the persistent storage media can be done by each storage node itself, or one or more fixed storage nodes can be designated for this purpose as needed; such an implementation can improve the load balance among different storage nodes.
In an embodiment of the present invention, a storage node writes the data to be cached into any one (or a specified) high-speed storage medium in the global cache pool, while the same or another storage node writes the cached data in the global cache pool, block by block, to the specified persistent storage media in the global cache pool. Specifically, applications run on the servers where the storage nodes are located, for example at the compute nodes. To reduce the frequency with which an application accesses the persistent storage media, each storage node temporarily stores the data commonly used by the application on a high-speed storage medium, so that at runtime the application can read and write data directly from the high-speed storage medium, improving the running speed and performance of the application. In one embodiment, the storage device includes but is not limited to a JBOD; the high-speed storage medium may include but is not limited to SSD, SRAM, NVRAM, DRAM, or other forms; the persistent storage medium may include but is not limited to hard disk, flash memory, SSD, NVMe, or other forms; and the access interfaces of the high-speed storage media and the persistent storage media may include but are not limited to SAS, SATA, PCI/e, DIMM, NVMe, SCSI, and AHCI interfaces.
In an embodiment of the present invention, the storage network includes at least two switching devices; each storage node can connect to any storage device through any one of the storage switching devices, and thereby to the high-speed storage media and/or persistent storage media. When any storage switching device, or a storage channel connected to a storage switching device, fails, a storage node can read and write the data on the storage devices through the other storage switching devices; this design further enhances the reliability of data transmission in the storage system.
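Purely as an illustration of this multi-path access, the sketch below tries each storage switch in turn and falls back to the next one when a path fails; the switch objects and their read method are assumptions made for the example rather than an interface defined by the embodiments.

    def read_block(device, block, switches):
        # switches: the at least two storage switching devices (e.g. SAS or PCI/e switches),
        # each of which provides an independent path from the storage node to the device
        last_error = None
        for switch in switches:
            try:
                return switch.read(device, block)
            except IOError as err:        # this path's switch or channel has failed
                last_error = err
        raise IOError("all paths to the storage device failed") from last_error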
In an embodiment of the present invention, the storage switching device may be a SAS switch or a PCI/e switch, and correspondingly the storage channel may be a SAS (Serial Attached SCSI) channel or a PCI/e channel. Taking the SAS channel as an example, a solution based on SAS switching offers high performance, large bandwidth, and a large number of disks in a single device. Moreover, when the SAS fabric is used together with host bus adapters (HBAs) or the SAS interfaces on server motherboards, the storage it provides can easily be accessed simultaneously by multiple connected servers.
In the embodiments of the present invention, the cache area formed by high-speed storage media is set up in the global storage pool independently of the individual hosts of the cluster. In this way, if a storage node in the cluster fails, the cached data that node has written to the high-speed storage media is not lost, which greatly enhances the reliability and stability of the storage system.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (13)

  1. A storage system, characterized by comprising:
    a storage network;
    at least two storage nodes connected to the storage network;
    at least one storage device connected to the storage network, each storage device comprising at least one storage medium; and
    the storage medium comprising at least one high-speed storage medium and at least one persistent storage medium;
    wherein the storage network is configured such that each storage node can access all storage media without resorting to other storage nodes;
    all or part of one or more of the at least one high-speed storage medium constitutes a cache area; and
    when writing data, a storage node first writes the data into the cache area, and then the same or a different storage node writes the data from the cache area to the persistent storage medium.
  2. The storage system according to claim 1, characterized in that, while writing data into the cache area, the storage node also records in the cache area the location on the persistent storage medium to which the data should ultimately be written; subsequently, the same or a different storage node writes the data from the cache area to the persistent storage medium according to the location on the persistent storage medium to which the data should ultimately be written.
  3. The storage system according to claim 2, characterized in that, after the same or different storage node writes the data from the cache area to the persistent storage medium, the corresponding data is cleared from the cache area.
  4. The storage system according to claim 3, characterized in that the cache area is divided into at least two cache units, each cache unit comprising one or more high-speed storage media, or part or all of one or more high-speed storage media; and/or the high-speed storage media comprised in each cache unit are located in the same or different storage devices; and/or
    each storage node is responsible for managing zero or more cache units.
  5. The storage system according to claim 4, characterized in that each storage node is configured to read and write only the cache units it manages; or
    each storage node is configured to write only the cache units it manages, but to read all cache units managed by itself and by other storage nodes.
  6. The storage system according to claim 4, characterized in that, when a storage node fails, another storage node takes over the cache units managed by the failed storage node.
  7. The storage system according to claim 4, characterized by further comprising:
    a storage control node connected to the storage network and configured to determine the cache units managed by each storage node; or
    the storage node further comprising:
    a storage allocation module configured to determine the cache units managed by that storage node.
  8. The storage system according to claim 4, characterized in that the same or different storage node uses CPU idle time to write data that has not yet been written to the persistent storage medium to the persistent storage medium.
  9. The storage system according to any one of claims 1 to 8, characterized in that a head pointer and a tail pointer are recorded in the cache area;
    when a storage node writes data into the cache area, the data is written at the position indicated by the head pointer, and after the write the value of the head pointer is adjusted accordingly so that the head pointer points to an unused region of the cache area; and
    when a storage node writes data from the cache area to the persistent storage medium, the data at the position pointed to by the tail pointer is written, and after the write the position of the tail pointer is adjusted accordingly so that the tail pointer points to the next block of data that has not yet been written to the persistent storage medium.
  10. The storage system according to claim 1, characterized in that the cache area is constructed, with redundant storage, from all or part of at least two high-speed storage media on at least two storage devices.
  11. The storage system according to claim 1, characterized in that the storage network comprises at least two switching devices; when any switching device, or a storage channel connected to a switching device, fails, the storage nodes read and write the cache area and the persistent storage media through the other storage switching devices.
  12. The storage system according to claim 1, characterized in that the storage network is a SAS switch or a PCI/e switch; the storage network comprises a SAS switch or a PCI/e switch.
  13. The storage system according to claim 1, characterized in that the storage device is a JBOD; and/or the high-speed storage medium is an SSD, SRAM, NVRAM, or DRAM; and/or the persistent storage medium is a hard disk, flash memory, SSD, or NVMe; and/or the interfaces of the high-speed storage medium and the persistent storage medium are SAS, SATA, PCI/e, DIMM, NVMe, SCSI, or AHCI interfaces.
PCT/CN2017/077751 2011-10-11 2017-03-22 Storage system WO2017162174A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/139,712 US10782898B2 (en) 2016-02-03 2018-09-24 Data storage system, load rebalancing method thereof and access control method thereof
US16/378,076 US20190235777A1 (en) 2011-10-11 2019-04-08 Redundant storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610180244.1 2016-03-25
CN201610180244.1A CN105897859B (zh) 2016-03-25 2016-03-25 Storage system

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/071830 Continuation-In-Part WO2017133483A1 (zh) 2011-10-11 2017-01-20 Storage system
US16/140,951 Continuation-In-Part US20190028542A1 (en) 2011-10-11 2018-09-25 Method and device for transmitting data

Related Child Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077753 Continuation-In-Part WO2017162176A1 (zh) 2011-10-11 2017-03-22 Storage system, and access method and access apparatus for the storage system

Publications (1)

Publication Number Publication Date
WO2017162174A1 (zh) 2017-09-28

Family

ID=57014839

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077751 WO2017162174A1 (zh) 2011-10-11 2017-03-22 Storage system

Country Status (2)

Country Link
CN (1) CN105897859B (zh)
WO (1) WO2017162174A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515775A (zh) * 2019-08-29 2019-11-29 苏州浪潮智能科技有限公司 Cache backup method and cluster storage system

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472047B (zh) * 2016-02-03 2019-05-14 天津书生云科技有限公司 Storage system
CN105897859B (zh) * 2016-03-25 2021-07-30 北京书生云科技有限公司 Storage system
CN107066204A (zh) * 2016-12-23 2017-08-18 航天星图科技(北京)有限公司 Data exchange method between multiple nodes
CN106844108B (zh) * 2016-12-29 2019-05-24 成都华为技术有限公司 Data storage method, server, and storage system
CN111124945B (zh) * 2018-10-30 2023-09-22 伊姆西Ip控股有限责任公司 Method, device and computer-readable medium for providing cache services
CN112948336B (zh) * 2021-03-30 2023-01-03 联想凌拓科技有限公司 Data acceleration method, cache unit, electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467468A (zh) * 2010-11-05 2012-05-23 三星电子株式会社 存储系统和操作存储系统的方法
CN102498476A (zh) * 2009-09-14 2012-06-13 甲骨文国际公司 在数据库服务器和存储系统之间高速缓存数据
CN102681952A (zh) * 2012-05-12 2012-09-19 北京忆恒创源科技有限公司 将数据写入存储设备的方法与存储设备
CN103917963A (zh) * 2011-09-30 2014-07-09 甲骨文国际公司 基于快速持久性存储器的回写储存器高速缓存
US20150193337A1 (en) * 2014-01-08 2015-07-09 Netapp, Inc. Nvram caching and logging in a storage system
CN105897859A (zh) * 2016-03-25 2016-08-24 天津书生云科技有限公司 一种存储系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4681374B2 (ja) * 2005-07-07 2011-05-11 株式会社日立製作所 Storage management system
CN101252589B (zh) * 2008-03-25 2011-01-05 中国科学院计算技术研究所 Data caching device, network storage system using the same, and caching method
US8745329B2 (en) * 2011-01-20 2014-06-03 Google Inc. Storing data across a plurality of storage nodes
CN203982354U (zh) * 2014-06-19 2014-12-03 天津书生投资有限公司 Redundant storage system
CN104657316B (zh) * 2015-03-06 2018-01-19 北京百度网讯科技有限公司 Server
CN104935654B (zh) * 2015-06-10 2018-08-21 华为技术有限公司 Caching method in a server cluster system, write-point client, and read client
CN105045336A (zh) * 2015-06-25 2015-11-11 北京百度网讯科技有限公司 JBOD
CN105068836A (zh) * 2015-08-06 2015-11-18 北京百度网讯科技有限公司 Remotely sharable boot system based on a SAS network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102498476A (zh) * 2009-09-14 2012-06-13 甲骨文国际公司 在数据库服务器和存储系统之间高速缓存数据
CN102467468A (zh) * 2010-11-05 2012-05-23 三星电子株式会社 存储系统和操作存储系统的方法
CN103917963A (zh) * 2011-09-30 2014-07-09 甲骨文国际公司 基于快速持久性存储器的回写储存器高速缓存
CN102681952A (zh) * 2012-05-12 2012-09-19 北京忆恒创源科技有限公司 将数据写入存储设备的方法与存储设备
US20150193337A1 (en) * 2014-01-08 2015-07-09 Netapp, Inc. Nvram caching and logging in a storage system
CN105897859A (zh) * 2016-03-25 2016-08-24 天津书生云科技有限公司 一种存储系统

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515775A (zh) * 2019-08-29 2019-11-29 苏州浪潮智能科技有限公司 Cache backup method and cluster storage system

Also Published As

Publication number Publication date
CN105897859B (zh) 2021-07-30
CN105897859A (zh) 2016-08-24

Similar Documents

Publication Publication Date Title
WO2017162174A1 (zh) Storage system
US20200142599A1 (en) Providing track format information when mirroring updated tracks from a primary storage system to a secondary storage system
US9146684B2 (en) Storage architecture for server flash and storage array operation
US9304901B2 (en) System and method for handling I/O write requests
US20160062897A1 (en) Storage caching
US9075729B2 (en) Storage system and method of controlling data transfer in storage system
US20050203961A1 (en) Transaction processing systems and methods utilizing non-disk persistent memory
US7743209B2 (en) Storage system for virtualizing control memory
WO2017025039A1 (zh) Data access method for flash memory storage and apparatus thereof
US8694563B1 (en) Space recovery for thin-provisioned storage volumes
JP2005258918A (ja) Storage system and cache memory control method for storage system
US20180107601A1 (en) Cache architecture and algorithms for hybrid object storage devices
JP2009043030A (ja) Storage system
US20110191547A1 (en) Computer system and load equalization control method for the same
US20130219122A1 (en) Multi-stage cache directory and variable cache-line size for tiered storage architectures
US11294812B2 (en) Obtaining cache resources for expected writes to tracks in a write set after the cache resources were released for the tracks in the write set
US10733118B2 (en) Computer system, communication device, and storage control method with DMA transfer of data
US11080192B2 (en) Storage system and storage control method
CN117149062A (zh) Method for processing damaged data on magnetic tape, and computing device
US20060031639A1 (en) Write unmodified data to controller read cache
US6950905B2 (en) Write posting memory interface with block-based read-ahead mechanism
JP2021124796A (ja) Distributed computing system and resource allocation method
US20180316758A1 (en) Method and apparatus for logical mirroring to a multi-tier target node
US20230244385A1 (en) Storage apparatus and control method
JP2004013246A (ja) Multiple-write storage device

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17769451

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17769451

Country of ref document: EP

Kind code of ref document: A1