WO2015039352A1 - Data caching method and storage system - Google Patents


Info

Publication number
WO2015039352A1
Authority
WO
WIPO (PCT)
Prior art keywords
controller
data
address information
read
target data
Prior art date
Application number
PCT/CN2013/084024
Other languages
French (fr)
Chinese (zh)
Inventor
陈磊 (Chen Lei)
蒋培军 (Jiang Peijun)
李小华 (Li Xiaohua)
邹蛟同 (Zou Jiaotong)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2013/084024 priority Critical patent/WO2015039352A1/en
Priority to CN201380001620.0A priority patent/CN103635887B/en
Publication of WO2015039352A1 publication Critical patent/WO2015039352A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F 12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems

Definitions

  • The present invention relates to storage technologies, and in particular, to a data caching method and a storage system.
  • Background
  • The cache memory (also referred to as the cache) is a buffer between the CPU and the main storage (for example, a hard disk) in a storage system.
  • Its capacity is smaller than that of the hard disk, but its access speed is faster.
  • On a cache hit, the result of a read data request can be returned immediately; on a miss, the read data request is forwarded to the hard disk.
  • Because the cache access speed is much faster than the hard disk read/write speed, the higher the cache hit ratio, the higher the performance of the storage system. The existing practice is therefore to read data that "may be accessed" into the cache in advance, so that subsequent read data requests can hit immediately. This is called prefetching.
  • For a storage system with multiple controllers, if the controllers operate in active/passive (A/P) mode, only one controller is in the working state, so the cached data of the storage system is stored centrally in that controller's cache, and cache data can be prefetched by identifying sequential streams among the read data requests. However, if the controllers operate in active/active (A/A) mode, every controller is active and read data requests may be distributed across the controllers; when each controller identifies sequential streams for prefetching on its own, the information it bases its decision on is not comprehensive enough, so the prefetched data is not accurate enough.
  • Summary of the invention
  • Embodiments of the present invention provide a method for caching data and a storage system to accurately predict target data to be read in a case where the storage system includes a plurality of controllers.
  • A first aspect of the embodiments of the present invention provides a method for caching data, where the method is applied to a storage system, the storage system includes a plurality of controllers, and each controller includes a cache; the method includes:
  • The first controller receives a read data request sent by the host, where the read data request carries address information; determines a second controller according to the address information carried by the read data request; and sends the address information to the second controller.
  • the second controller obtains address information of the target data to be read according to the address information, to read the target data into the cache according to the address information of the target data.
  • the address information carried by the read data request includes a start address carried by the read data request.
  • Determining the second controller according to the address information carried by the read data request comprises: determining the second controller according to the start address carried by the read data request and a set hash algorithm.
  • the set hash algorithm includes a consistent hash algorithm.
  • the address information carried by the read data request includes a start address carried by the read data request.
  • Determining the second controller according to the address information carried in the read data request comprises: querying a preset configuration table according to the start address, and obtaining a second controller corresponding to the start address.
  • the reading the target data into the cache according to the address information of the target data includes:
  • the second controller reads the target data into a cache of the second controller according to address information of the target data.
  • Alternatively, reading the target data into the cache according to the address information of the target data includes:
  • the third controller reads the target data into a cache of the third controller according to address information of the target data.
  • a second aspect of the embodiments of the present invention provides a storage system, including:
  • The first controller is configured to receive a read data request sent by the host, where the read data request carries address information; determine a second controller according to the address information carried by the read data request; and send the address information to the second controller;
  • the second controller is configured to obtain address information of the target data to be read according to the address information, to read the target data into the cache according to the address information of the target data.
  • the address information carried by the read data request includes a start address carried by the read data request.
  • the first controller is specifically configured to determine the second controller according to the set hash algorithm according to the starting address carried by the read data request.
  • the set hash algorithm includes a consistent hash algorithm.
  • the address information carried by the read data request includes a start address carried by the read data request.
  • the first controller is specifically configured to query a preset configuration table according to the start address, and obtain a second controller corresponding to the start address.
  • The second controller is further configured to: read the target data into a cache of the second controller according to the address information of the target data.
  • The system further includes a third controller.
  • The second controller is further configured to determine, according to the address information of the target data, the third controller corresponding to the target data, and to send a prefetch command to the third controller, where the prefetch command includes the address information of the target data;
  • the third controller is configured to read the target data into a cache of the third controller according to the address information of the target data.
  • In the embodiments of the present invention, the first controller determines the second controller according to the address information carried by the read data request and sends the address information to the second controller, and the second controller obtains the target data to be read according to the address information and performs the operation of reading the target data into the cache. Because the controller that obtains the target data to be read is determined by the address information carried by the read data request, the read data requests that fall on a given logical address range can be analyzed collectively by the one controller responsible for that range. In other words, the information obtained about the read data requests is comprehensive, so the target data to be read can be accurately predicted and read into the cache.
  • FIG. 1 is a schematic diagram of an application network architecture of a method for caching data according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a method for caching data according to an embodiment of the present invention;
  • FIG. 3 is a flowchart of another method for caching data according to an embodiment of the present invention;
  • FIG. 5 is a data structure diagram of a data block table in a data prefetching unit according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of the correspondence between read data requests and data blocks according to an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a system architecture of a method for caching data according to an embodiment of the present invention.
  • The storage system includes multiple controllers (four controllers are shown as an example) and a storage device.
  • the storage device is illustrated by using a hard disk as an example.
  • FIG. 1 is only an exemplary illustration and does not limit the specific networking manner, such as a cascaded tree network or a ring network, as long as the controllers and the storage device can communicate with each other.
  • the controller can include any computing device known in the art, such as a server, desktop computer, and the like.
  • In one case, each controller can process read data requests from the host and can also access any data stored in the storage device, for example, read data from the storage device and store it in its own cache.
  • In another case, each controller can process read data requests from the host, but each controller corresponds to a segment of the storage space in the storage device (for example, some of the disks, or a portion of one disk's storage space). That is to say, each segment of storage space in the storage device has its own specific controller and cannot be managed or accessed by the other controllers.
  • the storage device herein refers to a disk, a hard disk or other storage medium, and does not include a controller.
  • the cache is a buffer between the CPU and the hard disk. It is smaller than the hard disk but faster than the hard disk. Some data is stored in the Cache. When the CPU processes the read data request, if the available data is found in the cache, it is a cache hit.
  • The controllers can communicate with each other and can access the data stored in each other's caches.
  • For example, controller 0 receives a read data request from the host (not shown) to access data A. Cache 0 of controller 0 does not store data A, but the cache of controller 1 does, so controller 0 can send a data read command to controller 1, causing controller 1 to read data A from its cache and send it to controller 0; controller 0 can then return data A directly to the host.
  • The data channel between the caches is a high-speed transmission channel, so shared access to cache data between controllers is very fast: compared with a cache miss that requires reading the data from the storage device, obtaining the data from another controller's cache takes a short time.
  • each controller includes at least one prefetch management unit, and in the embodiment of the present invention, the number of prefetch management units included in each controller is substantially equal.
  • Each prefetch management unit is used to perform data read operations for a logical storage space of an address range.
  • The logical storage space managed by each prefetch management unit may be a logical unit (identified by a Logical Unit Number, LUN), a segment of a LUN, or a folder, but is not limited thereto.
  • the distribution of the prefetch management unit in the controller can be determined based on the address range of the logical storage space.
  • The storage device may be any storage device known in the art, such as a hard disk, a solid state disk (SSD), or a Direct Access Storage Device (DASD).
  • the storage space of the storage device can be divided into a number of logical chunks, each of which has a unique ID.
  • The data in the storage device is managed in units of chunks; for example, data can be read into the cache chunk by chunk.
  • In one case, each controller can perform read and write operations on the entire storage device. For example, each controller can read data from the storage device into its own cache, so that a cache hit can be achieved when subsequent read data requests are executed.
  • In another case, each controller corresponds to a part of the storage space of the storage device and can only read and write data stored in that part, for example, read the data stored there into its own cache; the data stored in the other storage space of the storage device is managed by the other controllers.
  • The method for caching data is as follows. As shown in FIG. 2, a flowchart of a method for caching data according to an embodiment of the present invention, the method is applied to a storage system, the storage system includes multiple controllers, and each controller includes a cache. The method includes: Step S201: The first controller receives a read data request sent by the host, where the read data request carries address information; determines a second controller according to the address information carried by the read data request; and sends the address information to the second controller.
  • the address information includes a start address and a length of the data to be read
  • The second controller corresponding to the read data request may be obtained from the start address of the data to be read according to the set hash algorithm.
  • Step S202 The second controller obtains address information of the target data to be read according to the address information, to read the target data into the cache according to the address information of the target data.
  • Specifically, the target data may be read into the cache of the second controller according to the address information of the target data; or a prefetch command may be sent to a third controller according to the address information of the target data, and the third controller reads the target data into its own cache according to the address information of the target data.
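  • The two paths just described can be sketched in Python. This is an illustrative sketch only, not part of the patent: `owner_of`, `send_prefetch` and `read_into_cache` are hypothetical callbacks standing in for the ownership lookup, the inter-controller prefetch command, and the local disk-to-cache read.

```python
def dispatch_prefetch(target_start, target_length, owner_of, self_id,
                      send_prefetch, read_into_cache):
    """If this (second) controller owns the storage space of the target
    data, read the data into its own cache; otherwise forward a prefetch
    command to the owning (third) controller."""
    owner = owner_of(target_start)
    if owner == self_id:
        # The target data lives in storage space managed by this controller.
        read_into_cache(target_start, target_length)
    else:
        # Another controller manages that storage space: delegate the read.
        send_prefetch(owner, target_start, target_length)
    return owner
```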
  • the embodiment of the present invention may also be applied to a distributed system, where the distributed system includes multiple nodes, each node is a server, and each server performs functions similar to each controller in the storage system. , will not repeat them here.
  • The first controller may receive one read data request from the host, or more than one.
  • When there are multiple read data requests, whether they are consecutive may be determined according to the address information they carry; if they are consecutive, the read data requests may be merged to obtain one continuous piece of address information. Determining the second controller according to the address information carried by the read data request then specifically means determining the second controller according to the merged continuous address information. If the read data requests are not consecutive, the second controller is determined according to the address information carried by each read data request separately.
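  • The merging step above can be sketched in Python. This is an illustrative sketch, not part of the patent; requests are assumed to be (start_address, length) tuples as suggested by the address information described earlier.

```python
def merge_requests(requests):
    """Merge read requests whose address ranges are consecutive.

    requests: iterable of (start_address, length) tuples.
    Returns a list of merged (start_address, length) ranges.
    """
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # End address of the previous range meets this start: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            # Not consecutive: keep as a separate range.
            merged.append((start, length))
    return merged
```

Non-consecutive ranges stay separate, so each can still be routed to its own controller.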
  • In the embodiment of the present invention, the first controller determines the second controller according to the address information carried by the read data request and sends the address information to the second controller, and the second controller obtains the target data to be read according to the address information and performs the operation of reading the target data into the cache. Because the controller that obtains the target data to be read is determined by the address information carried by the read data request, the read data requests that fall on a given logical address range can be analyzed collectively by the one controller responsible for that range. In other words, the information obtained about the read data requests is comprehensive, so the target data to be read can be accurately predicted and read into the cache.
  • As shown in FIG. 3, which is a flowchart of another method for caching data provided by an embodiment of the present invention.
  • The embodiment of the present invention takes three controllers as an example, but is not limited to three controllers. Referring to FIG. 3, the following steps may be performed by a processor in a controller; the method includes:
  • Step S301: The first controller receives a first read data request sent by the host, where the first read data request carries address information of the first data to be read, and the address information includes a start address (LBA) and a length of the first data to be read.
  • The logical address in the embodiment of the present invention, also called the start address, is referred to as the LBA.
  • Step S302 The first controller determines a controller corresponding to the first read data request.
  • Each controller includes at least one prefetch management unit, the number of prefetch management units in each controller is approximately equal, and each prefetch management unit is used to manage a storage space of an address range, such as a segment of a LUN.
  • The controller corresponding to the first read data request may be obtained from the LBA of the first data to be read according to the consistent hash algorithm or another hash algorithm, and the prefetch management unit in that controller is then determined.
  • Once the controller corresponding to the first read data request is determined, the prefetch management unit corresponding to the request is determined; when the controller includes multiple prefetch management units, each unit manages the storage space of a range of addresses, so a prefetch management unit within a controller can also be uniquely determined based on the LBA.
  • the controller corresponding to the first data to be read is a second controller.
  • A hash algorithm, also known as a hashing algorithm, maps a key value to a unique access address, with the goal of speeding up lookups; here, the key is the start address.
  • The usual hash algorithm can be implemented with a hash table: the access address corresponding to a key value is obtained by looking it up in the hash table.
  • The general hash algorithm is a linear calculation method.
  • A hash algorithm can therefore uniquely determine a controller from an input start address.
  • The consistent hash algorithm can use a ring data structure to map key values to access locations.
  • The ring may include a plurality of virtual nodes, such as virtual node 0, virtual node 1, virtual node 2, virtual node 3, ..., virtual node 10, and each virtual node corresponds to an access address.
  • Adjacent virtual nodes are connected in turn, and virtual node 10 is connected back to virtual node 0 to form a ring.
  • Take three controllers in the storage system as an example.
  • The three controllers are named controller A, controller B, and controller C.
  • Each controller corresponds to several virtual nodes in the ring data structure; for example, controller A corresponds to some of the virtual nodes.
  • When a new controller, for example controller D, is added to the storage system, there is no need to rearrange the data structure or modify the algorithm; only the virtual nodes corresponding to each controller need to be adjusted, so that the newly added controller D can also correspond to start addresses.
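  • The ring described above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: the class name, the number of virtual nodes per controller, and the use of MD5 for node placement are all assumptions made for the example.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Each controller owns several virtual nodes on a ring; a start
    address maps to the first virtual node at or after its hash position."""

    def __init__(self, controllers, vnodes_per_controller=4):
        self.ring = []  # sorted list of (position, controller)
        for ctrl in controllers:
            for i in range(vnodes_per_controller):
                pos = self._hash(f"{ctrl}#vnode{i}")
                self.ring.append((pos, ctrl))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        # Any uniform hash works; MD5 is used here only for illustration.
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def lookup(self, start_address):
        positions = [p for p, _ in self.ring]
        idx = bisect.bisect_right(positions, self._hash(start_address))
        # Wrap around: the node after the last one is the first node.
        return self.ring[idx % len(self.ring)][1]
```

Adding a controller D only inserts its virtual nodes into the sorted ring; the mapping of most start addresses to controllers is unchanged, which is the property the text relies on.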
  • Alternatively, each controller of the storage system may save a preset configuration table, where the configuration table includes a correspondence between start addresses and controllers; the controller that receives the read data request can query the configuration table with the start address carried in the request to obtain the controller corresponding to that start address.
  • the configuration table may be saved in only one controller.
  • In that case, the controller that receives the read data request may send a query request to the controller that saves the configuration table, where the query request includes the start address carried in the read data request, so that the controller saving the configuration table can query the table according to the start address, obtain the controller corresponding to the start address, and send the query result back to the controller that received the read data request.
  • a copy of the configuration table may be saved in another controller of the storage system.
  • Alternatively, the controller corresponding to the start address may be obtained by a modulo method: the start address is divided by the number of controllers, and the corresponding controller is obtained from the remainder.
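  • The modulo method reduces to one line in Python; this is an illustrative sketch, and the function name is an assumption for the example.

```python
def controller_for(start_address, num_controllers):
    """Modulo placement: the remainder of the start address divided by
    the controller count selects the controller index."""
    return start_address % num_controllers
```

For example, with three controllers, start address 7 maps to controller index 1 and start address 9 maps to controller index 0.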
  • Step S303: The second controller receives a second read data request sent by the host, where the second read data request carries address information of the second data to be read, and the address information includes a start address (LBA) and a length of the second data to be read.
  • It should be noted that there is no fixed order between step S301 and step S303.
  • Step S304 The second controller determines a controller corresponding to the second read data request.
  • the corresponding controller may be obtained according to the LBA of the second data to be read, which is similar to step S302, and details are not described herein again.
  • If the controller corresponding to the first read data request differs from the controller corresponding to the second read data request, the two controllers may each perform the data prefetching operations independently, without affecting each other.
  • The embodiment of the present invention mainly discusses the situation where the controller corresponding to the first read data request and the controller corresponding to the second read data request are the same.
  • Step S305: The first controller sends the drop point information of the first read data request to the second controller, where the drop point information includes the address information of the first data to be read and may further include the ID of the host that sent the first read data request, the ID of the first controller, and so on; the drop point information serves as the analysis basis for data prefetching.
  • For the second read data request received by the second controller itself, the processor of the second controller may push its drop point information to the prefetch management unit of the second controller, so that the prefetch management unit obtains the drop point information of the second read data request; this is not limited herein.
  • Step S306 The second controller predicts the target data of the next read data request according to the drop point information of the first read data request and the drop point information of the second read data request.
  • the next read data request refers to the read data request that the storage system is about to receive (not yet received).
  • Here, the next read data request is referred to as the third read data request. It should be noted that the next read data request is not limited to the read data request immediately following the first and second read data requests; any read data request received after the first and second read data requests can be called the next read data request.
  • the second controller may utilize the prefetch management unit to predict the target data of the third read data request.
  • The prefetch management unit is a functional unit in the controller for performing cache data read operations. As shown in FIG. 4, the prefetch management unit includes a data block table, an interface, and a prefetch policy module. The data block table contains a plurality of data blocks, each data block corresponding to a logical chunk on the disk.
  • The data block is used to record the drop point information of read data requests as well as other information.
  • The data block table may be sorted by the LBAs included in the drop point information, from small to large or from large to small.
  • The data block table in the embodiment of the present invention is not limited to the form of a table; it may also be a red-black tree, a binary tree, or another data management structure that supports ordered search.
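  • One way to realize such an ordered structure is sketched below in Python, using a sorted list of chunk IDs with binary search. This is an illustrative sketch, not the patent's implementation; the class and method names are assumptions for the example.

```python
import bisect

class DataBlockTable:
    """Data blocks keyed by chunk ID, kept in ascending order so that
    neighbouring (consecutive) blocks can be found by ordered search."""

    def __init__(self):
        self.chunk_ids = []   # sorted chunk IDs
        self.blocks = {}      # chunk ID -> recorded drop point information

    def record(self, chunk_id, drop_point_info):
        if chunk_id not in self.blocks:
            bisect.insort(self.chunk_ids, chunk_id)
        self.blocks[chunk_id] = drop_point_info

    def predecessor(self, chunk_id):
        """Return the data block immediately before chunk_id, if the two
        blocks are contiguous; otherwise None."""
        idx = bisect.bisect_left(self.chunk_ids, chunk_id)
        if idx > 0 and self.chunk_ids[idx - 1] == chunk_id - 1:
            return self.blocks[chunk_id - 1]
        return None
```

A red-black tree or binary tree would give the same ordered-search interface with cheaper insertion.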
  • The data recorded in each data block is the drop point information of read data requests and other information, but not the data to be read itself.
  • The interface is used to receive the drop point information of read data requests sent by other controllers, or to send prefetch commands to other controllers; the prefetch policy module is configured to perform read operations according to the set prefetch policy.
  • The information recorded in each data block may include: a chunk ID, drop point information, and the ID of the chunk that was last read, where the drop point information may specifically include a host ID, a controller ID, an LBA, and a length.
  • The chunk ID specifically refers to the ID of the chunk on the disk corresponding to the data block; multiplying the chunk ID by the size of each chunk gives the start address of the chunk. Since data prefetching is performed in units of chunks, the start address of the chunk containing the target data must be known when prefetching the target data. It should be noted that the LBA described above refers to the start address carried by the request, which is different from the start address of the chunk. In some cases the two are the same, but in most cases the start address of the read data request differs from the start address of the chunk.
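  • The chunk arithmetic above can be written out in Python. This is an illustrative sketch; the patent does not fix a chunk size, so the 1 MiB default here is an assumption for the example.

```python
CHUNK_SIZE = 1 << 20  # assumed 1 MiB chunks; the patent does not fix a size

def chunk_start_address(chunk_id, chunk_size=CHUNK_SIZE):
    """Start address of a chunk: chunk ID multiplied by the chunk size."""
    return chunk_id * chunk_size

def chunk_id_for(lba, chunk_size=CHUNK_SIZE):
    """Chunk containing a request's start address (LBA); the LBA is
    usually not aligned to the chunk boundary."""
    return lba // chunk_size
```

For instance, with 1024-byte chunks, an LBA of 3500 falls in chunk 3, whose start address 3072 differs from the request's own start address.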
  • The ID of the most recently read chunk identifies the prefetch range of the last read operation on the data block. Since read operations are performed in units of chunks, whether the current prefetch overlaps with the range of the last prefetch can be determined from the ID of the last-read chunk; if so, the chunks of the overlapping portion may be excluded when the read operation is performed.
  • step S306 may specifically include the following steps:
  • S3061: Determine, according to the drop point information of the first read data request and the drop point information of the second read data request, that the first read data request and the second read data request have a sequential relationship.
  • The specific determination method is: obtain the end address of the first read data request from its start address and length; if the end address of the first read data request and the start address of the second read data request are consecutive, the first and second read data requests have a sequential relationship. Alternatively, obtain the end address of the second read data request from its start address and length; if the end address of the second read data request and the start address of the first read data request are consecutive, the first and second read data requests likewise have a sequential relationship.
  • In addition, the first and second read data requests need not be absolutely consecutive; a certain address gap is allowed between them.
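  • The sequential check described above, including the tolerated gap, can be sketched in Python. This is an illustrative sketch; requests are assumed to be (start_address, length) tuples and `max_gap` is a hypothetical tuning parameter.

```python
def is_sequential(req1, req2, max_gap=0):
    """Decide whether two read requests form a sequential stream.

    The end address of one request (start + length) must meet, or nearly
    meet within max_gap, the start address of the other, in either order.
    """
    s1, l1 = req1
    s2, l2 = req2
    return (0 <= s2 - (s1 + l1) <= max_gap
            or 0 <= s1 - (s2 + l2) <= max_gap)
```

With `max_gap=0` the check requires exact continuity; a positive `max_gap` tolerates the small address gaps the text mentions.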
  • S3062 Determine a data block corresponding to the first read data request and the second read data request.
  • Specifically, the data block corresponding to the first and second read data requests may be determined according to their drop point information. Since the first and second read data requests have a sequential relationship, their corresponding data blocks are also contiguous.
  • the data block corresponding to the first read data request and the second read data request may be as shown in FIG.
  • S3063: Determine the data blocks contiguous with the data block corresponding to the first and second read data requests, to obtain the maximum contiguous segment of data blocks.
  • Specifically, the data block table may be traversed forward from the data block corresponding to the first and second read data requests. If a data block in the table is contiguous with that data block, it is determined from the drop point information recorded on the contiguous data block whether the previous read data request on that block is sequential with the current first and second read data requests; if so, the traversal continues through the data block table until the maximum contiguous segment of data blocks corresponding to the first and second read data requests is obtained.
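  • A simplified version of step S3063 can be sketched in Python, treating the recorded data blocks as a set of chunk IDs. This is an illustrative sketch; a real implementation would also check the drop point information on each block for sequentiality, which is omitted here.

```python
def max_contiguous_segment(recorded_chunks, current_chunk):
    """Walk backwards from the chunk of the current requests through
    consecutively recorded data blocks, returning the largest contiguous
    run of chunk IDs ending at current_chunk."""
    chunks = set(recorded_chunks)
    start = current_chunk
    while start - 1 in chunks:
        start -= 1
    return list(range(start, current_chunk + 1))
```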
  • The length of the target data is obtained by calculation according to the prefetch policy.
  • If part of the target data falls within the range of the last prefetch, the overlapping data may be removed.
  • In one case, the target data may be read from the disk according to the start address and length of the target data and stored in the cache of the second controller, so that a cache hit is available when the third read data request is subsequently executed. This situation mainly applies to scenarios where each controller can manage or access every disk in the storage device.
  • In another case, each controller can only manage or access part of the storage space in the storage device (some of the disks, or part of one disk's storage space); that is, each segment of storage space in the storage device has its own specific controller and cannot be managed or accessed by the other controllers.
  • the embodiment may further include:
  • Step S307 The second controller sends a prefetch command to the third controller, where the prefetch command includes a start address and a length of the target data, so that the third controller reads the target data into its cache.
  • Specifically, the second controller may determine, according to the start address of the target data, that the storage space where the target data is located is managed by the third controller, and may therefore send the data prefetch command to the third controller.
  • the second controller may obtain the controller corresponding to the storage space where the target data is located from the start address of the target data, by using the system configuration or an existing calculation method.
  • Step S308: The third controller prefetches the target data into its cache.
  • specifically, the third controller may read the target data from the disk according to the start address and length of the target data and store it in the cache of the third controller.
  • Step S309: The first controller receives a third read data request sent by the host, where the data requested by the third read data request is the target data, or a part of the target data.
  • after receiving the third read data request, the first controller finds that the target data is not stored in its own cache but is stored in the cache of the third controller, and then executes step S310.
  • the controller that receives the third read data request may be any controller in the storage system. If the third controller receives the third read data request, it may send the target data from its cache directly to the host without performing steps S310-S311; if another controller in the storage system receives the third read data request, its operations are similar to those of the first controller upon receiving the third read data request.
  • Step S310: The first controller sends a data read command to the third controller, where the data read command includes the start address and the length of the target data and is used to request the third controller to send the target data.
  • Step S311: The third controller sends the target data to the first controller.
  • a high-speed data transmission channel is used between the controllers.
  • the access speed of cached data between the controllers is generally less than 1 ms.
  • by contrast, reading the target data from disk takes 6-10 ms. Therefore, even a cross-controller cache hit is much faster than reading the data from disk.
  • Step S312: After receiving the target data sent by the third controller, the first controller sends the target data to the host; that is, a cache hit is achieved.
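The lookup order of steps S309-S312 (local cache, then a peer controller's cache, then disk) can be sketched as below. The `Controller` class and its fields are hypothetical; in the embodiment the cross-controller fetch travels over the high-speed channel rather than a method call.

```python
class Controller:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # start_address -> data

    def handle_read(self, addr, peers, disk):
        # local cache hit: return the data to the host immediately
        if addr in self.cache:
            return self.cache[addr]
        # cross-controller hit: the peer that prefetched the data sends it
        # back (steps S310-S311), and it is forwarded to the host (S312)
        for peer in peers:
            if addr in peer.cache:
                return peer.cache[addr]
        # miss everywhere: fall back to the much slower disk read
        data = disk[addr]
        self.cache[addr] = data
        return data

# usage: the third controller prefetched the target data at address 4096
third = Controller("C3")
third.cache[4096] = b"target"
first = Controller("C1")
data = first.handle_read(4096, peers=[third], disk={})
```

Whether the first controller should also keep a local copy after a cross-controller hit is a policy choice the patent leaves open; the sketch caches only on a disk read.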
  • the first controller may further determine, according to the information carried in the third read data request, the controller corresponding to the third read data request, and send the falling point information of the third read data request to that controller, so that the prefetch management unit of that controller continues to predict the data to be read by the next read data request and caches that data.
  • this process may repeat steps S301 to S308; details are not described herein again.
  • in this embodiment, the first controller determines the second controller according to the address information carried by the read data request and sends the address information to the second controller;
  • the second controller obtains the target data to be read according to the address information, so as to read the target data into a cache. Since the controller that obtains the target data to be read is determined by the address information carried by the read data request, the read data requests issued for a given logical address range can be analyzed centrally by one controller. For that logical address range, the obtained read data request information is comprehensive, so the target data to be read can be accurately predicted and read into the cache.
  • FIG. 8 is a structural diagram of a storage system 80 according to an embodiment of the present invention; the storage system includes multiple controllers, where each controller includes a processor and a cache:
  • the first controller 801 is configured to receive a read data request sent by the host, where the read data request carries address information; determine the second controller 802 according to the address information carried by the read data request; and send the address information to the second controller 802.
  • specifically, the above operations are performed by the processor in the first controller 801.
  • the address information includes the start address and the length of the data to be read; the controller corresponding to the read data request is obtained from the start address of the data to be read according to a set hash algorithm.
  • each controller includes at least one prefetch management unit.
  • the number of prefetch management units included in each controller is substantially equal, and each prefetch management unit is used to manage the storage space of a range of addresses, such as a section of an LU.
  • the controller corresponding to the first read data request may be obtained from the LBA of the first data to be read according to a consistent hashing algorithm or another hash algorithm, and then the prefetch management unit in that controller is obtained.
  • when a controller includes a single prefetch management unit, determining the controller corresponding to the first read data request also determines the corresponding prefetch management unit; when a controller includes multiple prefetch management units, each prefetch management unit manages the storage space of a range of addresses, so the prefetch management unit within the controller can also be uniquely determined based on the LBA.
  • assume the controller corresponding to the first data to be read is the second controller.
  • a hash algorithm can uniquely determine a controller from an input start address; the hash algorithm may be a consistent hashing algorithm.
  • each controller of the storage system may save a preset configuration table, where the configuration table includes the correspondence between start addresses and controllers; the controller that receives a read data request may query the configuration table according to the start address carried in the read data request to obtain the controller corresponding to that start address.
  • alternatively, the configuration table may be saved in only one controller.
  • in that case, the controller that receives the read data request may send a query request to the controller that saves the configuration table, where the query request includes the start address carried in the read data request, so that the controller that saves the configuration table can query the configuration table according to the start address, obtain the controller corresponding to the start address, and send the query result to the controller that received the read data request.
  • a copy of the configuration table may also be saved in the other controllers of the storage system.
  • alternatively, the controller corresponding to the start address may be obtained by a modulo operation: the start address is divided by the number of controllers, and the corresponding controller is obtained from the remainder.
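The configuration-table lookup and the modulo mapping can be sketched as follows. This is a minimal illustration under assumptions: the table is modeled as a dict from an address-range lower bound to a controller ID, and the controller count is arbitrary.

```python
NUM_CONTROLLERS = 4  # assumed, as in the four-controller example system

def route_by_modulo(start_address, num_controllers=NUM_CONTROLLERS):
    """Divide the start address by the number of controllers and use the
    remainder as the controller index."""
    return start_address % num_controllers

def route_by_table(start_address, config_table):
    """config_table maps an address-range lower bound to a controller ID;
    pick the entry whose range covers the start address (i.e. the largest
    lower bound not exceeding it)."""
    owner = None
    for low in sorted(config_table):
        if start_address >= low:
            owner = config_table[low]
    return owner

# usage: addresses below 0x1000 belong to C0, addresses from 0x1000 to C1
table = {0x0000: "C0", 0x1000: "C1"}
controller = route_by_table(0x1003, table)  # -> "C1"
```

A consistent hash (the option the claims name explicitly) differs from the plain modulo in that adding or removing a controller remaps only a small fraction of the address space instead of nearly all of it.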
  • after the first controller 801 determines the second controller 802 according to the address information carried by the read data request, it sends the address information to the second controller 802.
  • the second controller 802 is configured to obtain address information of the target data to be read according to the address information, so as to read the target data into a cache according to the address information of the target data. Specifically, the above operation is performed by the processor in the second controller 802.
  • the second controller 802 may read the target data into the cache of the second controller 802 according to the address information of the target data; or
  • it may send a prefetch command to the third controller 803 according to the address information of the target data, so that the third controller 803 reads the target data into the cache of the third controller 803 according to the address information of the target data.
  • the second controller 802 can use its prefetch management unit to obtain the target data to be read.
  • the specific obtaining method is similar to the method embodiment described above, and details are not described herein again.
  • the first controller 801 receives a next read data request sent by the host, where the data to be read of the next read data request is the target data, or the data to be read is a part of the target data.
  • if the first controller 801 finds that the target data is not stored in its cache but is stored in the cache of the third controller 803, it may send a data read command to the third controller 803, where the data read command includes the start address and the length of the target data and is used to request the third controller 803 to send the target data.
  • the third controller 803 transmits the target data to the first controller 801.
  • the next read data request is not limited to the read data request received by the first controller immediately afterwards; any read data request received after the current read data request may be referred to as the next read data request.
  • the access speed of cached data between the controllers is generally less than 1 ms, whereas without a cache hit the target data is read from disk in 6-10 ms. Therefore, even a cross-controller hit is much faster than reading the data from disk.
  • in this storage system, the first controller determines the second controller according to the address information carried by the read data request and sends the address information to the second controller;
  • the second controller obtains the target data to be read according to the address information, so as to read the target data into a cache. Since the controller that obtains the target data to be read is determined by the address information carried by the read data request, the read data requests issued for a given logical address range can be analyzed centrally by one controller. For that logical address range, the obtained read data request information is comprehensive, so the target data to be read can be accurately predicted and read into the cache.
  • aspects of the present invention, or possible implementations of various aspects, may be embodied as a system, a method, or a computer program product.
  • aspects of the present invention, or possible implementations of various aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware aspects, which are collectively referred to herein as a "circuit", "module", or "system".
  • aspects of the present invention, or possible implementations of various aspects, may take the form of a computer program product, which is computer readable program code stored on a computer readable medium.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or apparatus, or any suitable combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable compact disc read-only memory (CD-ROM).
  • the processor in the computer reads the computer readable program code stored in the computer readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, in the flowcharts, and produces an apparatus that implements the functions specified in each block, or combination of blocks, of the block diagrams.
  • the computer readable program code can be executed entirely on the user's computer, partly on the user's computer, as a separate software package, partly on the user's computer and partly on the remote computer, or entirely on the remote computer or server.
  • the functions noted in the various steps of the flowcharts, or in the blocks of the block diagrams, may not occur in the order noted.
  • for example, two steps, or two blocks, shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.


Abstract

Provided in an embodiment of the present invention are a data caching method and storage system, the storage system comprising a plurality of controllers, and each controller including a cache. The method comprises: a first controller receives a data read request transmitted by a host, the data read request carrying address information; determines a second controller according to the address information carried by the data read request; and transmits the address information to the second controller; according to the address information, the second controller acquires the address information of target data to be read, and reads the target data into a cache according to the address information of the target data, thus enabling accurate prediction of the target data to be read.

Description

Data caching method and storage system

Technical Field
The present invention relates to storage technologies, and in particular, to a data caching method and a storage system.

Background
A cache is a buffer memory between the CPU and the main storage (for example, a hard disk) in a storage system; it is smaller than the hard disk but faster. Normally, when the CPU processes a read data request, if the requested data is found in the cache (called a cache hit), the result of the read data request can be returned immediately; only on a miss is the read data request sent to the hard disk to read or write data. Since cache access is much faster than hard disk reads and writes, the higher the cache hit ratio, the higher the performance of the storage system. Therefore, the existing practice is to read data that "is likely to be accessed soon" into the cache in advance, so that subsequent read data requests can hit immediately. This practice is called prefetching.
For a storage system containing multiple controllers, if the controllers operate in active/passive (A/P) mode, only one controller is in the working state, so all of the storage system's cached data is stored centrally in that controller's cache; cache data can therefore be prefetched by performing sequential stream identification on the read data requests. However, if the controllers operate in active/active (A/A) mode, every controller is in the working state and read data requests may be distributed across the controllers. When each controller performs sequential stream identification on its own read data requests for prefetching, the information it relies on is not comprehensive, so the prefetched data is not accurate enough.

Summary
Embodiments of the present invention provide a data caching method and a storage system, to accurately predict the target data to be read when the storage system includes multiple controllers.
A first aspect of the embodiments of the present invention provides a data caching method. The method is applied to a storage system, where the storage system includes multiple controllers and each controller includes a cache. The method includes:

the first controller receives a read data request sent by a host, where the read data request carries address information; determines a second controller according to the address information carried by the read data request; and sends the address information to the second controller;

the second controller obtains address information of target data to be read according to the address information, so as to read the target data into a cache according to the address information of the target data.

In a first implementation manner of the first aspect of the embodiments of the present invention, the address information carried by the read data request includes a start address carried by the read data request;

the determining a second controller according to the address information carried by the read data request includes: determining the second controller from the start address carried by the read data request according to a set hash algorithm.

With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect of the embodiments of the present invention, the set hash algorithm includes a consistent hashing algorithm.

In a third implementation manner of the first aspect of the embodiments of the present invention, the address information carried by the read data request includes a start address carried by the read data request;

the determining a second controller according to the address information carried by the read data request includes: querying a preset configuration table according to the start address to obtain the second controller corresponding to the start address.

In a fourth implementation manner of the first aspect of the embodiments of the present invention, the reading the target data into a cache according to the address information of the target data includes:

the second controller reads the target data into the cache of the second controller according to the address information of the target data.

In a fifth implementation manner of the first aspect of the embodiments of the present invention, the reading the target data into a cache according to the address information of the target data includes:

the second controller determines, according to the address information of the target data, a third controller corresponding to the target data, and sends a prefetch command to the third controller, where the prefetch command includes the address information of the target data;

the third controller reads the target data into the cache of the third controller according to the address information of the target data.
A second aspect of the embodiments of the present invention provides a storage system, including:

a first controller, configured to receive a read data request sent by a host, where the read data request carries address information; determine a second controller according to the address information carried by the read data request; and send the address information to the second controller; and

the second controller, configured to obtain address information of target data to be read according to the address information, so as to read the target data into a cache according to the address information of the target data.

In a first implementation manner of the second aspect of the embodiments of the present invention, the address information carried by the read data request includes a start address carried by the read data request;

the first controller is specifically configured to determine the second controller from the start address carried by the read data request according to a set hash algorithm.

With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect of the embodiments of the present invention, the set hash algorithm includes a consistent hashing algorithm.

In a third implementation manner of the second aspect of the embodiments of the present invention, the address information carried by the read data request includes a start address carried by the read data request;

the first controller is specifically configured to query a preset configuration table according to the start address to obtain the second controller corresponding to the start address.

In a fourth implementation manner of the second aspect of the embodiments of the present invention, the second controller is further configured to read the target data into the cache of the second controller according to the address information of the target data.

In a fifth implementation manner of the second aspect of the embodiments of the present invention, the system further includes a third controller;

the second controller is further configured to determine, according to the address information of the target data, the third controller corresponding to the target data, and send a prefetch command to the third controller, where the prefetch command includes the address information of the target data;

the third controller is configured to read the target data into the cache of the third controller according to the address information of the target data.
In the embodiments of the present invention, after the first controller receives a read data request sent by the host, it determines the second controller according to the address information carried by the read data request and sends the address information to the second controller; the second controller obtains the target data to be read according to the address information, so as to read the target data into a cache. Since the controller that obtains the target data to be read is determined by the address information carried by the read data request, the read data requests issued for a given logical address range can be analyzed centrally by one controller. For that logical address range, the obtained read data request information is comprehensive, so the target data to be read can be accurately predicted and read into the cache.

Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and persons of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic diagram of an application network architecture of a data caching method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a data caching method according to an embodiment of the present invention;

FIG. 3 is a flowchart of another data caching method according to an embodiment of the present invention; FIG. 5 is a data structure diagram of a data block table in a data prefetch unit according to an embodiment of the present invention; FIG. 6 is a specific flowchart of another data caching method according to an embodiment of the present invention; FIG. 7 is a schematic diagram of the correspondence between read data requests and data blocks according to an embodiment of the present invention; and FIG. 8 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative efforts fall within the protection scope of the present invention.
System Architecture of the Embodiments of the Present Invention
The data caching method provided by the embodiments of the present invention may be implemented on a storage system. FIG. 1 is a schematic diagram of the system architecture of a data caching method according to an embodiment of the present invention. As shown in FIG. 1, the storage system includes multiple controllers (four controllers are shown as an example) and a storage device. In this embodiment, a hard disk is used as an example of the storage device.

FIG. 1 is only an example and does not limit the specific networking mode; for example, a cascaded tree network or a ring network is acceptable, as long as the controllers and the storage device can communicate with each other.

A controller may include any computing device known in the current technology, such as a server or a desktop computer. In one application scenario of the embodiments of the present invention, each controller can process read data requests from the host and can access the data stored in the storage device, for example, read data from the storage device into its cache. Alternatively, in another application scenario, each controller can process read data requests from the host, but each controller corresponds to a segment of the storage space in the storage device (for example, some of the disks, or part of the storage space of one disk); that is, each segment of storage space in the storage device has its corresponding specific controller and cannot be managed or accessed by other controllers. It should be noted that the storage device here refers to disks, hard disks, or other storage media, and does not include the controllers.

Each controller contains a cache, which is a buffer memory between the CPU and the hard disk; it is smaller than the hard disk but faster. The cache stores some data, and when the CPU processes a read data request, finding the requested data in the cache is a cache hit. The controllers can communicate with one another and can access data stored in the caches of other controllers. For example, controller 0 receives a read data request from the host (not shown) to access data A; data A is not stored in the cache of controller 0, but is stored in the cache of controller 1. Controller 0 can therefore send a data read command to controller 1, so that controller 1 reads data A from its cache and sends it to controller 0, and controller 0 can then return data A directly to the host. It should be noted that because data communication between caches uses a high-speed data transmission channel, cross-controller cache access is very fast; compared with a cache miss, which requires reading data from the storage device, obtaining data from another controller's cache takes very little time.

An operating system and other software programs are installed in each controller. For example, each controller contains at least one prefetch management unit; in the embodiments of the present invention, the numbers of prefetch management units contained in the controllers are substantially equal. Each prefetch management unit performs data read operations for the logical storage space of an address range. The logical storage space managed by each prefetch management unit may be a logical storage unit (Logic Unit Number, LU), a region of an LU, or a folder, which is not limited here. The distribution of the prefetch management units among the controllers may be determined according to the address ranges of the logical storage space. For example, for a logical address (for example, a Logic Block Address, LBA), a consistent hashing algorithm or another hash algorithm may be used. The storage device may include storage devices known in the current technology, such as an SSD or a direct access storage device (DASD). The storage space of the storage device may be divided into several logical blocks (chunks), each with a unique ID. In this embodiment, the data in the storage device is managed in units of chunks; for example, data may be read into the cache in units of chunks.
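Because data is managed in chunk units, a read request can be mapped to the chunk IDs it touches roughly as below. The chunk size and function names are illustrative assumptions; the embodiments do not fix a chunk size.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size: 4 MiB

def chunk_id(byte_address):
    """Map a logical byte address to the ID of the chunk containing it."""
    return byte_address // CHUNK_SIZE

def chunks_covering(start, length):
    """IDs of every chunk a read of `length` bytes at `start` touches;
    a chunk-granular prefetch would read each of these chunks into the
    cache whole."""
    return list(range(chunk_id(start), chunk_id(start + length - 1) + 1))

# usage: a 20-byte read straddling the first chunk boundary touches two chunks
ids = chunks_covering(CHUNK_SIZE - 10, 20)  # -> [0, 1]
```

Reading whole chunks trades some extra I/O for simpler bookkeeping: the cache and the data block table only need to track chunk IDs, not arbitrary byte ranges.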
According to the foregoing description, in one application scenario of the embodiments of the present invention, each controller can perform read and write operations on the storage device; for example, each controller can read data from the storage device into its own cache so that a cache hit can be achieved when a subsequent read data request is executed. In another application scenario, each controller corresponds to a part of the storage space of the storage device and can perform read and write operations only on the data stored in that part of the storage space, for example, read the data stored in that part of the storage space into its own cache, while the data stored in the other storage space of the storage device is managed by other controllers.

Data Caching Method
The following describes the data caching method provided by the embodiments of the present invention. FIG. 2 is a flowchart of the method. The method is applied in a storage system that includes multiple controllers, where each controller includes a cache. The method includes:

Step S201: A first controller receives a read data request sent by a host, where the read data request carries address information; determines a second controller according to the address information carried in the read data request; and sends the address information to the second controller.
Optionally, the address information includes the start address and the length of the data to be read, and the second controller corresponding to the read data request can be obtained from the start address of the data to be read according to a configured hash algorithm.
Step S202: The second controller obtains, according to the address information, address information of target data to be read, so that the target data can be read into a cache according to the address information of the target data.
Optionally, after the second controller obtains the address information of the target data to be read, it may read the target data into the cache of the second controller according to that address information; or it may send a prefetch command to a third controller according to the address information of the target data, and the third controller then reads the target data into the cache of the third controller according to that address information.
Optionally, the embodiments of the present invention can also be applied in a distributed system. The distributed system includes multiple nodes, each node being a server, and each server performs functions similar to those of a controller in the storage system; the details are not repeated here.
It should be noted that the first controller may receive one or more read data requests from the host. When there are multiple read data requests, whether they are contiguous can be determined according to the address information they carry; if they are contiguous, the requests can be merged to obtain one contiguous piece of address information. In that case, determining the second controller according to the address information carried in the read data request specifically means determining the second controller according to the merged contiguous address information. If the read data requests are not contiguous, the second controller is determined separately according to the address information carried in each read data request.
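The merging described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name is assumed, requests are represented as (start address, length) pairs, and strict contiguity (no gap) is assumed for the merge test.

```python
def merge_requests(requests):
    """Merge read requests whose (start, length) address ranges are
    contiguous into single ranges; non-contiguous requests stay separate."""
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # The end address of the previous range meets this start
            # address: extend the previous range instead of opening a new one.
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged
```

Each merged range then determines one controller, exactly as a single read data request would.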
In the embodiments of the present invention, after the first controller receives the read data request sent by the host, it determines the second controller according to the address information carried in the request and sends that address information to the second controller; the second controller then obtains the target data to be read according to the address information, so as to perform the operation of reading the target data into a cache. Because the controller that obtains the target data to be read is determined by the address information carried in the read data requests, all read data requests falling on a given range of logical addresses can be analyzed centrally by one controller. For that address range, the information gathered about read data requests is therefore complete, so the target data to be read can be predicted accurately and read into the cache.

FIG. 3 is a flowchart of another data caching method provided by an embodiment of the present invention. For ease of description, this embodiment uses three controllers as an example, but it is not in fact limited to three controllers. Referring to FIG. 3, the following steps may specifically be performed by a processor in a controller; the method includes:
Step S301: The first controller receives a first read data request sent by the host, where the first read data request carries address information of first data to be read, and the address information includes the start address (LBA) and the length of the first data to be read.
It should be noted that in the embodiments of the present invention the logical address is also referred to as the start address, or the LBA.
Step S302: The first controller determines the controller corresponding to the first read data request.

In the embodiments of the present invention, each controller contains at least one prefetch management unit, and the numbers of prefetch management units contained in the controllers are approximately equal; each prefetch management unit manages a storage space covering a range of addresses, for example a region of a LUN. Specifically, the controller corresponding to the first read data request can be obtained from the LBA of the first data to be read, according to a consistent hashing algorithm or another hash algorithm, and the prefetch management unit in that controller can then be obtained. When a controller contains one prefetch management unit, once the controller corresponding to the first read data request is determined, the corresponding prefetch management unit is determined as well. When a controller contains multiple prefetch management units, each unit manages a storage space covering a range of addresses, so one prefetch management unit in a controller can likewise be uniquely determined from the LBA. For example, suppose the controller corresponding to the first data to be read is the second controller.
A hash algorithm determines a unique access location from a given key value; it is a data structure technique whose purpose is to speed up lookups. Here, the key is the start address. Optionally, an ordinary hash algorithm can be implemented with a hash table: looking up the key in the table yields the corresponding access location. It should be noted that an ordinary hash algorithm is a linear computation. In the embodiments of the present invention, a hash algorithm can be used to uniquely determine one controller from the start address given as input.
Optionally, a consistent hashing algorithm can use a ring data structure to map a key to an access location. The ring can contain multiple virtual nodes, for example virtual node 0, virtual node 1, virtual node 2, virtual node 3, ..., virtual node 10, with each virtual node corresponding to a range of access addresses. Every two adjacent virtual nodes are connected in turn, and virtual node 10 is connected back to virtual node 0, so that together they form a ring. Taking a storage system with three controllers as an example, the three controllers are named controller A, controller B, and controller C, and each controller corresponds to several virtual nodes in the ring data structure: for example, controller A corresponds to virtual nodes 0, 1, 2, and 3; controller B to virtual nodes 4, 5, 6, and 7; and controller C to virtual nodes 8, 9, and 10. In this way, a start address can likewise be made to correspond to exactly one controller.
For a consistent hashing algorithm, moreover, when a new controller is added to the storage system — for example controller D — there is no need to rearrange the data structure or modify the algorithm; it suffices to adjust the virtual nodes assigned to each controller, so that the new controller D corresponds to a range of start addresses.
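The ring layout from the example above can be sketched in a few lines. This is only an illustration of the idea: the modulus standing in for a real hash function, and the specific virtual nodes handed to controller D, are assumptions, not part of the embodiment.

```python
# Virtual-node layout from the example: controller A owns virtual nodes
# 0-3, controller B owns 4-7, controller C owns 8-10.
RING = {0: 'A', 1: 'A', 2: 'A', 3: 'A',
        4: 'B', 5: 'B', 6: 'B', 7: 'B',
        8: 'C', 9: 'C', 10: 'C'}

def controller_for(start_address, num_nodes=11):
    """Map a start address (LBA) onto one of the ring's virtual nodes
    and return the owning controller.  A plain modulus stands in for
    the hash function here."""
    node = start_address % num_nodes
    return RING[node]

def add_controller_d():
    """Adding controller D only reassigns some virtual nodes; the ring
    structure and the mapping function stay unchanged."""
    for node in (3, 7, 10):
        RING[node] = 'D'
```

Note that `controller_for` is unchanged after `add_controller_d`: only the node-to-controller assignment moves, which is what makes expansion cheap.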
Optionally, each controller of the storage system may store a preset configuration table containing the correspondence between start addresses and the controllers. A controller that receives a read data request can then look up the start address carried in the request in the configuration table to obtain the controller corresponding to that start address.
Optionally, the configuration table may instead be stored in only one controller. When another controller receives a read data request, it can send a query request containing the start address carried in the read data request to the controller holding the configuration table; that controller looks up the start address in the table, obtains the corresponding controller, and sends the query result back to the controller that received the read data request.
Optionally, to prevent the configuration table from being lost if the controller holding it fails, a copy of the configuration table may be kept in another controller of the storage system.
Optionally, the controller corresponding to the start address can also be obtained by taking a modulus: specifically, divide the start address by the number of controllers, and the corresponding controller is obtained from the computed remainder.
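The modulus alternative is a one-liner; the function name is assumed for illustration, and the controllers are identified by index 0 to N−1:

```python
def controller_index(start_address, num_controllers):
    """Pick a controller by taking the start address modulo the number
    of controllers; every start address maps to exactly one controller."""
    return start_address % num_controllers
```

Unlike the consistent-hash ring, this mapping changes for most addresses whenever the number of controllers changes, which is why the ring is preferred when controllers may be added.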
Step S303: The second controller receives a second read data request sent by the host, where the second read data request carries address information of second data to be read, and the address information includes the start address (LBA) and the length of the second data to be read.
It should be noted that there is no fixed order between step S301 and step S303.
Step S304: The second controller determines the controller corresponding to the second read data request.
Specifically, the corresponding controller can be obtained from the LBA of the second data to be read, in a manner similar to step S302; the details are not repeated here.
When the controller corresponding to the first read data request differs from the controller corresponding to the second read data request, the two controllers can each perform cache-data prefetch operations independently, without affecting each other. The discussion here focuses on the case in which the controller corresponding to the first read data request is the same as the controller corresponding to the second read data request.
Step S305: The first controller sends landing information of the first read data request to the second controller. The landing information includes the address information of the first data to be read; in addition, it may also include the ID of the host that sent the first read data request, the ID of the first controller, and so on. The landing information serves as the analytical basis for data prefetching.
Because the controller corresponding to the second read data request is the second controller itself, the processor of the second controller can push the address at which the landing information of the second read data request is stored in the cache to the prefetch management unit of the second controller, or otherwise enable the prefetch management unit of the second controller to obtain the landing information of the second read data request; this is not limited here.
Step S306: The second controller predicts the target data of the next read data request according to the landing information of the first read data request and the landing information of the second read data request.
The next read data request is a read data request that the storage system is about to receive (but has not yet received). For ease of description, the next read data request is called the third read data request. It should be noted that the next read data request is not limited to the read data request immediately following the first and second read data requests; any read data request received after the first and second read data requests can be called the next read data request.
The second controller can use the prefetch management unit to predict the target data of the third read data request. The prefetch management unit is a functional unit contained in the controller for performing cache data read operations. As shown in FIG. 4, the prefetch management unit includes a data block table, an interface, and a prefetch policy module.

The data block table contains multiple data blocks, each corresponding to one logical block (chunk) on disk and all of the same size. A data block records the landing information of read data requests, together with other information. The data block table may be sorted by the LBA contained in the landing information, either ascending or descending. Moreover, the data block table in the embodiments of the present invention is not limited to the form of a table; it may also be a red-black tree, a binary tree, or another data management structure that supports ordered lookup. It should be noted that each data block records the landing information of read data requests and other information, but not the data to be read itself. In addition, the interface is used to receive landing information of read data requests sent by other controllers, or to send prefetch commands to other controllers, and the prefetch policy module performs read operations according to the configured prefetch policy.
As shown in FIG. 5, the information recorded in each data block may include: the chunk ID, the landing information, and the ID of the most recently read chunk, where the landing information may specifically include the host ID, the controller ID, the LBA, and the length.
The chunk ID refers specifically to the ID of the chunk on disk corresponding to the data block; multiplying the chunk ID by the size of each chunk yields the start address of the chunk. Because data is prefetched in units of chunks, the start address of the chunk containing the target data must also be known when prefetching the target data. It should be noted that the LBA described above refers to the start address of a read data request, which is distinct from the start address of a chunk. In some cases the start address of a read data request coincides with the start address of a chunk, but in most cases the two differ.
The ID of the most recently read chunk identifies the prefetch range of the last read operation that occurred on the data block. Because read operations are performed in units of chunks, the ID of the most recently read chunk can be used to determine whether the current prefetch overlaps the range of the previous prefetch; if it does, the overlapping chunks can be excluded when the read operation is performed.
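The record layout of FIG. 5 can be sketched as plain data structures. The field types and the 4 MiB chunk size are illustrative assumptions; only the field names and the chunk-ID-times-chunk-size arithmetic come from the description above.

```python
from dataclasses import dataclass

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size, for illustration only

@dataclass
class LandingInfo:
    host_id: int
    controller_id: int
    lba: int        # start address of the read data request
    length: int

@dataclass
class DataBlock:
    chunk_id: int            # ID of the chunk on disk this block maps to
    landing: LandingInfo     # landing info of the last request on this block
    last_read_chunk_id: int  # prefetch range of the previous read operation

    def chunk_start_address(self):
        # Multiplying the chunk ID by the chunk size gives the chunk's
        # start address, needed because prefetching is chunk-granular.
        return self.chunk_id * CHUNK_SIZE
```

The distinction noted above is visible here: `landing.lba` is the request's start address, while `chunk_start_address()` is the chunk's, and the two usually differ.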
In addition, information such as the sequence number, timestamp, and bitmap of a read data request may also be recorded. As shown in FIG. 6, step S306 may specifically include the following steps:
S3061: Determine, according to the landing information of the first read data request and the landing information of the second read data request, that the first and second read data requests have a contiguous sequential relationship.
The specific method of determination is: obtain the end address of the first read data request from its start address and length; if the end address of the first read data request and the start address of the second read data request are contiguous, the first and second read data requests have a contiguous sequential relationship. Alternatively, obtain the end address of the second read data request from its start address and length; if the end address of the second read data request and the start address of the first read data request are contiguous, the first and second read data requests likewise have a contiguous sequential relationship.
It should be noted that in the embodiments of the present invention the first and second read data requests need not be strictly contiguous; a certain degree of address gap between them is allowed.
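The two-direction contiguity check, including the tolerated gap, can be sketched as follows; the function name and the `max_gap` parameter are illustrative (the embodiment does not fix how large the allowed gap is).

```python
def is_sequential(first, second, max_gap=0):
    """Check whether two read requests, each given as (start, length),
    form a contiguous sequence in either order.  `max_gap` is the
    tolerated address gap between the end of one request and the start
    of the next; 0 means strictly contiguous."""
    s1, l1 = first
    s2, l2 = second
    end1, end2 = s1 + l1, s2 + l2
    # Either the first request is followed by the second, or vice versa.
    return (0 <= s2 - end1 <= max_gap) or (0 <= s1 - end2 <= max_gap)
```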
S3062: Determine the data blocks corresponding to the first read data request and the second read data request.
Specifically, the data blocks corresponding to the first and second read data requests can be determined from the landing information of the two requests. Because the first and second read data requests have a contiguous sequential relationship, their corresponding data blocks are also contiguous. The data blocks corresponding to the first and second read data requests may be as shown in FIG. 7.
S3063: Determine the data blocks contiguous with the data blocks corresponding to the first and second read data requests, to obtain the maximal contiguous segment of data blocks.
Specifically, starting from the data blocks corresponding to the first and second read data requests, the data block table can be traversed forward. If a data block is contiguous with the data blocks corresponding to the first and second read data requests, determine from the landing information recorded on that contiguous block whether the read data request that last occurred on it is contiguous with the current first and second read data requests; if so, continue obtaining contiguous data blocks from the data block table, until the maximal contiguous segment of data blocks corresponding to the first and second read data requests is obtained.
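The traversal in S3063 can be sketched as a walk over a sorted list of chunk IDs. This is a simplification: chunk-ID adjacency stands in for the fuller check described above, which also inspects the landing information recorded on each neighboring block.

```python
def max_contiguous_segment(table, hit_index):
    """`table` is the data block table reduced to a list of chunk IDs,
    sorted ascending; `hit_index` is the index of the block the current
    requests fall on.  Walk toward preceding entries while each block's
    chunk ID is adjacent to the next one, and return the length of the
    maximal contiguous run ending at `hit_index`."""
    run = 1
    i = hit_index
    while i > 0 and table[i - 1] == table[i] - 1:
        run += 1
        i -= 1
    return run
```

The length of this run is what the prefetch policy consumes in S3064 when sizing the prefetch.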
S3064: Obtain the length and start address of the target data.
Specifically, the length of the target data can be obtained from the size of the maximal contiguous segment of data blocks found in step S3063, through computation by the prefetch policy.
Optionally, if the ID of the most recently read chunk shows that part of the target data was already prefetched by the previous read operation, that part can be removed.
In addition, predicting the target data of the next read data request requires determining not only the length of the target data but also its start address. In this embodiment, the start address of the target data is the end address of the first read data request or the end address of the second read data request.

After the second controller has predicted the target data of the next read data request, it can read the target data from disk according to the target data's start address and length and store it in the cache of the second controller, so that a cache hit can be achieved when the third read data request is later executed. This case applies mainly to scenarios in which every controller can manage or access every disk in the storage devices.
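Putting S3064 and the overlap removal together, a sketch of the range computation follows. The "segment length times chunk size" sizing is an invented stand-in for the prefetch policy's computation, and the 4 MiB chunk size is assumed; only the start-at-request-end rule and the skip-already-prefetched-chunks rule come from the description above.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size, for illustration only

def prefetch_range(request_end, segment_len, last_read_chunk_id):
    """Compute the prefetch target: start at the end address of the
    current request, size the prefetch from the maximal contiguous
    segment length, and skip chunks already covered by the previous
    prefetch (identified by `last_read_chunk_id`, or None if absent)."""
    length = segment_len * CHUNK_SIZE          # stand-in prefetch policy
    first_chunk = request_end // CHUNK_SIZE
    if last_read_chunk_id is not None and last_read_chunk_id >= first_chunk:
        # Part of the range was prefetched last time: drop those chunks.
        skipped = (last_read_chunk_id + 1 - first_chunk) * CHUNK_SIZE
        start = (last_read_chunk_id + 1) * CHUNK_SIZE
        length = max(0, length - skipped)
    else:
        start = request_end
    return start, length
```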
In some application scenarios, however, each controller can only manage or access part of the storage space of the storage devices (some of the disks, or part of the storage space of one disk); that is, each segment of storage space in the storage devices has a specific corresponding controller and cannot be managed or accessed by other controllers. In this case, assuming the storage space containing the target data is managed by the third controller, this embodiment may further include:
Step S307: The second controller sends a prefetch command to the third controller, the prefetch command including the start address and length of the target data, so that the third controller reads the target data into its cache.
Optionally, the second controller can determine from the start address of the target data that the storage space containing the target data is managed by the third controller, and can therefore send the data prefetch command to the third controller. Specifically, the second controller can obtain the controller corresponding to the storage space containing the target data from the target data's start address, through the system configuration or existing computation methods.
Step S308: The third controller prefetches the target data into its cache.
Specifically, the third controller can read the target data from disk according to the start address and length of the target data, and store it in the cache of the third controller.
Step S309: The first controller receives a third read data request sent by the host, where the data to be read by the third read data request is the target data, or is part of the target data.
After receiving the third read data request, the first controller finds that the target data is not stored in its own cache but is stored in the cache of the third controller, and performs step S310.

Optionally, the third read data request may be received by any controller in the storage system. If the third controller receives the third read data request, it can send the target data from its cache to the host directly, without performing steps S310-S311. If another controller in the storage system receives the third read data request, its operating steps are similar to the case in which the first controller receives the third read data request.
Step S310: The first controller sends a data read command to the third controller, the data read command including the start address and length of the target data, to request that the third controller send the target data.
Step S311: The third controller sends the target data to the first controller.
It should be noted that the data channels between controllers are high-speed data transmission channels; according to statistics, the access latency for cache data between controllers is generally less than 1 ms. By contrast, when there is no cache hit, reading the target data from disk takes 6-10 ms. Therefore, even a cross-controller hit is far faster than reading the data from disk.
Step S312: After receiving the target data sent by the third controller, the first controller sends the target data to the host, thereby achieving a cache hit.
In addition, in this embodiment, the first controller can also determine the controller corresponding to the third read data request according to the information carried in the third read data request, and send the landing information of the third read data request to that controller, so that the controller's prefetch management unit continues to predict the data to be read by the next read data request and performs the data caching operation. This process can repeat steps S301-S308 and is not described again here.
In the embodiments of the present invention, after the first controller receives the read data request sent by the host, it determines the second controller according to the address information carried in the request and sends that address information to the second controller; the second controller then obtains the target data to be read according to the address information, so as to perform the operation of reading the target data into a cache. Because the controller that obtains the target data to be read is determined by the address information carried in the read data requests, all read data requests falling on a given range of logical addresses can be analyzed centrally by one controller. For that address range, the information gathered about read data requests is therefore complete, so the target data to be read can be predicted accurately and read into the cache.
Storage system
The following describes the storage system provided by the embodiments of the present invention. FIG. 8 is a structural diagram of a storage system 80 according to an embodiment of the present invention, which includes multiple controllers, where each controller includes a processor and a cache, and:
The first controller 801 is configured to receive a read data request sent by the host, the read data request carrying address information; determine the second controller 802 according to the address information carried in the read data request; and send the address information to the second controller 802. Specifically, these operations are performed by the processor in the first controller 801.
可选的, 所述地址信息包括待读取数据的起始地址和长度, 可以根据待读取数据的起始地址, 按照设定的散列算法, 获得所述读数据请求对应的控制器是第二控制器 802。  Optionally, the address information includes the start address and the length of the data to be read, and it may be determined, from the start address of the data to be read and according to a set hash algorithm, that the controller corresponding to the read data request is the second controller 802.
在本发明实施例中, 每个控制器中都包含有至少一个预取管理单元, 在本发明实施例中, 每个控制器中所包含的预取管理单元的数量大致相等, 每个预取管理单元用于管理一段地址范围的存储空间, 例如 LU 的一段区域。 具体的, 可以根据第一待读取数据的 LBA, 按照一致性哈希算法或者其他散列算法, 获得第一读数据请求对应的控制器, 进而获得该控制器中的预取管理单元。 当控制器中包含一个预取管理单元时, 第一读数据请求对应的控制器确定了, 那么第一读数据请求对应的预取管理单元也就确定了; 当控制器中包含多个预取管理单元时, 每个预取管理单元管理一段地址范围的存储空间, 因此也可以根据 LBA唯一确定一个控制器中的一个预取管理单元。 举例来说, 所述第一待读取数据对应的控制器是第二控制器。  In the embodiment of the present invention, each controller includes at least one prefetch management unit, and the number of prefetch management units included in each controller is roughly equal. Each prefetch management unit manages the storage space of a range of addresses, for example, a section of an LU. Specifically, the controller corresponding to the first read data request may be obtained from the LBA of the first data to be read according to a consistent hashing algorithm or another hash algorithm, and the prefetch management unit in that controller is then obtained. When a controller contains one prefetch management unit, once the controller corresponding to the first read data request is determined, the corresponding prefetch management unit is also determined. When a controller contains multiple prefetch management units, each prefetch management unit manages the storage space of a range of addresses, so one prefetch management unit in a controller can still be uniquely determined from the LBA. For example, the controller corresponding to the first data to be read is the second controller.
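作为非限制性的示意, 上述由 LBA 确定控制器及其中预取管理单元的映射可以用如下草图表示。 As a non-limiting illustrative sketch, the LBA-to-controller and LBA-to-prefetch-management-unit mapping described above might look as follows; the controller names, segment size, and unit count are assumptions for illustration only, and MD5 merely stands in for whatever hash the embodiment sets:

```python
import hashlib

CONTROLLERS = ["801", "802", "803"]   # hypothetical controller IDs
UNITS_PER_CONTROLLER = 4              # assumed prefetch management units per controller
SEGMENT_SIZE = 1 << 20                # assumed address-range size managed by one unit

def controller_for_lba(lba: int) -> str:
    """Hash the address segment so every request within one address
    range routes to the same controller (stand-in for consistent hashing)."""
    segment = lba // SEGMENT_SIZE
    digest = hashlib.md5(str(segment).encode()).digest()
    return CONTROLLERS[int.from_bytes(digest[:4], "big") % len(CONTROLLERS)]

def prefetch_unit_for_lba(lba: int) -> int:
    """Within the chosen controller, the same segment also uniquely
    selects one prefetch management unit, so that unit sees all
    requests for its address range."""
    segment = lba // SEGMENT_SIZE
    return segment % UNITS_PER_CONTROLLER

# All reads inside one address segment land on one controller and one unit,
# which is what lets a single unit analyze the request stream centrally.
assert controller_for_lba(100) == controller_for_lba(4096)
assert prefetch_unit_for_lba(100) == prefetch_unit_for_lba(4096)
```

因为同一地址段的请求总是落在同一个预取管理单元上, 该单元才能对这段逻辑地址上的全部读请求进行集中分析。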
可选的, 可以采用散列算法通过起始地址的输入来唯一确定一个控制器, 所述散列算法可以是一致性散列算法。 可选的, 存储系统的每个控制器中可以保存一张预设的配置表, 所述配置表包括起始地址与各个控制器之间的对应关系, 接收到读数据请求的控制器可以根据读数据请求中携带的起始地址在配置表中进行查询, 从而获得所述起始地址对应的控制器。  Optionally, a hash algorithm, which may be a consistent hashing algorithm, may be used to uniquely determine a controller from the start address. Optionally, each controller of the storage system may store a preset configuration table that records the correspondence between start addresses and controllers; the controller that receives a read data request may look up the start address carried in the request in the configuration table, thereby obtaining the controller corresponding to that start address.
可选的, 上述配置表可以仅仅保存在一个控制器中, 当其他控制器接收到读数据请求时, 可以向保存配置表的控制器发送查询请求, 所述查询请求包括所述读数据请求携带的起始地址, 使得所述保存配置表的控制器可以根据所述起始地址在配置表中进行查询, 从而获得所述起始地址对应的控制器, 并将查询结果发送给接收到读数据请求的控制器。  Optionally, the configuration table may be stored in only one controller. When another controller receives a read data request, it may send a query request, carrying the start address from the read data request, to the controller that stores the configuration table, so that that controller can look up the start address in the configuration table, obtain the controller corresponding to the start address, and return the query result to the controller that received the read data request.
可选的, 为了防止保存配置表的控制器发生故障时所述配置表丟失, 可以在所述存储系统的另一个控制器中保存所述配置表的副本。  Optionally, to prevent the configuration table from being lost if the controller that stores it fails, a copy of the configuration table may be kept in another controller of the storage system.
可选的, 还可以通过取模的方式获得所述起始地址对应的控制器, 具体的, 用所述起始地址除以控制器的个数, 根据计算出来的模即可获得对应的控制器。  Optionally, the controller corresponding to the start address may also be obtained by taking a modulus. Specifically, the start address is divided by the number of controllers, and the corresponding controller is obtained from the computed remainder.
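作为非限制性示意, 取模方式可以表示为如下草图。 As a non-limiting sketch, the modulo mapping can be expressed as follows; the controller count is an assumption, and a real system would likely divide the address by a block or stripe size first so that adjacent addresses in one range stay on one controller, which the embodiment does not specify:

```python
NUM_CONTROLLERS = 4  # assumed number of controllers in the storage system

def controller_by_modulo(start_address: int) -> int:
    # The remainder of the start address divided by the controller
    # count selects the controller responsible for this request.
    return start_address % NUM_CONTROLLERS
```

例如, 起始地址为 9、控制器数为 4 时, 该请求映射到编号为 1 的控制器。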
第一控制器 801根据所述读数据请求携带的地址信息确定第二控制器 802之后, 可以向所述第二控制器 802发送所述地址信息。  After determining the second controller 802 according to the address information carried in the read data request, the first controller 801 may send the address information to the second controller 802.
所述第二控制器 802用于根据所述地址信息获得待读取的目标数据的地址信息, 以根据所述目标数据的地址信息将所述目标数据读取到緩存中。 具体的, 执行上述操作的是第二控制器 802中的处理器。  The second controller 802 is configured to obtain, according to the address information, the address information of the target data to be read, so as to read the target data into the cache according to the address information of the target data. Specifically, the foregoing operations are performed by the processor in the second controller 802.
可选的, 第二控制器 802根据所述地址信息获得待读取的目标数据的地址信息后, 可以根据所述目标数据的地址信息将所述目标数据读取到第二控制器 802的緩存中; 或者根据所述目标数据的地址信息向第三控制器 803发送预取命令, 第三控制器 803根据所述目标数据的地址信息将所述目标数据读取到第三控制器 803的緩存中。  Optionally, after obtaining the address information of the target data to be read according to the address information, the second controller 802 may read the target data into the cache of the second controller 802 according to the address information of the target data; or it may send a prefetch command to the third controller 803 according to the address information of the target data, and the third controller 803 reads the target data into the cache of the third controller 803 according to that address information.
其中, 第二控制器 802可以利用预取管理单元来获得待读取的目标数据。 具体的获得方法与上面描述的方法实施例类似, 这里不再赘述。 可选的, 第一控制器 801接收主机发送的下一个读数据请求, 所述下一个读数据请求的待读取数据是所述目标数据, 或者待读取数据是所述目标数据的一部分。 第一控制器 801接收所述下一个读数据请求后, 发现其 cache中没有存储所述目标数据, 但第三控制器 803的 cache中存储有所述目标数据, 则可以向第三控制器 803发送数据读取命令, 所述数据读取命令包括所述目标数据的起始地址和长度, 用于要求所述第三控制器 803发送所述目标数据。 第三控制器 803接收所述数据读取命令后, 向第一控制器 801发送所述目标数据。  The second controller 802 may use a prefetch management unit to obtain the target data to be read; the specific method is similar to the method embodiment described above and is not repeated here. Optionally, the first controller 801 receives the next read data request sent by the host, where the data to be read by the next read data request is the target data or a part of the target data. After receiving the next read data request, the first controller 801 finds that the target data is not stored in its own cache but is stored in the cache of the third controller 803. It may then send a data read command to the third controller 803, where the data read command includes the start address and length of the target data and requests the third controller 803 to send the target data. After receiving the data read command, the third controller 803 sends the target data to the first controller 801.
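作为非限制性示意, 上述"本地未命中、跨控制器命中"的读取流程可以用如下草图表示。 As a non-limiting sketch, the local-miss / cross-controller-hit flow above might be modeled as follows; the class names and LBA-keyed cache are illustrative assumptions, not the embodiment's actual data structures:

```python
from typing import Dict, List, Optional

class Controller:
    """Toy model of one controller and its cache, keyed by start LBA."""
    def __init__(self, name: str):
        self.name = name
        self.cache: Dict[int, bytes] = {}

    def read_from_cache(self, lba: int) -> Optional[bytes]:
        return self.cache.get(lba)

def serve_read(first: Controller, peers: List[Controller], lba: int) -> bytes:
    # 1. The controller that received the host request checks its own cache.
    data = first.read_from_cache(lba)
    if data is not None:
        return data
    # 2. On a local miss, a peer whose cache holds the prefetched data
    #    returns it over the inter-controller channel; per the text this
    #    takes under ~1 ms, versus 6-10 ms for a disk read.
    for peer in peers:
        data = peer.read_from_cache(lba)
        if data is not None:
            return data
    # 3. Only if no controller's cache holds the data is the disk read (elided).
    raise LookupError("cache miss on all controllers; fall back to disk")

third = Controller("803")
third.cache[42] = b"target"  # data prefetched earlier on controller 803's behalf
assert serve_read(Controller("801"), [third], 42) == b"target"
```

即使目标数据不在接收请求的控制器本地, 跨控制器的 cache 命中仍远快于回退到磁盘读取。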
需要说明的是, 下一个读数据请求并不限于紧接着所述第一控制器接收的读数据请求, 只要是所述读数据请求之后接收的读数据请求都可以称作下一个读数据请求。  It should be noted that the next read data request is not limited to the request received immediately after the foregoing read data request; any read data request received after that read data request may be called the next read data request.
由于控制器之间的数据通道采用的是高速数据传输通道, 据统计, 控制器间的 cache数据的访问速度一般小于 1ms。 然而, 如果无法 cache命中, 从磁盘读取目标数据的速度是 6-10ms。 因此, 即使是跨控制器命中, 其速度也是远大于从磁盘读取数据的速度的。  Because the data channels between controllers are high-speed data transmission channels, statistics show that accessing cache data across controllers generally takes less than 1 ms. By contrast, if no cache hit is possible, reading the target data from disk takes 6-10 ms. Therefore, even a cross-controller cache hit is far faster than reading the data from disk.
在本发明实施例中, 在第一控制器接收到主机发送的读数据请求后, 根据所述读数据请求携带的地址信息确定第二控制器, 并将所述地址信息发送给所述第二控制器, 由第二控制器根据地址信息获得待读取的目标数据, 以执行读取所述目标数据到緩存的操作。 由于执行获得待读取的目标数据的操作的控制器是由读数据请求携带的地址信息确定的, 因此在一段逻辑地址上面发生的读数据请求可以由一个控制器集中分析, 对于这段逻辑地址来说, 所获得的读数据请求的信息是全面的, 因此可以准确地预测待读取的目标数据, 并读取到緩存中。  In the embodiment of the present invention, after the first controller receives a read data request sent by the host, it determines the second controller according to the address information carried in the read data request and sends the address information to the second controller. The second controller then obtains the target data to be read according to the address information, so as to perform the operation of reading the target data into the cache. Because the controller that performs the operation of obtaining the target data to be read is determined by the address information carried in the read data request, all read data requests falling within a given range of logical addresses can be analyzed centrally by one controller. For that range of logical addresses, the collected information about read data requests is therefore complete, so the target data to be read can be predicted accurately and read into the cache.
本领域普通技术人员将会理解, 本发明的各个方面、 或各个方面的可能实现方式可以被具体实施为系统、 方法或者计算机程序产品。 因此, 本发明的各方面、 或各个方面的可能实现方式可以采用完全硬件实施例、 完全软件实施例 (包括固件、驻留软件等等), 或者组合软件和硬件方面的实施例的形式, 在这里都统称为 "电路" 、 "模块" 或者 "系统" 。 此外, 本发明的各方面、 或各个方面的可能实现方式可以采用计算机程序产品的形式, 计算机程序产品是指存储在计算机可读介质中的计算机可读程序代码。  Those of ordinary skill in the art will appreciate that various aspects of the present invention, or possible implementations of various aspects, may be embodied as a system, a method, or a computer program product. Accordingly, aspects of the present invention, or possible implementations of various aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as a "circuit", a "module", or a "system". In addition, aspects of the present invention, or possible implementations of various aspects, may take the form of a computer program product, that is, computer-readable program code stored in a computer-readable medium.
计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。 计算机可读存储介质包含但不限于电子、 磁性、 光学、 电磁、 红外或半导体系统、 设备或者装置, 或者前述的任意适当组合, 如随机存取存储器 (RAM)、 只读存储器 (ROM)、 可擦除可编程只读存储器 (EPROM或者快闪存储器)、 光纤、 便携式只读存储器 (CD-ROM)。  The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or apparatus, or any suitable combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable read-only memory (CD-ROM).
计算机中的处理器读取存储在计算机可读介质中的计算机可读程序代码, 使得处理器能够执行在流程图中每个步骤、 或各步骤的组合中规定的功能动作; 生成实施在框图的每一块、 或各块的组合中规定的功能动作的装置。  A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functions and actions specified in each step, or combination of steps, of the flowcharts, and an apparatus that implements the functions and actions specified in each block, or combination of blocks, of the block diagrams is produced.
计算机可读程序代码可以完全在用户的计算机上执行、 部分在用户的计算机上执行、 作为单独的软件包、 部分在用户的计算机上并且部分在远程计算机上, 或者完全在远程计算机或者服务器上执行。 也应该注意, 在某些替代实施方案中, 在流程图中各步骤、 或框图中各块所注明的功能可能不按图中注明的顺序发生。 例如, 依赖于所涉及的功能, 接连示出的两个步骤、 或两个块实际上可能被大致同时执行, 或者这些块有时候可能被以相反顺序执行。  The computer-readable program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that, in some alternative implementations, the functions noted in the steps of the flowcharts or the blocks of the block diagrams may occur out of the order noted in the figures. For example, depending on the functions involved, two steps or two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

Claims

权 利 要 求 书 claims
1、 一种緩存数据的方法, 所述方法应用于存储系统中, 所述存储系统 包括多个控制器, 其中, 每个控制器包括緩存; 其特征在于, 所述方法包 括: 1. A method of caching data. The method is applied to a storage system. The storage system includes multiple controllers, where each controller includes a cache; characterized in that the method includes:
第一控制器接收主机发送的读数据请求, 所述读数据请求携带地址信 息; 根据所述读数据请求携带的地址信息确定第二控制器; 向所述第二控 制器发送所述地址信息; The first controller receives a read data request sent by the host, and the read data request carries address information; determines the second controller based on the address information carried by the read data request; sends the address information to the second controller;
所述第二控制器根据所述地址信息获得待读取的目标数据的地址信息, 以根据所述目标数据的地址信息将所述目标数据读取到緩存中。 The second controller obtains the address information of the target data to be read according to the address information, and reads the target data into the cache according to the address information of the target data.
2、 根据权利要求 1 所述的方法, 其特征在于, 所述读数据请求携带 的地址信息包括所述读数据请求携带的起始地址; 2. The method according to claim 1, characterized in that the address information carried in the read data request includes the starting address carried in the read data request;
所述根据所述读数据请求携带的地址信息确定第二控制器包括: 根据所述读数据请求携带的起始地址, 按照设定的散列算法, 确定第二 控制器。 Determining the second controller based on the address information carried in the read data request includes: determining the second controller based on the starting address carried in the read data request and according to the set hash algorithm.
3、 根据权利要求 2 所述的方法, 其特征在于, 所述设定的散列算法 包括一致性哈希算法。 3. The method according to claim 2, characterized in that the set hash algorithm includes a consistent hash algorithm.
4、 根据权利要求 1 所述的方法, 其特征在于, 所述读数据请求携带 的地址信息包括所述读数据请求携带的起始地址; 4. The method according to claim 1, characterized in that the address information carried in the read data request includes the starting address carried in the read data request;
所述根据所述读数据请求携带的地址信息确定第二控制器包括: 根据所述起始地址查询预设的配置表, 获得所述起始地址对应的第二控 制器。 Determining the second controller according to the address information carried in the read data request includes: querying a preset configuration table according to the starting address to obtain the second controller corresponding to the starting address.
5、 根据权利要求 1所述的方法, 其特征在于, 所述根据所述目标数据 的地址信息将所述目标数据读取到緩存中包括: 5. The method according to claim 1, characterized in that, reading the target data into the cache according to the address information of the target data includes:
所述第二控制器根据所述目标数据的地址信息将所述目标数据读取到所 述第二控制器的緩存中。 The second controller reads the target data into the cache of the second controller according to the address information of the target data.
6、 根据权利要求 1所述的方法, 其特征在于, 所述根据所述目标数据的地址信息将所述目标数据读取到緩存中包括: 6. The method according to claim 1, wherein reading the target data into the cache according to the address information of the target data comprises:
所述第二控制器根据所述目标数据的地址信息确定所述目标数据对应的第三控制器; 向所述第三控制器发送预取命令, 所述预取命令包括所述目标数据的地址信息; determining, by the second controller according to the address information of the target data, a third controller corresponding to the target data; and sending a prefetch command to the third controller, where the prefetch command includes the address information of the target data;
所述第三控制器根据所述目标数据的地址信息将所述目标数据读取 到所述第三控制器的緩存中。 The third controller reads the target data into the cache of the third controller according to the address information of the target data.
7、 根据权利要求 6所述的方法, 其特征在于, 还包括: 7. The method according to claim 6, further comprising:
所述第一控制器接收所述主机发送的下一个读数据请求, 所述下一个读 数据请求包括所述目标数据的地址信息; The first controller receives the next read data request sent by the host, where the next read data request includes the address information of the target data;
根据所述目标数据的地址信息确定所述第三控制器的緩存中存储有所述 目标数据; Determine that the target data is stored in the cache of the third controller according to the address information of the target data;
向所述第三控制器发送数据读取命令, 使得所述第三控制器将所述目标 数据发送给所述第一控制器; Send a data read command to the third controller, causing the third controller to send the target data to the first controller;
向所述主机发送所述目标数据。 Send the target data to the host.
8、 根据权利要求 1 所述的方法, 其特征在于, 所述目标数据的地址 信息包括所述目标数据的起始地址和长度; 8. The method according to claim 1, wherein the address information of the target data includes the starting address and length of the target data;
所述第二控制器根据所述地址信息获得待读取的目标数据的地址信息包 括: The second controller obtains the address information of the target data to be read according to the address information, including:
所述第二控制器根据所述主机发送的读数据请求携带的地址信息, 确定 所述目标数据的起始地址; 以及 The second controller determines the starting address of the target data based on the address information carried in the read data request sent by the host; and
根据所述主机发送的读数据请求携带的地址信息, 确定数据块表中对应的第一数据块, 以及与所述第一数据块连续的第二数据块; 根据所述第一数据块和第二数据块的大小的总和, 获得所述目标数据的长度。 determining, according to the address information carried in the read data request sent by the host, a corresponding first data block in a data block table and a second data block contiguous with the first data block; and obtaining the length of the target data according to the sum of the sizes of the first data block and the second data block.
9、 一种存储系统, 其特征在于, 所述系统包括: 9. A storage system, characterized in that the system includes:
第一控制器用于接收主机发送的读数据请求, 所述读数据请求携带地 址信息; 根据所述读数据请求携带的地址信息确定第二控制器; 向所述第 二控制器发送所述地址信息; The first controller is configured to receive a read data request sent by the host, where the read data request carries address information; determine the second controller according to the address information carried in the read data request; and send the data to the third controller. The second controller sends the address information;
所述第二控制器用于根据所述地址信息获得待读取的目标数据的地址信 息, 以根据所述目标数据的地址信息将所述目标数据读取到緩存中。 The second controller is configured to obtain the address information of the target data to be read according to the address information, and read the target data into the cache according to the address information of the target data.
10、 根据权利要求 9所述的系统, 其特征在于, 所述读数据请求携带 的地址信息包括所述读数据请求携带的起始地址; 10. The system according to claim 9, characterized in that the address information carried in the read data request includes the starting address carried in the read data request;
所述第一控制器具体用于根据所述读数据请求携带的起始地址, 按照设 定的散列算法, 确定第二控制器。 The first controller is specifically configured to determine the second controller according to the starting address carried in the read data request and according to the set hash algorithm.
11、根据权利要求 10所述的系统, 其特征在于, 所述设定的散列算法 包括一致性哈希算法。 11. The system according to claim 10, characterized in that the set hash algorithm includes a consistent hash algorithm.
12、 根据权利要求 9所述的系统, 其特征在于, 所述读数据请求携带 的地址信息包括所述读数据请求携带的起始地址; 12. The system according to claim 9, characterized in that the address information carried in the read data request includes the starting address carried in the read data request;
所述第一控制器具体用于根据所述起始地址查询预设的配置表, 获得所 述起始地址对应的第二控制器。 The first controller is specifically configured to query a preset configuration table according to the starting address to obtain the second controller corresponding to the starting address.
13、 根据权利要求 9所述的系统, 其特征在于, 所述第二控制器还用 于根据所述目标数据的地址信息将所述目标数据读取到所述第二控制器的緩 存中。 13. The system according to claim 9, wherein the second controller is further configured to read the target data into the cache of the second controller according to the address information of the target data.
14、 根据权利要求 9所述的系统, 其特征在于, 所述系统还包括第三 控制器; 14. The system according to claim 9, wherein the system further includes a third controller;
所述第二控制器还用于根据所述目标数据的地址信息确定所述目标数据对应的第三控制器; 向所述第三控制器发送预取命令, 所述预取命令包括所述目标数据的地址信息; The second controller is further configured to: determine, according to the address information of the target data, a third controller corresponding to the target data; and send a prefetch command to the third controller, where the prefetch command includes the address information of the target data.
所述第三控制器用于根据所述目标数据的地址信息将所述目标数据读 取到所述第三控制器的緩存中。 The third controller is configured to read the target data into the cache of the third controller according to the address information of the target data.
15、 根据权利要求 14所述的系统, 其特征在于, 15. The system according to claim 14, characterized in that,
所述第一控制器还用于接收所述主机发送的下一个读数据请求,所述下 一个读数据请求包括所述目标数据的地址信息; 根据所述目标数据的地址信息确定所述第三控制器的緩存中存储有所述 目标数据; The first controller is also configured to receive a next read data request sent by the host, where the next read data request includes address information of the target data; Determine that the target data is stored in the cache of the third controller according to the address information of the target data;
向所述第三控制器发送数据读取命令, 使得所述第三控制器将所述目标 数据发送给所述第一控制器; Send a data read command to the third controller, causing the third controller to send the target data to the first controller;
向所述主机发送所述目标数据。 Send the target data to the host.
16、 根据权利要求 9所述的系统, 其特征在于, 所述目标数据的地址 信息包括所述目标数据的起始地址和长度; 16. The system according to claim 9, wherein the address information of the target data includes the starting address and length of the target data;
所述第二控制器用于根据所述主机发送的读数据请求携带的地址信息 , 确定所述目标数据的起始地址; 以及 The second controller is configured to determine the starting address of the target data according to the address information carried in the read data request sent by the host; and
根据所述主机发送的读数据请求携带的地址信息, 确定数据块表中对应的第一数据块, 以及与所述第一数据块连续的第二数据块; 根据所述第一数据块和第二数据块的大小的总和, 获得所述目标数据的长度。 determine, according to the address information carried in the read data request sent by the host, a corresponding first data block in a data block table and a second data block contiguous with the first data block; and obtain the length of the target data according to the sum of the sizes of the first data block and the second data block.
PCT/CN2013/084024 2013-09-23 2013-09-23 Data caching method and storage system WO2015039352A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2013/084024 WO2015039352A1 (en) 2013-09-23 2013-09-23 Data caching method and storage system
CN201380001620.0A CN103635887B (en) 2013-09-23 2013-09-23 Data caching method and storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/084024 WO2015039352A1 (en) 2013-09-23 2013-09-23 Data caching method and storage system

Publications (1)

Publication Number Publication Date
WO2015039352A1 true WO2015039352A1 (en) 2015-03-26

Family

ID=50215548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/084024 WO2015039352A1 (en) 2013-09-23 2013-09-23 Data caching method and storage system

Country Status (2)

Country Link
CN (1) CN103635887B (en)
WO (1) WO2015039352A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574008B (en) * 2014-10-11 2020-01-31 华为技术有限公司 Task scheduling method and device applied to distributed file system
CN104461943B (en) * 2014-12-29 2017-10-27 成都致云科技有限公司 Method for reading data, device and system
CN107844270A (en) * 2014-12-31 2018-03-27 华为技术有限公司 A kind of memory array system and data write request processing method
CN108345551B (en) * 2017-01-23 2020-05-12 杭州海康威视数字技术股份有限公司 Data storage method and device
WO2019127487A1 (en) 2017-12-29 2019-07-04 华为技术有限公司 Data prefetching method and apparatus, and storage device
WO2020037625A1 (en) * 2018-08-23 2020-02-27 袁振南 Distributed storage system and data read-write method therefor, and storage terminal and storage medium
CN111406251B (en) * 2018-08-24 2023-12-08 华为技术有限公司 Data prefetching method and device
CN112199304B (en) * 2019-07-08 2024-04-09 华为技术有限公司 Data prefetching method and device
CN110928495B (en) * 2019-11-12 2023-09-22 杭州宏杉科技股份有限公司 Data processing method and device on multi-control storage system
CN112579479B (en) * 2020-12-07 2022-07-08 成都海光微电子技术有限公司 Processor and method for maintaining transaction order while maintaining cache coherency
CN112799589B (en) * 2021-01-14 2023-07-14 新华三大数据技术有限公司 Data reading method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055511A (en) * 2007-05-16 2007-10-17 华为技术有限公司 Memory array system and its data operation method
CN101201723A (en) * 2006-12-14 2008-06-18 英业达股份有限公司 Virtual disc router system, virtual disc accesses system and method
CN101840309A (en) * 2009-10-28 2010-09-22 创新科存储技术有限公司 Access control method and system of double control disk array in multipath environment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6728258B1 (en) * 1995-11-15 2004-04-27 Hitachi, Ltd. Multi-processor system and its network
US20030195939A1 (en) * 2002-04-16 2003-10-16 Edirisooriya Samatha J. Conditional read and invalidate for use in coherent multiprocessor systems
US8161245B2 (en) * 2005-02-09 2012-04-17 International Business Machines Corporation Method and apparatus for performing data prefetch in a multiprocessor system
US7937532B2 (en) * 2007-03-30 2011-05-03 Intel Corporation Method and apparatus for speculative prefetching in a multi-processor/multi-core message-passing machine
CN101630303B (en) * 2009-08-24 2011-12-07 成都市华为赛门铁克科技有限公司 Request message processing method, device and multi-processor memory system


Also Published As

Publication number Publication date
CN103635887B (en) 2015-07-08
CN103635887A (en) 2014-03-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13893712

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13893712

Country of ref document: EP

Kind code of ref document: A1