WO2023045492A1 - Data prefetching method, computing node and storage system - Google Patents
Data prefetching method, computing node and storage system
- Publication number
- WO2023045492A1 (PCT/CN2022/104124)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- cache
- prefetch
- node
- request
- Prior art date
Classifications
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
- G06F12/0868—Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
- G06F3/0611—Improving I/O performance in relation to response time
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- G06F2212/154—Networked environment
- G06F2212/214—Solid state disk
- G06F2212/224—Disk storage
- G06F2212/254—Distributed memory
- G06F2212/261—Storage comprising a plurality of storage devices
- G06F2212/283—Plural cache memories
- G06F2212/314—In storage network, e.g. network attached cache
- G06F2212/6024—History based prefetching
Definitions
- the present application relates to the field of computer technology, and in particular to a data prefetching method, a computing node and a computer system.
- a storage system usually includes multiple computing nodes and multiple storage nodes connected to each other.
- the computing nodes write generated data into the storage nodes and read data from the storage nodes.
- the memory of the storage system is usually used to cache the data written or read by the computing nodes, or to hold data preloaded in advance from the storage nodes into the memory of the storage system.
- cache resources (such as memory) of the storage system may form a cache pool, and each computing node can cache data to any address in the cache pool.
- the cache pool is formed, for example, by cache resources in multiple storage nodes, or by cache resources in multiple cache nodes included in the storage system. Taking cache nodes as an example, data prefetching is usually recommended on each cache node side, and the prefetching accuracy of this solution is poor. Alternatively, a central node is set among the cache nodes for data prefetch recommendation, which results in a long prefetch delay and increases network communication costs.
- the embodiment of the present application aims to provide a data prefetching method, a computing node, and a storage system. By recommending prefetched data on the computing node side, the accuracy of prefetching is improved and network communication costs are reduced.
- the first aspect of the present application provides a data prefetching method, the method including: the computing node acquires access information of a first application to the storage nodes within a preset period of time; the computing node determines information of data to be prefetched based on the access information; the computing node determines, according to the information of the prefetched data, a cache node for prefetching the data, and generates a prefetch request for prefetching the data; the computing node sends the prefetch request to the cache node; and the cache node performs a prefetch operation on the prefetched data in response to the prefetch request.
- in this way, the accuracy of prefetching is improved and the network communication cost is reduced. A minimal sketch of this flow follows.
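- the following Python sketch is illustrative only and not part of the original disclosure; all names (AccessRecord, recommend_prefetch, route_to_cache_node, the cache node labels) are hypothetical stand-ins for the components named above.

```python
import hashlib
from dataclasses import dataclass
from typing import List

@dataclass
class AccessRecord:
    key: str          # identifier of the accessed data
    timestamp: float  # access time within the preset period

def recommend_prefetch(history: List[AccessRecord]) -> List[str]:
    # Stand-in for the prefetch recommendation model (clustering, time-series
    # prediction, frequent-pattern mining, or hot-data identification).
    return [r.key for r in history[-2:]]  # trivially re-recommend recent keys

def route_to_cache_node(key: str, cache_nodes: List[str]) -> str:
    # Stand-in for routing through the partition view (see FIG. 3 below).
    digest = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return cache_nodes[digest % len(cache_nodes)]

def compute_node_prefetch(history: List[AccessRecord], cache_nodes: List[str]) -> None:
    # The computing node recommends prefetch data, picks the cache node,
    # and sends it a prefetch request.
    for key in recommend_prefetch(history):
        node = route_to_cache_node(key, cache_nodes)
        print(f"send prefetch request for {key!r} to {node}")

compute_node_prefetch(
    [AccessRecord("k1", 1.0), AccessRecord("k2", 2.0), AccessRecord("k3", 3.0)],
    ["cache-20a", "cache-20b", "cache-20c"],
)
```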
- the determining, by the computing node, the information of the prefetched data based on the access information includes: determining, by the computing node, the information of the prefetched data based on the access information according to a prefetching recommendation model.
- the accuracy and efficiency of the prefetch data recommendation are improved by determining the information of the prefetch data by the prefetch recommendation model.
- the prefetching recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hot data identification algorithm.
- the access information includes access information of a first user,
- and the computing node determining the information of the prefetched data based on the access information according to the prefetch recommendation model includes: the prefetch recommendation model determines an access pattern of the first user based on the access information of the first user, and determines the data to be prefetched according to the access pattern.
- the prefetch request is a prefetch request for data blocks, file data, or object data
- the method further includes: after the cache node receives the prefetch request for the prefetched data, converting the prefetch request into a format and semantics uniformly set for data blocks, file data and object data.
- by converting prefetch requests into a unified format and semantics, cache nodes only need to provide one prefetch interface, avoiding the cost and operational complexity of maintaining multiple protocols. Moreover, a global cache pool serving different applications and different data types can be set up, which improves the utilization of cache resources.
- the information of the prefetched data includes a first identifier of the prefetched data
- and converting the prefetch request into the uniformly set format and semantics includes converting the first identifier in the prefetch request into a second identifier conforming to a preset format.
- converting the first identifier in the prefetch request into the second identifier conforming to the preset format includes converting the first identifier into the second identifier through a hash algorithm.
- the cache node includes a write cache and a read cache
- performing the prefetch operation on the prefetched data by the cache node in response to the prefetch request includes: the cache node determines, based on the second identifier, whether the prefetched data is stored in the write cache, and if so, stores the prefetched data in the read cache in correspondence with the second identifier.
- the cache node performing the prefetch operation on the prefetched data in response to the prefetch request further includes: if the cache node determines that the prefetched data is not stored in the write cache, it determines, based on the second identifier, whether the prefetched data is stored in the read cache; if not, it generates a data read request based on the second identifier and sends the data read request to the storage node; the storage node reads the prefetched data according to the data read request and returns it to the cache node; and the cache node stores the prefetched data in the read cache in correspondence with the second identifier. A sketch of this operation follows.
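- as an illustration only, the following Python sketch shows the prefetch operation just described; the dict-backed caches and the read_from_storage callback are hypothetical stand-ins for the real components.

```python
def handle_prefetch(key2, write_cache, read_cache, read_from_storage):
    """Prefetch the data identified by key2 (the converted second identifier)."""
    if key2 in write_cache:                      # hit in the write cache
        read_cache[key2] = write_cache[key2]     # copy into the read cache
        return
    if key2 in read_cache:                       # already present: nothing to do
        return
    read_cache[key2] = read_from_storage(key2)   # fetch from the storage node

write_cache = {"objA": b"dirty-data"}
read_cache = {}
handle_prefetch("objA", write_cache, read_cache, lambda k: b"from-storage")
handle_prefetch("objB", write_cache, read_cache, lambda k: b"from-storage")
print(read_cache)  # {'objA': b'dirty-data', 'objB': b'from-storage'}
```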
- the second aspect of the present application provides a storage system, including a computing node, a cache node, and a storage node, and the computing node is used to: acquire the access information of the first application to the storage node within a preset period of time; determine based on the access information Prefetch data information; determine a cache node for prefetching the prefetch data according to the prefetch data information, and generate a prefetch request for prefetching the prefetch data; send the prefetch request to the cache A node; the cache node is configured to perform a prefetch operation on the prefetch data in response to the prefetch request.
- the computing node being configured to determine the information of the prefetched data based on the access information specifically includes: the computing node is configured to determine the information of the prefetched data based on the access information according to a prefetch recommendation model.
- the prefetching recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hot data identification algorithm.
- the access information includes access information of the first user
- and the computing node being configured to determine the information of the prefetched data based on the access information according to the prefetch recommendation model includes: the computing node is configured to determine, through the prefetch recommendation model, an access pattern of the first user based on the access information of the first user, and to determine the data to be prefetched according to the access pattern.
- the prefetch request is a prefetch request for data blocks, file data, or object data
- and the cache node is further configured to: after receiving the prefetch request for the prefetched data, convert the prefetch request into a format and semantics uniformly set for data blocks, file data and object data.
- the information of the prefetched data includes a first identifier of the prefetched data
- and the cache node being configured to convert the prefetch request into the uniformly set format and semantics includes: the cache node is configured to convert the first identifier in the prefetch request into a second identifier conforming to a preset format.
- the cache node being configured to convert the first identifier in the prefetch request into the second identifier conforming to the preset format includes: the cache node is configured to convert the first identifier into the second identifier through a hash algorithm.
- the cache node includes a write cache and a read cache
- and the cache node being configured to perform the prefetch operation on the prefetched data in response to the prefetch request includes: the cache node is configured to determine, based on the second identifier, whether the prefetched data is stored in the write cache, and if so, to store the prefetched data in the read cache in correspondence with the second identifier.
- the cache node being configured to perform the prefetch operation on the prefetched data in response to the prefetch request further includes: the cache node is configured to, upon determining that the prefetched data is not stored in the write cache, determine based on the second identifier whether the prefetched data is stored in the read cache, and upon determining that it is not stored in the read cache, generate a data read request based on the second identifier and send it to the storage node; the storage node is further configured to read the prefetched data according to the data read request and return it to the cache node; and the cache node is further configured to store the prefetched data in the read cache in correspondence with the second identifier.
- the third aspect of the present application provides a data prefetching method, executed by a computing node, including: acquiring access information of a first application to a storage node within a preset period of time; determining information of data to be prefetched based on the access information; determining, according to the information of the prefetched data, a cache node for prefetching the data, and generating a prefetch request for prefetching the data; and sending the prefetch request to the cache node.
- the determining the information of the prefetched data based on the access information includes: determining the information of the prefetched data based on the access information according to a prefetch recommendation model.
- the prefetching recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hot data identification algorithm.
- the access information includes access information of a first user, and determining the information of the prefetched data based on the access information according to the prefetch recommendation model includes: the prefetch recommendation model determines an access pattern of the first user based on the access information of the first user, and determines the data to be prefetched according to the access pattern.
- a fourth aspect of the present application provides a computing node, including a processor and a memory, where the memory stores executable computer program instructions, and the processor executes the executable computer program instructions to implement the method described in the third aspect and its possible implementations.
- the fifth aspect of the present application provides a computer-readable storage medium storing computer program instructions which, when executed in a computer or a processor, cause the computer or the processor to implement the method described in the third aspect and its possible implementations.
- the sixth aspect of the present application provides a computer program product, including computer program instructions which, when run on a computer or a processor, cause the computer or the processor to implement the method described in the third aspect and its possible implementations.
- Fig. 1 is an architecture diagram of a computer system provided by an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of a computing node and a cache node provided in an embodiment of the present application
- Fig. 3 is a schematic diagram of a method for performing data routing by a client adaptation layer
- FIG. 4 is a flowchart of a method for writing data in a storage system provided by an embodiment of the present application
- FIG. 5 is a flowchart of a method for reading data in a storage system provided by an embodiment of the present application
- FIG. 6 is a flowchart of a method for prefetching data in a storage system according to an embodiment of the present application
- FIG. 7 is a schematic diagram of a user access mode provided by an embodiment of the present application.
- FIG. 8 is a structural diagram of a computing node provided by an embodiment of the present application.
- FIG. 1 is an architecture diagram of a computer system provided by an embodiment of the present application.
- the computer system is, for example, a storage system, including a computing cluster 100 , a cache cluster 200 and a storage cluster 300 .
- the computing cluster 100 includes a plurality of computing nodes, and Fig. 1 schematically shows a computing node 10a and a computing node 10b.
- a computing node can access data in the storage nodes through an application program (Application, APP), and is therefore also called an "application server".
- Compute nodes can be physical machines or virtual machines. Physical machines include, but are not limited to, desktop computers, servers, laptops, and mobile devices.
- the cache cluster 200 may be an independent physical cluster, or may share the same cluster with the storage cluster 300 (that is, be deployed in the same cluster and share resources such as storage resources and computing resources).
- the cache cluster 200 includes multiple cache nodes, and the figure schematically shows a cache node 20a, a cache node 20b, and a cache node 20c.
- Each cache node is connected to each other through a network.
- the storage cluster 300 includes multiple storage nodes, and the figure schematically shows a storage node 30a, a storage node 30b, and a storage node 30c. Wherein, the cache node and the storage node may be physical machines or virtual machines.
- the cache node 20a includes a processor 201 , a memory 202 and a hard disk 203 .
- the processor 201 is a central processing unit (central processing unit, CPU), which is used for processing operation requests from computing nodes or operation requests from other cache nodes, and is also used to process requests generated inside the cache nodes.
- the memory 202 refers to an internal memory directly exchanging data with the processor 201. It can read and write data at any time, and the speed is relatively fast. It is used as a temporary data storage for the operating system or other running programs.
- the memory 202 includes at least two types of memory, for example, the memory can be a random access memory (Random Access Memory, RAM) or a read only memory (Read Only Memory, ROM).
- the RAM may include dynamic random access memory (Dynamic Random Access Memory, DRAM), or storage class memory (Storage Class Memory, SCM) and other memories.
- DRAM is a semiconductor memory, which, like most Random Access Memory (RAM), is a volatile memory device.
- SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory at the same time.
- SCM can provide faster read and write speeds than a hard disk, but is slower than DRAM in access speed and cheaper than DRAM in cost.
- the DRAM and the SCM are only exemplary illustrations in this embodiment, and the memory may also include other random access memories, such as Static Random Access Memory (Static Random Access Memory, SRAM) and the like.
- the volatile memory in the memory 202 can be configured with a power-failure protection function, so that when the system is powered off, the data stored in the memory 202 will not be lost. Memory with such a power-failure protection function is called non-volatile memory.
- the hard disk 203 is used to provide non-volatile storage resources, and its access delay is generally higher than that of memory, and its cost is lower than that of memory.
- the hard disk 203 may be, for example, a solid state disk (Solid State Disk, SSD), or a mechanical storage hard disk (Hard Disk Drive, HDD).
- a global cache pool can be provided by aggregating the storage resources (such as memory or hard disks) of the multiple cache nodes in the cache cluster 200, so that applications in each computing node can use the cache resources of any cache node.
- storage media such as RAM, SCM and SSD in the cache nodes are usually selected as the cache resources, so as to provide faster data access than the storage nodes.
- the global cache pool provides a unified address space (or namespace) for each computing node, and a computing node can route data to the cache node used to cache that data, avoiding the data redundancy and consistency problems caused by repeatedly caching the same data.
- high availability of data can be achieved through technologies such as multi-copy, replication, and multi-active.
- a computing node may send a data access request (a read request or a write request) directed at the storage cluster to the cache node (e.g., cache node 20a) used for caching the data.
- the cache node 20a includes, for example, a write cache and a read cache. If the data access request is a write request, the cache node 20a can write the data into the write cache, return write success to the computing node 10a, and then write the data from the write cache to the storage node in the background, thereby improving the response speed for write requests.
- if the data access request is a read request, the cache node 20a can first determine whether the data is hit in the write cache; if the data does not exist in the write cache, it can determine whether the data is stored in the read cache. If the read cache stores the data, the cache node 20a can read the data directly from the read cache and return it to the computing node 10a, thereby eliminating the need to read the data from the storage cluster, shortening the data read path, and improving the response speed for read requests.
- storage resources such as memory and PCM in storage nodes may be used to form a cache pool for use by applications in computing nodes.
- Data prefetching generally includes two processes, one is to recommend prefetched data, and the other is to read the recommended prefetched data from the storage node to the cache pool in advance.
- in one solution, each cache node recommends the prefetch data itself, so that the cache node can prefetch data according to its own recommendation.
- each cache node can only obtain the data access history of the part of the data accessed by an application in the latest preset period, and cannot know the data access history of the full amount of data accessed by the application; the data access history includes the identifiers of multiple data items requested by the application and the corresponding access times.
- therefore, each cache node recommends prefetch data based only on the access history of the part of the application's data processed by that node; the accuracy of the recommendation is low, and prefetch bandwidth and cache resources are wasted.
- in another solution, a specific cache node is set as a central node for data prefetching, and the central node collects the data access history of each application of each computing node from the other cache nodes, so that prefetch data recommendations are made based on the complete data access history of a single application.
- Other cache nodes may send prefetching recommendation requests to the central node, thereby receiving prefetching recommendation results from the central node, and perform data prefetching according to the prefetching recommendation results.
- this data prefetching method increases additional network communication, increases communication costs, and may cause prefetching to be untimely.
- each computing node performs prefetch data recommendation.
- computing nodes can use a prefetch recommendation model to recommend prefetch data based on the data access history of a single application within the latest preset period, so that the recommendation accuracy is high, no additional network communication is introduced, and the prefetch recommendation latency is lower.
- FIG. 2 is a schematic structural diagram of a computing node and a cache node provided by an embodiment of the present application.
- Compute node 10a, cache node 20a, and storage node 30a are shown in FIG. 2 as an example.
- one or more applications may be installed in the computing node 10a, and the multiple applications include, for example, databases, virtual machines (virtual machines, VMs), big data, high-performance computing ( High Performance Computing, HPC), Artificial Intelligence (AI), etc. These applications may use different data services provided by storage cluster 300 .
- the storage cluster 300 is a Ceph cluster
- the Ceph cluster is a distributed storage system which deploys the Librados service component in the computing nodes to provide services such as block storage, object storage and file storage for the applications in the computing nodes.
- in the embodiment of the present application, a client adaptation layer 11 can be deployed on the computing node 10a, embedded in the form of a function library into the Librados service component deployed on the computing node 10a. The client adaptation layer 11 can therefore intercept the data access requests to the storage cluster generated by each application, determine the identifier of the cache node corresponding to the target data of a data access request, generate an operation request to be sent to the cache node, and send the operation request to the corresponding cache node 20a (that is, the cache server).
- the operation request includes information such as operation type, destination node, and original data access request, for example.
- the cache node 20a performs a corresponding operation according to the operation request and returns a response message to the client adaptation layer 11 . After receiving the response message from the server, the client adaptation layer 11 parses the message and returns the parsing result to the application in the computing node 10a.
- a data analysis service (Data Analysis Service, DAS) module 12 (hereinafter referred to as DAS 12) is also deployed in the computing node 10a.
- DAS 12 registers a message service with the client adaptation layer 11, so that it can pull the user's read and write access requests at multiple moments from the client adaptation layer 11.
- DAS 12 includes a prefetch recommendation model, which mines user access patterns based on the user's read and write access requests at multiple moments, performs prefetch data recommendation accordingly, and pushes the recommendation results to the client adaptation layer 11 .
- the client adaptation layer 11 generates a data prefetch request based on the recommendation result, and sends it to a corresponding cache node, so that the cache node performs data prefetch.
- a server adaptation layer 21 is deployed in the cache node 20a.
- the server adaptation layer 21 is used to receive an operation request from the client adaptation layer 11 through the network, and the operation request includes, for example, operation type, user original data access request and other information.
- the server adaptation layer 21 is also used to perform unified protocol translation and conversion on the original data access request, so as to convert the original data access request into a data access request with a unified format and semantics.
- the server adaptation layer 21 can call the operation interface to process the request according to the operation type.
- the operation interface includes, for example, a write interface, a read interface, and a prefetch interface.
- the cache node 20 a includes a write cache 22 , an L1 read cache (ie, a first-level read cache) 23 and an L2 read cache (ie, a second-level read cache) 24 .
- the write cache 22 includes, for example, the RAM storage space in the memory 202 and the SSD storage space in the hard disk 203 in FIG. 2, and is used to protect written data (i.e., dirty data) that has not yet been flushed to the storage cluster.
- written data can be stored in the form of multiple copies in the SSD storage space, so as to ensure high reliability of dirty data and high availability in failure scenarios.
- the L1 read cache mainly uses small-capacity, high-performance storage media, such as DRAM and SCM in the memory 202 .
- the L1 read cache serves as a unified entrance for read operations, shielding the existence of the second-level read cache upwards, and avoiding the introduction of management and interaction complexity.
- the L2 read cache mainly uses a large-capacity storage medium to receive the data evicted from the L1 read cache.
- the L2 read cache can use a non-volatile storage medium (such as SCM or SSD).
- the large-capacity L2 read cache can avoid performance fluctuations and declines caused by untimely prefetching due to limited space in the L1 read cache, or a large amount of hot data being eliminated to the L2 read cache.
- the global cache of the present application supports expansion to three or more levels of cache.
- the aggregation module 25 in FIG. 2 is used to aggregate small pieces of data stored in the write cache 22 into larger pieces of data, which the storage proxy module 26 then writes into the storage cluster 300.
- the cache node 20a also includes a cluster management module 27 .
- the cache cluster 200 generates a partitioned view of the cache cluster through a cluster management module in each cache node.
- a master node for cluster management can be set in the cache cluster 200, and each newly online cache node in the cache cluster registers with the master node through its cluster management module, so that the master node can obtain the cache resource information of each cache node.
- the master node can map the cache resources of each cache node to partitions according to a preset algorithm and generate a partition view, which includes the mapping relationship between cache nodes and partitions. In a multi-copy storage scenario, a partition can be mapped to multiple cache nodes. After the master node generates the partition view, it can send the partition view to the other cache nodes.
- the cache resources include, for example, cache resources such as write cache, L1 read cache, and L2 read cache in each cache node.
- the client adaptation layer 11 of the computing node 10a can obtain the partitioned view of the cache cluster 200 from the cluster management module 27 of any cache node (eg, the cache node 20a ). After the client adaptation layer 11 intercepts the data access request from the application, it can determine the cache node that processes the data to be accessed from the partition view according to preset rules.
- for example, the client adaptation layer can hash the key (Key) of the data to be accessed to obtain a digest, take the digest modulo the number of partitions to determine the partition number corresponding to the data, and then determine the cache node(s) corresponding to the data access request according to the at least one cache node mapped to that partition number in the partition view, as sketched below.
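- the following Python sketch of this routing step is illustrative only; the partition view contents and node names are hypothetical.

```python
import hashlib

# Partition view mirroring FIG. 3: each partition maps to three cache nodes
# because data is stored as three copies.
partition_view = {
    0: ["cache-20a", "cache-20b", "cache-20c"],  # pt0
    1: ["cache-20b", "cache-20c", "cache-20a"],  # pt1 (illustrative)
}
NUM_PARTITIONS = len(partition_view)

def route(key: str):
    # Hash the Key to a digest, take it modulo the partition count,
    # then look the partition up in the partition view.
    digest = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    partition = digest % NUM_PARTITIONS
    return partition, partition_view[partition]

print(route("Key1"))  # e.g. (0, ['cache-20a', 'cache-20b', 'cache-20c'])
```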
- FIG. 3 is a schematic diagram of a method for performing data routing by a client adaptation layer.
- for example, the client adaptation layer 11 can determine, according to the partition view in Figure 3, that the data corresponds to partition pt0 and thus to cache node 20a, cache node 20b and cache node 20c, so the data access request can be routed to cache node 20a, cache node 20b and cache node 20c respectively.
- one partition in FIG. 3 corresponds to three cache nodes, indicating that data is stored in three copies to increase reliability.
- FIG. 4 is a flowchart of a method for writing data in a storage system provided by an embodiment of the present application.
- the method shown in FIG. 4 may be executed by computing nodes, cache nodes, and storage nodes in the storage system, and the computing node 10a, cache node 20a, and storage node 30a are used as examples for description below.
- step S401 the computing node 10a generates a cache node write request based on an application data write request.
- one or more applications can be installed in the computing node 10a, and the client adaptation layer 11 and DAS 12 are also installed in the computing node 10a.
- a data write request is generated.
- the database application may select a data storage service provided by the storage cluster as required, such as a block storage service, an object storage service, or a file storage service.
- for the block storage service, the data write request includes, for example, the logical address of the data and the data to be written; the logical address includes, for example, a logical unit number (LUN), a logical block address (LBA) and a data length, and the logical address of the data is equivalent to the Key of the data.
- for the object storage service, the data write request includes, for example, the object name of the data and the data to be written, and the object name of the data is the Key of the data.
- for the file storage service, the data write request includes, for example, the file name of the file data and the directory path where the file is located, and the file name and directory path are equivalent to the Key of the data. That is to say, when different applications use different data services, the format of the Key (such as its form and byte length) in the data access requests (including data write requests, data read requests, etc.) differs greatly.
- in addition, the attributes of the fields (such as field length and field semantics) in the data access requests generated by different applications may also differ.
- the client adaptation layer 11 can intercept the data write request from the Librados component and, based on it, generate cache node write requests to be sent to the cache cluster 200. Specifically, the client adaptation layer 11 first determines that the data to be written corresponds to, for example, partition pt0; according to the routing process shown in FIG. 3, it can determine that the data to be written should be routed to cache nodes 20a, 20b and 20c, and it can therefore generate three cache node write requests sent to cache nodes 20a, 20b and 20c respectively.
- the cache node 20a is used as an example for description below, and the operations of the cache node 20b and the cache node 20c may refer to the operation of the cache node 20a.
- the generated cache node write request sent to the cache node 20a includes, for example: node identifier of the cache node 20a, operation type (write request type), initial data access request and other information.
- step S402 after the computing node 10a generates the cache node write request, it sends the cache node write request to a corresponding cache node, such as the cache node 20a.
- the client adaptation layer 11 may send the cache node write request to the server adaptation layer 21 in the cache node 20a.
- step S403 the cache node converts the cache node write request into a unified format and semantics.
- specifically, the server adaptation layer 21 converts the cache node write request into a unified format and semantics.
- the unified format and semantics correspond to, for example, a data storage service in the storage cluster, such as an object storage service, so that only one data storage service may be provided in the storage cluster.
- the conversion operation may include converting the Key (for example, Key1) of the data to be written in the write request of the cache node to a preset length.
- for example, the preset length is 20 bytes; the server adaptation layer 21 can adaptively pad Key1 with bytes in a preset manner, so as to extend Key1 into a 20-byte Key2.
- the server adaptation layer 21 may map Key1 to a 20-byte Key2 through, for example, a hash algorithm.
- the cache node 20a may maintain a correspondence table between the initial Key and the mapped Key through a data table.
- after the server adaptation layer 21 maps Key1 to Key2 through the hash algorithm, it can determine, based on the data table, whether a hash collision has occurred. In the case of a hash collision, the server adaptation layer 21 can remap Key1 to a different 20-byte Key through a preset algorithm, and record the collision for later query. A sketch of this conversion follows.
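- the following Python sketch is illustrative only; SHA-1 (which happens to produce exactly 20 bytes) and the counter-salt remapping stand in for the unspecified hash and preset remapping algorithms.

```python
import hashlib

key_table: dict = {}  # mapped Key2 -> original Key1 (the correspondence table)

def to_key2(key1: str) -> bytes:
    key2 = hashlib.sha1(key1.encode()).digest()  # exactly 20 bytes
    salt = 0
    # On a collision (Key2 already mapped to a different Key1), remap through
    # an illustrative preset algorithm: re-hash with a counter salt.
    while key2 in key_table and key_table[key2] != key1:
        salt += 1
        key2 = hashlib.sha1(f"{key1}#{salt}".encode()).digest()
    key_table[key2] = key1
    return key2

print(to_key2("object/my-file.txt").hex())  # 40 hex chars = 20 bytes
```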
- the above conversion operation also includes converting the semantics of the cache node write request into preset semantics.
- the server adaptation layer 21 converts the write request of the cache node according to attributes such as length and semantics of multiple preset fields.
- in this way, the cache node can use a unified interface to process data service requests corresponding to different applications and different data storage services; therefore, a unified global cache pool can be created for different applications, which improves cache resource utilization.
- step S404 the cache node executes the write request, and writes data to the cache node.
- the cache node when it executes the write request, it calls the write interface to write the data in the write request into the write cache 22 in the cache node.
- the write cache 22 includes RAM storage space and SSD storage space, for example.
- specifically, after the server adaptation layer 21 converts the cache node write request into the unified format and semantics, it can call the write interface provided in the cache node 20a; the computer code included in the write interface performs a series of operations such as caching the data and writing the data to the storage cluster.
- the data requested to be written in the cache node write request is written into the write cache 22 in correspondence with the converted Key (for example, Key2); this data is also the data to be written in the aforementioned data write request.
- for example, the data is written into the SSD space of the write cache 22 in the form of three copies to protect the written data, and is also stored in the RAM space of the write cache to speed up querying and flushing the data (that is, storing it to the storage cluster).
- step S405 after writing the data, the cache node returns write request completion information to the computing node.
- after the cache node 20a completes writing to the write cache, it can immediately return the write request completion information to the computing node 10a, instead of returning it only after the data has been written to the storage cluster, which shortens the feedback time and improves system efficiency.
- the cache node 20a may determine whether the disk flushing condition is satisfied in the write cache.
- the disk flushing condition includes, for example, any of the following: the data stored in the write cache reaches a preset water level; the current time is a preset disk flushing time (for example, an idle period of the cache node); or a disk flushing instruction is received from operations personnel. A minimal check is sketched below.
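- the following tiny Python sketch of the flush-trigger check is illustrative only; the watermark and idle hours are hypothetical values.

```python
def should_flush(used_bytes: int, capacity_bytes: int, now_hour: int,
                 watermark: float = 0.8, idle_hours=(2, 3, 4),
                 manual: bool = False) -> bool:
    return (used_bytes / capacity_bytes >= watermark  # preset water level reached
            or now_hour in idle_hours                 # preset (idle) flush time
            or manual)                                # operator-issued instruction

print(should_flush(850, 1000, now_hour=14))  # True: above the 80% watermark
```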
- the cache node 20a performs data deduplication, data merging, and other processing on part of the data stored in the write cache RAM with an earlier storage time, so as to store it in the storage cluster.
- step S406 the cache node 20a aggregates multiple pieces of data in the write cache.
- specifically, the aggregation module 25 in the cache node 20a aggregates multiple pieces of data read from the write cache.
- the data volume of the small objects is relatively small, for example, 8 KB in size.
- the multiple small objects may include multiple pieces of new data that rewrite old data, and the old data may be distributed across different storage addresses in the storage cluster. Therefore, if these small objects are written directly to the storage cluster, the aforementioned different storage addresses need to be addressed separately, which generates a large amount of random writing in the storage cluster. Each random write to an HDD requires a new seek and disk rotation, which slows down the disk flushing speed.
- the data storage speed of the storage medium in the storage cluster is usually slower than that of the cache medium in the cache cluster.
- as a result, the speed of flushing data out of the write cache cannot keep up with the speed at which data is written into it, and the write cache of the cache node 20a is likely to become full; application data then cannot be cached and must be written directly to the back-end storage, so the write cache can no longer provide its acceleration service.
- the aggregation module 25 aggregates multiple data in the write cache, and writes the aggregated larger data into the storage cluster, thereby increasing the speed of writing into the storage cluster.
- the aggregation module 25 may aggregate, for example, 1000 small objects in the write cache into an 8 MB large object for sequential writing into the storage cluster. In this way, 1000 random HDD write operations are converted into one sequential write operation, that is, only one seek-plus-rotation delay is required instead of 1000, thereby improving the data writing speed of the storage cluster.
- after aggregating multiple small objects into one large object, the aggregation module 25 generates a unique Key for the large object and records the information of the large object in the metadata shown in FIG. 2, including the Keys of the small objects contained in the large object and the offset address (offset) and data length (length) of each small object within the large object. The aggregation module 25 can keep the metadata in memory while also storing it in the form of multiple copies in non-volatile media (such as SSD), and synchronously updates the metadata in the SSD after each update of the metadata in memory.
- the large object can then be provided to the storage proxy module 26 for writing into the storage cluster; the aggregation step is sketched below.
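- the following Python sketch of the aggregation step is illustrative only; the uuid-based large-object Key and the in-memory metadata dict are hypothetical stand-ins.

```python
import uuid

def aggregate(small_objects: dict) -> tuple:
    """Pack small objects into one large object and record per-object metadata."""
    big_key = f"big-{uuid.uuid4().hex[:8]}"   # unique Key of the large object
    payload, metadata, offset = bytearray(), {}, 0
    for key, value in small_objects.items():
        payload += value
        metadata[key] = {"big": big_key, "offset": offset, "length": len(value)}
        offset += len(value)
    return big_key, bytes(payload), metadata

big_key, payload, meta = aggregate({"k1": b"aaaa", "k2": b"bbbbbb"})
print(meta["k2"])  # e.g. {'big': 'big-3f2a...', 'offset': 4, 'length': 6}
```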
- step S407 the cache node 20a generates a data write request.
- specifically, after the storage proxy module 26 in the cache node 20a acquires the above-mentioned 8 MB large object, it determines the storage node corresponding to the data (such as the storage node 30a) according to preset data allocation rules, and accordingly generates a write request for the large object, which includes, for example, the identifier of the storage node 30a, the Key of the large object, and the large object itself.
- alternatively, if aggregation is not performed, the cache node 20a can provide each small object to the storage proxy module 26, and the storage proxy module 26 can similarly generate a data write request for each small object respectively.
- step S408 the cache node 20a sends the generated data write request to the storage node 30a.
- the storage proxy module 26 sends the generated data write request to the storage node 30a.
- step S409 the storage node 30a writes corresponding data after receiving the data write request.
- the storage node 30a invokes the write interface to write the data.
- the storage proxy module 26 generates a data write request in a unified format, for example, the data write request has the semantics and format of the object storage service, so the storage node 30a only needs to set a write interface corresponding to the object storage service. It can be understood that the storage proxy module 26 is not limited to generating write requests with the semantics and format of object storage services, but can generate write requests with the semantics and formats of other data storage services.
- after the storage node 30a finishes writing the data, it may return write success information to the cache node 20a. After the cache node 20a receives the write success information, it can update the old versions of the small objects stored in the L1 read cache 23 and/or the L2 read cache 24 with the latest written versions, so that the read caches hold the latest versions. At the same time, the cache node 20a may delete the flushed data from the write cache.
- after the aggregation module 25 aggregates small objects as described above, when most of the small objects in a flushed large object become invalid data due to deletion or modification, the large object stored in the storage cluster comes to occupy a large amount of invalid space. For this reason, the aggregation module 25 can reclaim large objects with a large amount of invalid space during idle time. Specifically, the aggregation module 25 can request, through the storage proxy module 26, that the storage cluster read the still-valid small objects in the large object, and after the reading is completed, send a request to the storage cluster to delete the large object, thereby completing its reclamation. The still-valid small objects from the reclaimed large object can be aggregated into a new large object and rewritten into the storage cluster. Multiple large objects may be reclaimed in descending order of their invalid space. After reclaiming a large object, the aggregation module 25 modifies the metadata accordingly, as sketched below.
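- the following Python sketch of this reclamation is illustrative only; store and metadata are in-memory stand-ins for the storage cluster and the aggregation metadata.

```python
# Large object "Key3" holds small objects k1 (offset 0, length 4) and
# k2 (offset 4, length 6); suppose k1 has since become invalid.
store = {"Key3": b"aaaabbbbbb"}
metadata = {"k1": ("Key3", 0, 4), "k2": ("Key3", 4, 6)}  # key -> (big, offset, len)

def reclaim(big_key: str, live_keys: list, new_big_key: str) -> None:
    # Read the still-valid small objects back out of the large object.
    survivors = {k: store[metadata[k][0]][metadata[k][1]:metadata[k][1] + metadata[k][2]]
                 for k in live_keys}
    del store[big_key]                       # delete the old large object
    payload, offset = bytearray(), 0
    for k, v in survivors.items():           # re-aggregate the survivors
        metadata[k] = (new_big_key, offset, len(v))
        payload += v
        offset += len(v)
    store[new_big_key] = bytes(payload)      # rewrite to the storage cluster

reclaim("Key3", live_keys=["k2"], new_big_key="Key4")
print(store, metadata["k2"])  # {'Key4': b'bbbbbb'} ('Key4', 0, 6)
```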
- FIG. 5 is a flowchart of a method for reading data in a storage system provided by an embodiment of the present application.
- the method shown in FIG. 5 may be executed by computing nodes, cache nodes, and storage nodes in the storage system, and the computing node 10a, cache node 20a, and storage node 30a are used as examples for description below.
- step S501 the computing node 10a generates a cache node read request based on an application's data read request.
- similar to step S401, when the database application in the computing node 10a wishes to read data from the storage cluster (such as the object named Key1 above), it generates a data read request that includes the name "Key1" of the object to be read. The data read request likewise has a format and semantics corresponding to the database application and the data storage service used by the application.
- the client adaptation layer 11 intercepts the data read request, and generates a cache node read request for sending to the cache cluster 200 based on the data read request. Specifically, client adaptation layer 11 may similarly determine that the data to be read should be routed to cache nodes 20a, 20b, and 20c, so that client adaptation layer 11 may generate three cache node read requests. The description below takes the cache node 20a as an example.
- the generated cache node read request sent to the cache node 20a includes, for example, information such as the node identifier of the cache node 20a, the operation type (read request type), and the initial data read request.
- step S502 the computing node 10a sends a cache node read request to the cache node 20a.
- after the client adaptation layer 11 generates the cache node read request, it can send the request to the server adaptation layer 21 in the cache node 20a.
- step S503 the cache node 20a converts the cache node read request into a unified format and semantics.
- for this step, reference may be made to the description of step S403 above, and details are not repeated here.
- for example, the cache node read request is converted into a request to read the object Key2.
- step S504 the cache node 20a determines whether the local cache stores data to be read.
- the cache node 20a invokes the read interface to determine whether the local cache stores data to be read.
- if the local cache stores the data to be read, the cache node 20a may read the data from the local cache and perform step S508 to return the read data to the computing node 10a.
- specifically, the cache node 20a first determines whether the value of the object Key2 is stored in the RAM space of the write cache 22, and if so, reads the value and returns it to the computing node 10a.
- if the value of the object Key2 is not stored in the write cache 22, the cache node 20a determines whether it is stored in the L1 read cache 23, and if so, reads the value and returns it to the computing node 10a; otherwise, the cache node 20a determines whether the value is stored in the L2 read cache 24, and if so, reads the value and returns it to the computing node 10a. This lookup order is sketched below.
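- a minimal Python sketch of this lookup order follows, assuming dict-backed caches; a miss at every level falls through to step S505.

```python
def lookup(key2, write_cache, l1_read_cache, l2_read_cache):
    # Check the write cache first, then the L1 read cache, then the L2 read cache.
    for tier in (write_cache, l1_read_cache, l2_read_cache):
        if key2 in tier:
            return tier[key2]
    return None  # local miss: proceed to step S505 and read from the storage node

print(lookup("Key2", {"Key2": b"v"}, {}, {}))  # b'v', found in the write cache
```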
- step S505 when the cache node 20a determines that the data to be read is not stored in the local cache, it generates a data read request and sends the data read request to the storage node 30a.
- the cache node 20a may generate a data read request for reading the object Key2.
- specifically, if the object Key2 has been aggregated into a large object as described above, the cache node 20a first reads the metadata, determines that the object Key2 corresponds to the large object Key3 as well as the offset address and length of Key2 within Key3, and then generates a data read request that includes the name "Key3" of the object to be read and the offset address and length of the data to be read within the object Key3, as sketched below.
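- an illustrative Python sketch of building this ranged read request follows; the metadata entry and sizes are hypothetical.

```python
# Aggregation metadata: small object Key2 lives inside large object Key3.
metadata = {"Key2": {"big": "Key3", "offset": 4096, "length": 8192}}

def make_storage_read_request(key2: str) -> dict:
    # Translate a read of the small object into a ranged read of the large one.
    m = metadata[key2]
    return {"object": m["big"], "offset": m["offset"], "length": m["length"]}

print(make_storage_read_request("Key2"))
# {'object': 'Key3', 'offset': 4096, 'length': 8192}
```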
- step S506 the storage node 30a reads data.
- after receiving the data read request from the cache node 20a, the storage node 30a reads the data corresponding to the offset address and length within the object Key3, thereby reading the object Key2.
- step S507 the storage node 30a returns the read data to the cache node 20a.
- step S508 the cache node 20a returns the read data to the computing node 10a.
- specifically, the cache node 20a converts Key2 back into Key1 through the server adaptation layer 21, and returns the value of Key2 received from the storage node 30a as the value of Key1 to the computing node 10a, so that the computing node 10a returns the value of Key1 to the application.
- FIG. 6 is a flowchart of a method for prefetching data in a storage system provided by an embodiment of the present application.
- the method shown in FIG. 6 may be executed by computing nodes, cache nodes, and storage nodes in the storage system, and the computing node 10a, cache node 20a, and storage node 30a are used as examples for description below.
- In step S601, the computing node 10a acquires the application's data access history within the most recent preset period.
- Specifically, the DAS 12 in the computing node 10a acquires the application's data access history within the most recent preset period.
- As described above, the DAS 12 can pull the application's read and write access requests from the client adaptation layer 11, so as to obtain the data access history of the application's user within the most recent preset period.
- The data access history includes, for example, the identifiers of the data read or written by the user within the most recent preset period, and the times at which the reads or writes occurred.
- In step S602, the computing node 10a recommends the data to be prefetched based on the data access history of each application.
- Specifically, the DAS 12 in the computing node recommends the data to be prefetched through a prefetch recommendation model.
- the prefetch recommendation model can use various algorithms.
- For example, the prefetch recommendation model may include a clustering model, which performs multi-dimensional feature clustering on the data in the user's data access history, so that data prefetch recommendation is performed according to the clustering result.
- The prefetch recommendation model may also include a time series prediction model, which predicts the data that the user will access at the next moment, so that data prefetch recommendation is performed according to the prediction result.
- The prefetch recommendation model may also include algorithms such as frequent pattern mining and hotspot data identification.
- Based on these algorithms, the prefetch recommendation model can determine the user's access modes, such as a streaming mode, a hotspot mode, an association mode, and a working-set association mode.
- Fig. 7 shows a schematic diagram of various user access modes.
- In each coordinate system in FIG. 7, the horizontal axis represents, for example, time, and the vertical axis represents, for example, the data identifier (i.e., the key of the data).
- As shown in FIG. 7, in the streaming mode, the data accessed by the user is linearly related to time, so the prefetch recommendation model can predict, according to this relationship, the data that the user will access at the next moment as the recommended prefetch data.
- Specifically, the prefetch recommendation model outputs the identifier of the recommended prefetch data.
- In the hotspot mode, the hotspot data at different moments can be predicted, so the hotspot data at the next moment can be predicted according to this mode as the recommended prefetch data.
- In the association mode (read-read association or write-read association), the data read by the user in the next period is associated with the data read or written in the previous period. Therefore, the data that the user will read at the next moment can be predicted according to this mode as the recommended prefetch data.
- In the working-set association mode, the user's access to one data table (such as Table 2) is associated with the user's access to another data table (such as Table 1). Therefore, the data that the user will access at the next moment can be predicted according to this mode as the recommended prefetch data. An illustrative sketch of one such mode follows.
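- For illustration, a toy predictor for just one of these modes (the streaming mode) can be sketched as follows; the constant-stride rule and all names are assumptions standing in for the model internals, which the description leaves open.

```python
# Toy illustration of streaming-mode prediction only: when the identifiers
# in the recent history advance by a constant stride over time, the next
# identifier in the sequence is recommended for prefetching. This is an
# assumed stand-in, not the patent's prefetch recommendation model.

def recommend_streaming(history):
    """history: integer data identifiers ordered by access time."""
    if len(history) < 3:
        return None  # too little history to establish a linear relation
    strides = [b - a for a, b in zip(history, history[1:])]
    if len(set(strides)) == 1 and strides[0] != 0:  # strictly linear access
        return history[-1] + strides[0]  # predicted identifier for the next moment
    return None

# A stream that read keys 10, 12, 14, 16 is predicted to read 18 next.
assert recommend_streaming([10, 12, 14, 16]) == 18
```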
- The DAS 12 supports stateless deployment, and mode mining can be performed anew after the computing node 10a fails or the process is restarted.
- Alternatively, the DAS 12 can write the access modes mined by the prefetch recommendation model to a persistent medium, and read the user access modes from the persistent medium after events such as faults, restarts, and upgrades, so as to achieve rapid warm-up.
- After predicting the identifier of the recommended prefetch data (for example, Key1), the prefetch recommendation model provides the identifier to the client adaptation layer 11.
- The prefetch recommendation model is only one implementation of the embodiments of this application; other manners of recommending prefetch data also fall within the scope disclosed by the embodiments of this application.
- In step S603, the computing node 10a generates a data prefetch request according to the recommended prefetch data.
- Similarly to the generation of a cache node read request, the client adaptation layer 11 of the computing node 10a determines the corresponding cache node (for example, cache node 20a) according to the identifier Key1 of the recommended prefetch data, thereby generating a data prefetch request.
- The data prefetch request includes the operation request type (prefetch type), the identifier of the cache node 20a, and the identifier of the data to be prefetched (Key2). A minimal sketch of such a request follows.
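- For illustration, such a three-field request might be modeled as below; only the carried information comes from the description, and the field names are assumptions.

```python
# Sketch of the data prefetch request assembled in step S603. The field
# names are hypothetical; the description only specifies the three pieces
# of information the request carries.

from dataclasses import dataclass

@dataclass
class PrefetchRequest:
    op_type: str   # operation request type, here always "prefetch"
    node_id: str   # identifier of the target cache node (cache node 20a)
    data_key: str  # identifier of the data to be prefetched

request = PrefetchRequest(op_type="prefetch", node_id="cache-20a", data_key="Key2")
```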
- In step S604, the computing node 10a sends the data prefetch request to the cache node 20a.
- Specifically, the client adaptation layer 11 sends the data prefetch request to the server adaptation layer 21 in the cache node 20a.
- In step S605, the cache node 20a converts the data prefetch request into a unified format and semantics.
- For this step, reference may be made to the description of step S403 above; details are not repeated here.
- After the conversion, the data prefetch request is used to prefetch the value of the object Key2.
- In step S606, the cache node 20a determines whether the write cache stores the data to be prefetched.
- Specifically, the cache node 20a first invokes the prefetch interface and, after executing the prefetch interface, first determines whether the value of the object Key2 is stored in the RAM storage space of the write cache 22.
- If the cache node 20a determines that the data to be prefetched is stored in the write cache 22, it can read the data from the write cache and perform step S611 to store the data in the L1 read cache or the L2 read cache, ending this prefetch operation.
- In step S607, when determining that the write cache does not store the data to be prefetched, the cache node 20a may determine whether the read cache stores the data to be prefetched.
- If it determines that either read cache stores the data to be prefetched, the cache node 20a may end the prefetch operation. Or, optionally, the cache node 20a may read the data from the read cache and perform step S612 to return the data to the computing node 10a.
- Specifically, in the case that the value of the object Key2 is not stored in the write cache 22, the cache node 20a can determine whether the value is stored in the L1 read cache 23, and if so, can end the prefetch operation.
- In the case that the value of the object Key2 is not stored in the L1 read cache 23, the cache node 20a can determine whether the value is stored in the L2 read cache 24, and if so, can end the prefetch operation.
- In step S608, when determining that the data to be prefetched is not stored in the read cache, the cache node 20a generates a data read request and sends the data read request to the storage node 30a.
- In step S609, the storage node 30a reads the data according to the data read request.
- In step S610, the storage node 30a returns the read data to the cache node 20a. For steps S608 to S610, reference may be made to the description of steps S505 to S507 above.
- In step S611, the cache node 20a stores the data returned by the storage node 30a into the read cache.
- The cache node 20a may store the returned object Key2 in the L1 read cache or the L2 read cache, and end the prefetch operation.
- When the computing node 10a later sends a read request for the object Key1, the cache node 20a determines, by converting the read request, that Key1 corresponds to Key2, so the value of the object Key2 can be read from the read cache and returned to the computing node 10a as the value of Key1, without reading from the storage cluster, which shortens the user access latency.
- Optionally, after storing the value of the object Key2 in the read cache, the cache node 20a may perform step S612 to return the prefetched data to the computing node 10a.
- FIG. 8 is a structural diagram of a computing node provided by an embodiment of this application. The computing node is configured to execute the method shown in FIG. 4, FIG. 5, or FIG. 6, and includes:
- an obtaining unit 81, configured to obtain access information of a first application to the storage node within a preset time period;
- a determining unit 82, configured to determine information of prefetch data based on the access information;
- a generating unit 83, configured to determine, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generate a prefetch request for prefetching the prefetch data;
- a sending unit 84, configured to send the prefetch request to the cache node.
- In one implementation, the determining unit 82 is specifically configured to determine the information of the prefetch data based on the access information according to a prefetch recommendation model.
- In one implementation, the access information includes access information of a first user, and the determining unit 82 is specifically configured to determine, through the prefetch recommendation model and based on the access information of the first user, an access mode of the first user, and to determine the data to be prefetched according to the access mode. A skeleton of this unit division is sketched below.
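- For illustration, the unit division of FIG. 8 can be sketched as a class skeleton; the method names and signatures are assumptions, and the bodies are deliberately left unimplemented.

```python
# Skeleton of the computing node in FIG. 8 as four cooperating methods. The
# bodies are placeholders; only the division into obtaining, determining,
# generating, and sending units comes from the description above.

class ComputingNode:
    def obtain_access_info(self, app_id, period):        # obtaining unit 81
        """Collect the first application's access information for the period."""
        raise NotImplementedError

    def determine_prefetch_data(self, access_info):      # determining unit 82
        """Run the prefetch recommendation model over the access information."""
        raise NotImplementedError

    def generate_prefetch_request(self, prefetch_info):  # generating unit 83
        """Choose the cache node for the data and build the prefetch request."""
        raise NotImplementedError

    def send_prefetch_request(self, request, cache_node):  # sending unit 84
        """Deliver the prefetch request to the chosen cache node."""
        raise NotImplementedError
```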
- Those skilled in the art can understand that all or part of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions, and the aforementioned program can be stored in a computer-readable storage medium.
- When executed, the program performs all or part of the steps of the above method embodiments; and the aforementioned storage medium includes various media that can store program code, such as a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
- In the above embodiments, all or part of the implementation may be by software, hardware, firmware, or any combination thereof.
- When software is used for implementation, it may be implemented in whole or in part in the form of a computer program product.
- The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the procedures or functions according to the embodiments of this application are generated.
- The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
- The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner.
- The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
- The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid state disk (SSD)).
- In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways without exceeding the scope of this application.
- The embodiments described above are merely illustrative.
- The division into modules or units is merely a logical function division; in actual implementation, there may be other division manners.
- Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units.
- Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.
Abstract
Embodiments of this application provide a data prefetching method, a computing node, and a storage system. The method includes: a computing node obtains access information of a first application to a storage node within a preset time period; the computing node determines information of prefetch data based on the access information; the computing node determines, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generates a prefetch request for prefetching the prefetch data; and the computing node sends the prefetch request to the cache node. The cache node performs a prefetch operation on the prefetch data in response to the prefetch request. The solution provided by the embodiments of this application improves prefetch accuracy.
Description
This application claims priority to the Chinese patent application No. 202111117681.6, filed with the China Patent Office on September 23, 2021 and entitled "Data Prefetching Method, Computing Node, and Storage System", which is incorporated herein by reference in its entirety.
This application relates to the field of computer technologies, and specifically to a data prefetching method, a computing node, and a computer system.
A storage system usually includes multiple interconnected computing nodes and multiple storage nodes; the computing nodes write generated data to the storage nodes and read data from the storage nodes. To shorten the data access path from computing nodes to storage nodes, memory in the storage system is usually used to store the data written or read by the computing nodes, or data is preloaded from the main storage of the storage nodes into memory in the storage system. With the rapid growth of data volume, global cache technology for storage systems has emerged. With global cache technology, the cache resources (for example, memory) in the storage system can be named in a unified manner to form a cache pool, and each computing node can cache data at any address in the cache pool. The cache pool may, for example, consist of cache resources in multiple storage nodes, or of cache resources in multiple cache nodes included in the storage system. Taking cache nodes as an example, data prefetch recommendation is usually performed on each cache node side, and the prefetch accuracy of that solution is poor. Alternatively, a central node may be set among the cache nodes to perform data prefetch recommendation, which leads to a long prefetch latency and increases network communication costs.
SUMMARY
Embodiments of this application aim to provide a data prefetching method, a computing node, and a storage system. By performing prefetch data recommendation on the computing node side, prefetch accuracy is improved and network communication costs are reduced.
To achieve the above objective, a first aspect of this application provides a data prefetching method, including: a computing node obtains access information of a first application to a storage node within a preset time period; the computing node determines information of prefetch data based on the access information; the computing node determines, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generates a prefetch request for prefetching the prefetch data; the computing node sends the prefetch request to the cache node; and the cache node performs a prefetch operation on the prefetch data in response to the prefetch request.
Since the computing node determines the information of the prefetch data according to its local access information, prefetch accuracy is improved and network communication costs are reduced.
In a possible implementation of the first aspect, that the computing node determines information of prefetch data based on the access information includes: the computing node determines the information of the prefetch data based on the access information according to a prefetch recommendation model.
Determining the information of the prefetch data through a prefetch recommendation model improves the accuracy and efficiency of prefetch data recommendation.
In a possible implementation of the first aspect, the prefetch recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hotspot data identification algorithm.
In a possible implementation of the first aspect, the access information includes access information of a first user, and that the computing node determines the information of the prefetch data based on the access information according to the prefetch recommendation model includes: the prefetch recommendation model determines an access mode of the first user based on the access information of the first user, and determines the data to be prefetched according to the access mode.
In a possible implementation of the first aspect, the prefetch request is a prefetch request for a data block, file data, or object data, and the method further includes: after receiving the prefetch request for the prefetch data from the computing node, the cache node converts the prefetch request into a format and semantics set uniformly for data blocks, file data, and object data.
By converting the prefetch request into a unified format and semantics, the cache node only needs to provide one kind of prefetch interface, avoiding the cost overhead and operational complexity of maintaining multiple protocols. Moreover, a global cache pool corresponding to different applications and different data types can be set up, improving the utilization of cache resources.
In a possible implementation of the first aspect, the information of the prefetch data includes a first identifier of the prefetch data, and converting the prefetch request into the format and semantics set uniformly for data blocks, file data, and object data includes converting the first identifier in the prefetch request into a second identifier conforming to a preset format.
In a possible implementation of the first aspect, converting the first identifier in the prefetch request into the second identifier conforming to the preset format includes converting the first identifier into the second identifier through a hash algorithm.
In a possible implementation of the first aspect, the cache node includes a write cache and a read cache, and that the cache node performs a prefetch operation on the prefetch data in response to the prefetch request includes: the cache node determines, based on the second identifier, whether the prefetch data is stored in the write cache, and when determining that the prefetch data is stored in the write cache, stores the prefetch data into the read cache in correspondence with the second identifier.
In a possible implementation of the first aspect, that the cache node performs a prefetch operation on the prefetch data in response to the prefetch request further includes: when determining that the prefetch data is not stored in the write cache, the cache node determines, based on the second identifier, whether the prefetch data is stored in the read cache, and when determining that the prefetch data is not stored in the read cache, generates a data read request based on the second identifier and sends the data read request to a storage node; the storage node reads the prefetch data according to the data read request and returns the prefetch data to the cache node; and the cache node stores the prefetch data into the read cache in correspondence with the second identifier.
A second aspect of this application provides a storage system, including a computing node, a cache node, and a storage node. The computing node is configured to: obtain access information of a first application to the storage node within a preset time period; determine information of prefetch data based on the access information; determine, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generate a prefetch request for prefetching the prefetch data; and send the prefetch request to the cache node. The cache node is configured to perform a prefetch operation on the prefetch data in response to the prefetch request.
In a possible implementation of the second aspect, that the computing node is configured to determine information of prefetch data based on the access information specifically includes: the computing node is configured to determine the information of the prefetch data based on the access information according to a prefetch recommendation model.
In a possible implementation of the second aspect, the prefetch recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hotspot data identification algorithm.
In a possible implementation of the second aspect, the access information includes access information of a first user, and that the computing node is configured to determine the information of the prefetch data based on the access information according to the prefetch recommendation model includes: the computing node is configured to determine, through the prefetch recommendation model and based on the access information of the first user, an access mode of the first user, and determine the data to be prefetched according to the access mode.
In a possible implementation of the second aspect, the prefetch request is a prefetch request for a data block, file data, or object data, and the cache node is further configured to: after receiving the prefetch request for the prefetch data from the computing node, convert the prefetch request into a format and semantics set uniformly for data blocks, file data, and object data.
In a possible implementation of the second aspect, the information of the prefetch data includes a first identifier of the prefetch data, and that the cache node is configured to convert the prefetch request into the format and semantics set uniformly for data blocks, file data, and object data includes: the cache node is configured to convert the first identifier in the prefetch request into a second identifier conforming to a preset format.
In a possible implementation of the second aspect, that the cache node is configured to convert the first identifier in the prefetch request into the second identifier conforming to the preset format includes: the cache node is configured to convert the first identifier into the second identifier through a hash algorithm.
In a possible implementation of the second aspect, the cache node includes a write cache and a read cache, and that the cache node is configured to perform a prefetch operation on the prefetch data in response to the prefetch request includes: the cache node is configured to determine, based on the second identifier, whether the prefetch data is stored in the write cache, and when determining that the prefetch data is stored in the write cache, store the prefetch data into the read cache in correspondence with the second identifier.
In a possible implementation of the second aspect, that the cache node is configured to perform a prefetch operation on the prefetch data in response to the prefetch request further includes: the cache node is configured to, when determining that the prefetch data is not stored in the write cache, determine, based on the second identifier, whether the prefetch data is stored in the read cache, and when determining that the prefetch data is not stored in the read cache, generate a data read request based on the second identifier and send the data read request to the storage node; the storage node is further configured to read the prefetch data according to the data read request and return the prefetch data to the cache node; and the cache node is further configured to store the prefetch data into the read cache in correspondence with the second identifier.
A third aspect of this application provides a data prefetching method, executed by a computing node, including: obtaining access information of a first application to a storage node within a preset time period; determining information of prefetch data based on the access information; determining, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generating a prefetch request for prefetching the prefetch data; and sending the prefetch request to the cache node.
In a possible implementation of the third aspect, the determining information of prefetch data based on the access information includes: determining the information of the prefetch data based on the access information according to a prefetch recommendation model.
In a possible implementation of the third aspect, the prefetch recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hotspot data identification algorithm.
In a possible implementation of the third aspect, the access information includes access information of a first user, and the determining the information of the prefetch data based on the access information according to the prefetch recommendation model includes: the prefetch recommendation model determines an access mode of the first user based on the access information of the first user, and determines the data to be prefetched according to the access mode.
A fourth aspect of this application provides a computing node, including a processor and a memory, where the memory stores executable computer program instructions, and the processor executes the executable computer program instructions to implement the method described in the third aspect and its possible implementations.
A fifth aspect of this application provides a computer-readable storage medium storing computer program instructions that, when executed in a computer or a processor, cause the computer or the processor to implement the method described in the third aspect and its possible implementations.
A sixth aspect of this application provides a computer program product including computer program instructions that, when run in a computer or a processor, cause the computer or the processor to implement the method described in the third aspect and its possible implementations.
The embodiments of this application can be made clearer by describing them with reference to the accompanying drawings:
FIG. 1 is an architectural diagram of a computer system provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a computing node and a cache node provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a method for data routing by the client adaptation layer;
FIG. 4 is a flowchart of a method for writing data in a storage system provided by an embodiment of this application;
FIG. 5 is a flowchart of a method for reading data in a storage system provided by an embodiment of this application;
FIG. 6 is a flowchart of a method for prefetching data in a storage system provided by an embodiment of this application;
FIG. 7 is a schematic diagram of user access modes provided by an embodiment of this application;
FIG. 8 is a structural diagram of a computing node provided by an embodiment of this application.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
FIG. 1 is an architectural diagram of a computer system provided by an embodiment of this application. The computer system is, for example, a storage system, and includes a computing cluster 100, a cache cluster 200, and a storage cluster 300. The computing cluster 100 includes multiple computing nodes; FIG. 1 schematically shows computing node 10a and computing node 10b. A computing node can access data on the storage nodes through an application (APP), and is therefore also called an "application server". A computing node may be a physical machine or a virtual machine. Physical machines include, but are not limited to, desktop computers, servers, notebook computers, and mobile devices.
The cache cluster 200 may be an independent physical cluster, or may share one cluster with the storage cluster 300 (that is, be deployed in the same cluster). When the cache cluster 200 and the storage cluster 300 belong to the same cluster, the resources in the cluster (such as storage resources and computing resources) are divided in advance into resources for cache operations and resources for storage operations. The cache cluster 200 includes multiple cache nodes; the figure schematically shows cache node 20a, cache node 20b, and cache node 20c, which are interconnected through a network. The storage cluster 300 includes multiple storage nodes; the figure schematically shows storage node 30a, storage node 30b, and storage node 30c. The cache nodes and storage nodes may be physical machines or virtual machines.
Taking cache node 20a as an example, cache node 20a includes a processor 201, a memory 202, and a hard disk 203. The processor 201 is a central processing unit (CPU), configured to process operation requests from computing nodes or from other cache nodes, as well as requests generated inside the cache node.
The memory 202 is an internal storage that exchanges data directly with the processor 201; it can read and write data at any time and at high speed, and serves as temporary data storage for the operating system or other running programs. The memory 202 includes at least two kinds of storage; for example, it may be a random access memory (RAM) or a read only memory (ROM). For example, the RAM may include a dynamic random access memory (DRAM) or a storage class memory (SCM). DRAM is a semiconductor memory and, like most random access memories, is a volatile memory device. SCM is a composite storage technology combining the characteristics of traditional storage devices and memory; it is non-volatile, provides faster read/write speeds than a hard disk, but is slower than DRAM in operation speed and cheaper than DRAM in cost. However, DRAM and SCM are only illustrative in this embodiment; the memory may also include other random access memories, such as a static random access memory (SRAM). In addition, the volatile memory in the memory 202 may be configured with a power-protection function so that the data stored in the memory 202 is not lost in case of a system power failure; memory with a power-protection function is called non-volatile memory.
The hard disk 203 provides non-volatile storage resources; its access latency is usually higher than that of memory, and its cost lower. The hard disk 203 may be, for example, a solid state disk (SSD) or a hard disk drive (HDD).
To meet today's massive computing and storage demands, the storage resources (for example, memory or hard disks) of the multiple cache nodes in the cache cluster 200 can be aggregated to provide a global cache pool, so that applications in all computing nodes can use cache resources in any cache node. Storage media whose access latency is lower than that of the hard disks in the storage nodes (for example, RAM, SCM, and SSD) are usually chosen as the storage resources in the cache nodes, thereby providing faster data access than the storage nodes. In this way, when the cache demand of the computing cluster 100 increases, more cache nodes can be added to the cache cluster 200 to horizontally scale the capacity of the global cache pool. The global cache pool provides a unified address space (or namespace) to all computing nodes, and a computing node can route data to the cache node responsible for caching that data, avoiding the data redundancy and consistency problems caused by caching the same data repeatedly. In addition, high data availability can be achieved in the global cache pool through technologies such as multiple replicas, replication, and active-active arrangements.
A computing node (for example, computing node 10a) can send a data access request (a read request or a write request) for the storage cluster to the cache node that caches the data (for example, cache node 20a). Cache node 20a includes, for example, a write cache and a read cache. If the data access request is a write request, cache node 20a can write the data into the write cache, return write success to computing node 10a, and then write the data from the write cache to the storage nodes in the background, thereby speeding up the response to the write request. If the data access request is a read request, cache node 20a can first determine whether the data hits the write cache; when determining that the write cache does not contain the data, it can determine whether the read cache stores the data. If the read cache stores the data, cache node 20a can read the data directly from the read cache and return it to computing node 10a, so that there is no need to read the data from the storage cluster, which shortens the data read path and speeds up the response to the read request.
In another structure, instead of deploying independent cache nodes, storage resources in the storage nodes, such as memory and PCM, can be used to form the cache pool provided to the applications in the computing nodes.
Generally, to improve data access efficiency, the data that is about to be accessed is prefetched into the cache pool in advance, which raises the hit rate of data in the cache and thus improves data access efficiency. Data prefetching generally includes two processes: recommending the data to prefetch, and reading the recommended prefetch data from the storage nodes into the cache pool ahead of time.
In the related art, data prefetching is generally performed on the cache cluster side, and there are generally two implementations of prefetch data recommendation. In one implementation, prefetch data recommendation is performed at every cache node, so that each cache node can prefetch data according to its own recommendation. However, because the data access requests generated by one application in a computing node are usually scattered across multiple cache nodes, each cache node can only obtain the data access history of part of the data accessed by an application within the most recent preset period, and cannot learn the data access history of all the data accessed by that application, where the data access history includes the identifiers of the multiple pieces of data whose access the application requested and the corresponding access times. Since a cache node recommends prefetch data based only on the access history of the part of the application's data that it handles, the recommendation accuracy is low, and prefetch bandwidth and cache resources are wasted.
In another implementation, a specific cache node in the cache cluster is set as a central node for data prefetching. The central node collects the data access history of each application of each computing node from the other cache nodes, so that it can recommend prefetch data based on the complete data access history of a single application. The other cache nodes can send prefetch recommendation requests to the central node, receive prefetch recommendation results from the central node, and prefetch data according to those results. However, this data prefetching approach adds extra network communication, increases communication costs, and may make prefetching untimely.
In the embodiments of this application, prefetch data recommendation is performed by each computing node. A computing node can perform prefetch data recommendation through a prefetch recommendation model based on the data access history of a single application in the computing node within the most recent preset period, so the recommendation accuracy is high, no additional network communication is introduced, and the latency of prefetch recommendation is low.
FIG. 2 is a schematic structural diagram of a computing node and a cache node provided by an embodiment of this application. FIG. 2 shows computing node 10a, cache node 20a, and storage node 30a as examples. As shown in FIG. 2, one or more applications may be installed in computing node 10a, including, as shown in FIG. 2, databases, virtual machines (VMs), big data, high performance computing (HPC), artificial intelligence (AI), and so on. These applications may use different data services provided by the storage cluster 300. For example, the storage cluster 300 is a Ceph cluster; Ceph is a distributed file system that deploys the Librados service component in computing nodes to provide services such as the block storage service, the object storage service, and the file storage service to the applications in the computing nodes.
A client adaptation layer 11 can be deployed in computing node 10a, and can be embedded, in the form of a function library, into the Librados service component deployed in computing node 10a. The client adaptation layer 11 can thus intercept the data access requests generated by each application for the storage cluster, determine the identifier of the cache node corresponding to the target data of a data access request, generate an operation request to be sent to the cache node based on the data access request and the corresponding cache node identifier, and send the operation request to the corresponding cache node 20a (that is, the cache server side). The operation request includes, for example, information such as the operation type, the destination node, and the original data access request. Cache node 20a performs the corresponding operation according to the operation request and returns a response message to the client adaptation layer 11. After receiving the response message from the server side, the client adaptation layer 11 parses the message and returns the parsed result to the application in computing node 10a.
A data analysis service (DAS) module 12 (hereinafter DAS 12) is also deployed in computing node 10a. DAS 12 registers a message service with the client adaptation layer 11, so it can pull the user's read and write access requests at multiple moments from the client adaptation layer 11. DAS 12 includes a prefetch recommendation model, which mines the user's access modes based on the user's read and write access requests at multiple moments, performs prefetch data recommendation accordingly, and pushes the recommendation result to the client adaptation layer 11. The client adaptation layer 11 generates a data prefetch request based on the recommendation result and sends it to the corresponding cache node, so that the cache node prefetches the data.
A server adaptation layer 21 is deployed in cache node 20a. The server adaptation layer 21 receives, through the network, operation requests from the client adaptation layer 11; an operation request includes, for example, information such as the operation type and the user's original data access request. As described above, because different applications use different data services, the user's original data access request may have different formats and semantics. Therefore, the server adaptation layer 21 also performs unified protocol translation and conversion on the original data access request, converting it into a data access request with unified format and semantics. Afterwards, the server adaptation layer 21 can invoke an operation interface according to the operation type to process the request; the operation interfaces include, for example, a write interface, a read interface, and a prefetch interface.
As shown in FIG. 2, cache node 20a includes a write cache 22, an L1 read cache (that is, a first-level read cache) 23, and an L2 read cache (that is, a second-level read cache) 24. The write cache 22 includes, for example, RAM storage space in the memory 202 and SSD storage space in the hard disk 203 of FIG. 1, where the RAM storage space is used to accelerate queries and flushing, and the SSD storage space is used to protect the data written to RAM (that is, dirty data). For example, the written data can be stored in multiple replicas in the SSD storage space, thereby ensuring high reliability of the dirty data and high availability in failure scenarios.
The L1 read cache mainly uses small-capacity, high-performance storage media, such as the DRAM and SCM in the memory 202. The L1 read cache serves as the unified entry for read operations and hides the existence of the second-level read cache from above, avoiding management and interaction complexity. The L2 read cache mainly uses large-capacity storage media and receives the data evicted from the first-level read cache; the L2 read cache can use non-volatile storage media (for example, SCM or SSD). A large-capacity L2 read cache can prevent untimely prefetching caused by scenarios such as the limited space of the L1 read cache, and the performance fluctuation and degradation caused by large amounts of hotspot data being evicted to the L2 read cache. The global cache of this invention supports extending to three or more cache levels.
The aggregation module 25 in FIG. 2 aggregates the small-size data stored in the write cache 22 into large-size data, after which the storage agent module 26 writes the large-size data into the storage cluster 300.
Cache node 20a also includes a cluster management module 27. The cache cluster 200 generates the partition view of the cache cluster through the cluster management modules in the cache nodes. Specifically, a master node for cluster management can be set in the cache cluster 200; each newly online cache node in the cache cluster registers with the master node through its cluster management module, so the master node can obtain the information of the cache resources in each cache node. The master node can map the cache resources in each cache node to partitions according to a preset algorithm and generate a partition view, which includes the mapping relationships between the cache nodes and the partitions. In a multi-replica storage scenario, one partition can be mapped to multiple cache nodes. After generating the partition view, the master node can send the partition view to the other cache nodes.
The cache resources include, for example, the write cache, L1 read cache, and L2 read cache resources in each cache node. The client adaptation layer 11 of computing node 10a can obtain the partition view of the cache cluster 200 from the cluster management module 27 of any cache node (for example, cache node 20a). After intercepting a data access request from an application, the client adaptation layer 11 can determine, from the partition view according to a preset rule, the cache node that handles the data to be accessed. For example, the client adaptation layer can hash the key of the data to be accessed to obtain a digest, take the digest modulo the number of partitions to determine the partition number corresponding to the data, and then determine the cache node corresponding to the data access request according to the at least one cache node corresponding to that partition number in the partition view.
FIG. 3 is a schematic diagram of a method for data routing by the client adaptation layer. As shown in FIG. 3, after determining that the data to be accessed corresponds to partition pt0, the client adaptation layer 11 can determine from the partition view in FIG. 3 that partition pt0 corresponds to cache node 20a, cache node 20b, and cache node 20c in FIG. 3, and can therefore route the data access request to cache node 20a, cache node 20b, and cache node 20c respectively. In FIG. 3, one partition corresponds to three cache nodes, which means the data is stored in three replicas to increase reliability. A sketch of this routing procedure is given below.
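For illustration, the routing procedure can be sketched as follows. This is a minimal sketch under assumptions: the node names and the toy sixteen-partition view are invented, and SHA-1 merely stands in for the unspecified hash used to produce the digest.

```python
# Sketch of the client adaptation layer's routing: hash the key into a
# digest, take the digest modulo the partition count to get a partition
# number, and look that partition up in the partition view to find its
# cache nodes. A real partition view is generated by the master node from
# the cache resources registered by each cache node.

import hashlib

PARTITION_COUNT = 16
partition_view = {p: ["cache-20a", "cache-20b", "cache-20c"]
                  for p in range(PARTITION_COUNT)}  # toy view: 3 replicas each

def route(key: str) -> list[str]:
    digest = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    partition = digest % PARTITION_COUNT  # partition number for this key
    return partition_view[partition]      # cache nodes holding this partition

# The request is then sent to every returned node, giving the three-replica
# caching shown in FIG. 3.
print(route("Key1"))
```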
The method procedures of writing data, reading data, and prefetching data provided by the embodiments of this specification are described below with reference to FIG. 2.
FIG. 4 is a flowchart of a method for writing data in a storage system provided by an embodiment of this application. The method shown in FIG. 4 can be executed by the computing nodes, cache nodes, and storage nodes in the storage system; computing node 10a, cache node 20a, and storage node 30a are used as examples in the description below.
As shown in FIG. 4, first, in step S401, computing node 10a generates a cache node write request based on an application's data write request.
As described above with reference to FIG. 2, one or more applications, such as a database application, can be installed in computing node 10a, and the client adaptation layer 11 and DAS 12 are also installed in computing node 10a. When the database application wants to write data to the storage cluster, it generates a data write request. The database application can choose, as needed, one of the data storage services provided by the storage cluster, such as the block storage service, the object storage service, or the file storage service. For the block storage service, the data write request includes, for example, the logical address of the data and the data to be written; the logical address includes, for example, information such as the logical unit number (LUN), the logical block address (LBA), and the data length, and is equivalent to the key of the data. For the object storage service, the data write request includes, for example, the object name of the data and the data to be written, and the object name is the key of the data. For the file storage service, the data write request includes, for example, the file name of the file data and the directory path of the file; the file name and directory path are equivalent to the key of the data. That is, when different applications use different data services, the formats of the keys (such as the key form and byte length) in the data access requests they generate (including data write requests, data read requests, and so on) differ considerably. Meanwhile, the attributes of the fields (such as field length and field semantics) in the data access requests generated by different applications using different data services may also differ. In addition, the field attributes of the data access requests generated by different applications may also differ.
Referring to FIG. 2, after the database application generates the above data write request and sends it to the Librados component, the client adaptation layer 11 can intercept the data write request from the Librados component and generate, based on it, cache node write requests to be sent to the cache cluster 200. Specifically, the client adaptation layer 11 first determines that the data to be written corresponds to, for example, partition pt0; according to the routing process shown in FIG. 3, the client adaptation layer 11 can determine that the data to be written should be routed to cache nodes 20a, 20b, and 20c, and can therefore generate three cache node write requests to be sent to cache nodes 20a, 20b, and 20c respectively. The following takes cache node 20a as an example; for the operations of cache node 20b and cache node 20c, refer to those of cache node 20a.
The generated cache node write request sent to cache node 20a includes, for example, information such as the node identifier of cache node 20a, the operation type (write request type), and the initial data access request.
In step S402, after generating the cache node write request, computing node 10a sends the cache node write request to the corresponding cache node, for example, cache node 20a.
Specifically, after generating the cache node write request, the client adaptation layer 11 can send it to the server adaptation layer 21 in cache node 20a.
In step S403, the cache node converts the cache node write request into a unified format and semantics.
As described above, since the data access requests corresponding to different applications and/or different data storage services have different formats and/or semantics, in the embodiments of this application the server adaptation layer 21 converts the cache node write request into a unified format and semantics. The unified format and semantics correspond, for example, to one data storage service in the storage cluster, such as the object storage service, so that the storage cluster can provide only one data storage service.
Specifically, the conversion operation may include converting the key of the data to be written in the cache node write request (for example, Key1) to a preset length. For example, assume the preset length is 20 bytes. When the data storage service used by the application is the block storage service, the length of the data's key is usually less than or equal to 20 bytes; when Key1 is shorter than 20 bytes, the server adaptation layer 21 can adaptively pad Key1 in a preset manner, extending Key1 into a 20-byte Key2. When the data storage service corresponding to the application is the object storage service, the length of the object name may not be fixed, and the server adaptation layer 21 can map Key1 to a 20-byte Key2 through, for example, a hash algorithm. Cache node 20a can maintain, in a data table, the correspondence between the initial keys and the mapped keys. After mapping Key1 to Key2 through the hash algorithm, the server adaptation layer 21 can determine, based on the data table, whether a hash collision exists; when a collision exists, the server adaptation layer 21 can remap Key1 to a different 20-byte key through a preset algorithm and record the collision for lookup. A sketch of this conversion is given below.
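For illustration, the Key conversion can be sketched as follows; the null-byte padding, the use of SHA-1, and the collision handling shown are assumptions standing in for the preset padding manner, hash algorithm, and preset remapping algorithm mentioned above.

```python
# Sketch of the Key conversion in step S403: short keys are padded to the
# 20-byte preset length, longer object names are hashed down to 20 bytes
# (a SHA-1 digest happens to be exactly 20 bytes), and a table from the
# converted key back to the original key is kept so hash collisions can be
# detected.

import hashlib

KEY_LEN = 20
key_table = {}  # converted key (Key2) -> original key (Key1)

def convert_key(key1: bytes) -> bytes:
    if len(key1) <= KEY_LEN:
        key2 = key1.ljust(KEY_LEN, b"\x00")  # pad short keys to 20 bytes
    else:
        key2 = hashlib.sha1(key1).digest()   # hash long names to 20 bytes
    previous = key_table.get(key2)
    if previous is not None and previous != key1:
        # Collision: the description remaps Key1 with a preset algorithm
        # and records the collision; that step is left abstract here.
        raise ValueError("hash collision, remap required")
    key_table[key2] = key1
    return key2
```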
Converting the keys of the data to be written in different cache node write requests to the same preset length as described above reduces the complexity of data management in the cache cluster and saves storage space.
The above conversion operation also includes converting the semantics of the cache node write request into preset semantics. Specifically, the server adaptation layer 21 converts the cache node write request according to the preset attributes, such as the lengths and semantics, of multiple fields.
After the above conversion processing, the cache node can process the data service requests corresponding to different applications and different data storage services through a unified interface; therefore, a unified global cache pool can be created for different applications, improving the utilization of cache resources.
In step S404, the cache node executes the write request and writes the data into the cache node.
As described above, when executing the write request, the cache node invokes the write interface to write the data in the write request into the write cache 22 of the cache node; the write cache 22 includes, for example, RAM storage space and SSD storage space. After converting the cache node write request into the unified format and semantics, the server adaptation layer 21 can invoke the write interface provided in cache node 20a; the computer code included in the write interface, when executed, performs a series of operations such as caching the data and writing the data to the storage cluster. After the write interface starts executing, according to the write interface, the data requested to be written by the cache node write request, which is also the data requested to be written by the aforementioned data write request, is written into the write cache 22 in correspondence with the converted key (for example, Key2).
Specifically, the data is written into the SSD space of the write cache 22 in, for example, three replicas, thereby protecting the written data; at the same time, the data is stored in the RAM space of the write cache to accelerate queries on the data and flushing (that is, storage to the storage cluster).
In step S405, after writing the data, the cache node returns write completion information to the computing node.
After finishing writing to the write cache, cache node 20a can immediately return write completion information to computing node 10a, without having to wait until the data has been written to the storage cluster, which shortens the feedback time and improves system efficiency.
After writing the data as described above, cache node 20a can determine whether the write cache satisfies a flush condition. The flush condition includes, for example, any of the following: the data stored in the write cache reaches a preset watermark; the current time is a preset flush time (for example, the idle time of the cache node); a flush instruction is received from operations personnel. When determining that the flush condition is satisfied, cache node 20a performs processing such as deduplication and data merging on the part of the data stored earlier in the write cache RAM, for storage to the storage cluster. A sketch of this check is given below.
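For illustration, the flush-condition check can be sketched as follows; the 80% watermark and the 2:00 to 5:00 idle window are invented values standing in for the preset watermark and preset flush time.

```python
# Sketch of the flush trigger described above: flush when the write cache
# reaches a preset watermark, when the current hour falls in a preset idle
# window, or when operations personnel have issued a flush instruction.

def should_flush(used_bytes, capacity_bytes, hour, operator_requested,
                 watermark=0.8, idle_hours=range(2, 5)):
    return (
        used_bytes >= watermark * capacity_bytes  # preset watermark reached
        or hour in idle_hours                     # preset (idle-time) flush window
        or operator_requested                     # explicit flush instruction
    )
```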
Optionally, in step S406, cache node 20a aggregates the multiple pieces of data received from the write cache.
Specifically, the aggregation module 25 in cache node 20a aggregates the multiple pieces of data received from the write cache.
Assume the multiple pieces of data to be flushed in the write cache are multiple small objects with a small data size, for example, 8 KB each. The small objects include multiple pieces of new data that overwrite old data, and the old data may be distributed at different storage addresses in different parts of the storage cluster. Therefore, if these small objects were written individually and directly into the storage cluster, the different storage addresses would each have to be addressed separately, producing a large number of random data writes in the storage cluster. Every random data write in the storage cluster requires a new seek and disk rotation in the HDD, reducing the flushing speed. In addition, the data storage speed of the storage media in the storage cluster is usually slower than that of the cache media in the cache cluster. Thus, in highly concurrent scenarios, flushing the data in the write cache cannot keep up with the speed at which data is written into the write cache, and the write cache capacity of cache node 20a is easily filled, so that application data has to be written directly to the back-end storage, and the write cache can no longer provide acceleration.
To address this problem, in the embodiments of this application the aggregation module 25 aggregates multiple pieces of data in the write cache and writes the larger aggregated data into the storage cluster, thereby increasing the speed of writing to the storage cluster.
Specifically, the aggregation module 25 can aggregate, for example, 1000 small objects in the write cache into one 8 MB large object for sequential writing into the storage cluster. In this way, many random HDD write operations are converted into one sequential write operation, that is, only one seek plus rotation delay is needed instead of 1000, which improves the data write speed of the storage cluster.
After aggregating multiple small objects into one large object, the aggregation module 25 generates a unique key for the large object and records the information of the large object in the metadata in FIG. 2, including the keys of the multiple small objects contained in the large object, and the offset address (offset) and data length (length) at which each small object is stored within the large object. The aggregation module 25 can store the metadata in memory and, at the same time, store the metadata in multiple replicas in a non-volatile medium (such as SSD), synchronously updating the metadata in the SSD after each in-memory metadata update. A sketch of this aggregation is given below.
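For illustration, the aggregation and metadata recording can be sketched as follows; the key scheme for the large object is an assumption, and persistence of the metadata to SSD is omitted.

```python
# Sketch of the aggregation described above: many small objects are packed
# into one large object for a single sequential write, and each small
# object's offset and length inside the large object are recorded in the
# large-object metadata.

import uuid

def aggregate(small_objects):
    """small_objects: dict mapping small-object key -> bytes."""
    large_key = "agg-" + uuid.uuid4().hex  # unique key for the large object
    extents, chunks, offset = {}, [], 0
    for key, data in small_objects.items():
        extents[key] = (offset, len(data))  # offset/length inside the large object
        chunks.append(data)
        offset += len(data)
    large_object = b"".join(chunks)  # written to the storage cluster sequentially
    metadata = {"large_key": large_key, "extents": extents}
    return large_object, metadata
```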
After aggregating multiple small objects into a large object, the aggregation module 25 can provide the large object to the storage agent module 26 for writing to the storage cluster.
In step S407, cache node 20a generates a data write request.
Specifically, after obtaining the above 8 MB large object, the storage agent module 26 in cache node 20a determines the storage node corresponding to the data (for example, storage node 30a) according to a preset data allocation rule, and generates a write request for the large object accordingly. Specifically, the write request includes, for example, the identifier of storage node 30a, the key of the large object, and the large object.
When the small objects are not aggregated, cache node 20a can provide each small object separately to the storage agent module 26, and the storage agent module 26 can similarly generate a data write request for each small object.
In step S408, cache node 20a sends the generated data write request to storage node 30a.
Specifically, the storage agent module 26 sends the generated data write request to storage node 30a.
In step S409, after receiving the data write request, storage node 30a writes the corresponding data.
Specifically, after receiving the write request, storage node 30a invokes the write interface to write the data. Since the storage agent module 26 generates data write requests in a unified format, for example with the semantics and format of the object storage service, storage node 30a only needs to provide a write interface corresponding to the object storage service. It can be understood that the storage agent module 26 is not limited to generating write requests with the semantics and format of the object storage service, and can generate write requests with the semantics and format of other data storage services.
After finishing writing the data, storage node 30a can return write success information to cache node 20a. After receiving the write success information, cache node 20a can update the old versions of the written small objects stored in the L1 read cache 23 and/or the L2 read cache 24 with their latest written versions, so that the data stored in the read caches is the latest version. At the same time, cache node 20a can delete the flushed data stored in the write cache.
After the aggregation module 25 aggregates small objects as described above, when most of the small objects in a flushed large object become invalid data because they are deleted or modified, the large object stored in the storage cluster occupies a large amount of invalid space. Therefore, the aggregation module 25 can reclaim large objects with too much invalid space during idle time. Specifically, the aggregation module 25 can request, through the storage agent module 26, to read the still-valid small objects in the large object from the storage cluster, and after the read completes, send the storage cluster a request to delete the large object, completing the reclamation of the large object. The still-valid small objects in the reclaimed large object can be aggregated again into a new large object and rewritten into the storage cluster. Specifically, multiple large objects can be reclaimed in descending order of their invalid space. After reclaiming a large object, the aggregation module 25 modifies the metadata accordingly.
FIG. 5 is a flowchart of a method for reading data in a storage system provided by an embodiment of this application. The method shown in FIG. 5 can be executed by the computing nodes, cache nodes, and storage nodes in the storage system; computing node 10a, cache node 20a, and storage node 30a are used as examples in the description below.
As shown in FIG. 5, first, in step S501, computing node 10a generates a cache node read request based on an application's data read request.
Referring to the description of step S401 above, when the database application in computing node 10a wants to read data from the storage cluster (for example, the object named Key1 above), it generates a data read request, which includes the name "Key1" of the object to be read. Similarly, the data read request has the format and semantics corresponding to the database application and the data storage service it uses.
After computing node 10a generates the above data read request, the client adaptation layer 11 intercepts the data read request and generates, based on it, cache node read requests to be sent to the cache cluster 200. Specifically, the client adaptation layer 11 can similarly determine that the data to be read should be routed to cache nodes 20a, 20b, and 20c, and can therefore generate three cache node read requests to be sent to cache nodes 20a, 20b, and 20c respectively. The following takes cache node 20a as an example.
The generated cache node read request sent to cache node 20a includes, for example, information such as the node identifier of cache node 20a, the operation type (read request type), and the initial data read request.
In step S502, computing node 10a sends the cache node read request to cache node 20a.
Specifically, after generating the cache node read request, the client adaptation layer 11 can send the cache node read request to the server adaptation layer 21 in cache node 20a.
In step S503, cache node 20a converts the cache node read request into a unified format and semantics.
For this step, refer to the description of step S403 above; details are not repeated here. After the conversion, the cache node read request is used to read the object Key2.
In step S504, cache node 20a determines whether the local cache stores the data to be read.
Specifically, cache node 20a invokes the read interface to determine whether the local cache stores the data to be read.
When determining that the local cache stores the data to be read, cache node 20a can read the data from the local cache and perform step S508 to return the read data to computing node 10a.
Specifically, after executing the read interface, cache node 20a first determines whether the value of the object Key2 is stored in the RAM storage space of the write cache 22; if so, it can read the value and return it to computing node 10a. When the value of the object Key2 is not stored in the write cache 22, cache node 20a can determine whether the value of the object Key2 is stored in the L1 read cache 23; if so, it can read the value and return it to computing node 10a. When the value of the object Key2 is not stored in the L1 read cache 23, cache node 20a can determine whether the value of the object Key2 is stored in the L2 read cache 24; if so, it can read the value and return it to computing node 10a.
In step S505, when determining that the data to be read is not stored in the local cache, cache node 20a generates a data read request and sends the data read request to storage node 30a.
In one implementation, cache node 20a can generate a data read request for reading the object Key2.
In another implementation, for the scenario described above in which small objects are aggregated into a large object, cache node 20a first reads the metadata, determines that the object Key2 corresponds to the large object Key3, determines the offset address and length of Key2 within the object Key3, and then generates a data read request, which includes the name "Key3" of the object to be read and the offset address and length of the data to be read within the object Key3.
In step S506, storage node 30a reads the data.
After receiving the data read request from cache node 20a, storage node 30a reads the data corresponding to the offset address and length within the object Key3, thereby reading the object Key2.
In step S507, storage node 30a returns the read data to cache node 20a.
In step S508, cache node 20a returns the read data to computing node 10a.
Specifically, cache node 20a converts Key2 into Key1 through the server adaptation layer 21 and returns the value of Key2 received from storage node 30a to computing node 10a as the value of Key1, so that computing node 10a returns the value of Key1 to the application.
FIG. 6 is a flowchart of a method for prefetching data in a storage system provided by an embodiment of this application. The method shown in FIG. 6 can be executed by the computing nodes, cache nodes, and storage nodes in the storage system; computing node 10a, cache node 20a, and storage node 30a are used as examples in the description below.
As shown in FIG. 6, first, in step S601, computing node 10a acquires the application's data access history within the most recent preset period.
Specifically, DAS 12 in computing node 10a acquires the application's data access history within the most recent preset period.
As described above, DAS 12 can pull the application's read and write access requests from the client adaptation layer 11, so as to obtain the data access history of the application's user within the most recent preset period. The data access history includes, for example, the identifiers of the data read or written by the user within the most recent preset period, and the times at which the reads or writes occurred.
In step S602, computing node 10a recommends the data to be prefetched based on the data access history of each application.
Specifically, in the embodiments of this application, DAS 12 in the computing node recommends the data to be prefetched through the prefetch recommendation model. The prefetch recommendation model can use multiple algorithms. For example, the prefetch recommendation model can include a clustering model, which performs multi-dimensional feature clustering on the data in the user's data access history, so that data prefetch recommendation is performed according to the clustering result. The prefetch recommendation model can also include a time series prediction model, which predicts the data that the user will access at the next moment, so that data prefetch recommendation is performed according to the prediction result. The prefetch recommendation model can also include algorithms such as frequent pattern mining and hotspot data identification.
Based on the multiple algorithms, the prefetch recommendation model can determine the user's access modes, such as a streaming mode, a hotspot mode, an association mode, and a working-set association mode. FIG. 7 shows a schematic diagram of various user access modes. In each coordinate system in FIG. 7, the horizontal axis represents, for example, time, and the vertical axis represents, for example, the data identifier (that is, the key of the data).
As shown in FIG. 7, in the streaming mode, the data accessed by the user is linearly related to time, so the prefetch recommendation model can predict, according to this relationship, the data that the user will access at the next moment as the recommended prefetch data; specifically, the prefetch recommendation model outputs the identifier of the recommended prefetch data. In the hotspot mode, the hotspot data at different moments can be predicted, so the hotspot data at the next moment can be predicted according to this mode as the recommended prefetch data. In the association mode (read-read association or write-read association), the data read by the user in the next period is associated with the data read or written in the previous period; therefore, the data that the user will read at the next moment can be predicted according to this mode as the recommended prefetch data. In the working-set association mode, the user's access to one data table (for example, Table 2) is associated with the user's access to another data table (for example, Table 1); therefore, the data that the user will access at the next moment can be predicted according to this mode as the recommended prefetch data.
DAS 12 supports stateless deployment, and mode mining can be performed anew after computing node 10a fails or the process restarts. Alternatively, DAS 12 can write the access modes mined by the prefetch recommendation model to a persistent medium, and read the user access modes from the persistent medium after events such as faults, restarts, and upgrades, thereby achieving rapid warm-up.
After predicting the identifier of the recommended prefetch data (for example, Key1), the prefetch recommendation model provides the identifier of the recommended prefetch data to the client adaptation layer 11.
The prefetch recommendation model is only one implementation of the embodiments of this application; other manners of recommending prefetch data are also included in the scope disclosed by the embodiments of this application.
In step S603, computing node 10a generates a data prefetch request according to the recommended prefetch data.
Similarly to the generation of a cache node read request, the client adaptation layer 11 of computing node 10a determines the corresponding cache node (for example, cache node 20a) according to the identifier Key1 of the recommended prefetch data, thereby generating a data prefetch request. The data prefetch request includes the operation request type (prefetch type), the identifier of cache node 20a, and the identifier of the data to be prefetched (Key2).
In step S604, the computing node sends the data prefetch request to cache node 20a.
Specifically, the client adaptation layer 11 sends the data prefetch request to the server adaptation layer 21 in cache node 20a.
In step S605, cache node 20a converts the data prefetch request into a unified format and semantics.
For this step, refer to the description of step S403 above; details are not repeated here. After the conversion, the data prefetch request is used to prefetch the value of the object Key2.
In step S606, cache node 20a determines whether the write cache stores the data to be prefetched.
Specifically, cache node 20a first invokes the prefetch interface and, after executing the prefetch interface, first determines whether the value of the object Key2 is stored in the RAM storage space of the write cache 22. When determining that the data to be prefetched is stored in the write cache 22, cache node 20a can read the data from the write cache and perform step S611 to store the data in the L1 read cache or the L2 read cache, ending this prefetch operation.
In step S607, when determining that the write cache does not store the data to be prefetched, cache node 20a can determine whether the read cache stores the data to be prefetched.
When determining that either read cache stores the data to be prefetched, cache node 20a can end this prefetch operation. Or, optionally, cache node 20a can read the data from the read cache and perform step S612 to return the data to computing node 10a.
Specifically, when the value of the object Key2 is not stored in the write cache 22, cache node 20a can determine whether the value of the object Key2 is stored in the L1 read cache 23; if so, this prefetch operation can be ended. When the value of the object Key2 is not stored in the L1 read cache 23, cache node 20a can determine whether the value of the object Key2 is stored in the L2 read cache 24; if so, this prefetch operation can be ended.
In step S608, when determining that the data to be prefetched is not stored in the read cache, cache node 20a generates a data read request and sends the data read request to storage node 30a. In step S609, storage node 30a reads the data according to the data read request. In step S610, storage node 30a returns the read data to cache node 20a. For steps S608 to S610, refer to the description of steps S505 to S507 above; details are not repeated here.
In step S611, cache node 20a stores the data returned by storage node 30a into the read cache.
Cache node 20a can store the returned object Key2 in the L1 read cache or the L2 read cache and end this prefetch operation. When computing node 10a sends a read request for the object Key1, cache node 20a determines, by converting the read request, that Key1 corresponds to Key2, so it can read the value of the object Key2 from the read cache and return the value of Key2 to computing node 10a as the value of Key1, without reading from the storage cluster, which shortens the user access latency.
Optionally, after storing the value of the object Key2 in the read cache, cache node 20a can perform step S612 to return the prefetched data to the computing node.
FIG. 8 is a structural diagram of a computing node provided by an embodiment of this application. The computing node is configured to execute the method shown in FIG. 4, FIG. 5, or FIG. 6, and includes:
an obtaining unit 81, configured to obtain access information of a first application to a storage node within a preset time period;
a determining unit 82, configured to determine information of prefetch data based on the access information;
a generating unit 83, configured to determine, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generate a prefetch request for prefetching the prefetch data;
a sending unit 84, configured to send the prefetch request to the cache node.
In an implementation, the determining unit 82 is specifically configured to determine the information of the prefetch data based on the access information according to a prefetch recommendation model.
In an implementation, the access information includes access information of a first user, and the determining unit 82 is specifically configured to determine, through the prefetch recommendation model and based on the access information of the first user, an access mode of the first user, and determine the data to be prefetched according to the access mode.
It should be understood that descriptions such as "first" and "second" herein are merely intended to distinguish similar concepts for simplicity of description and impose no other limitation.
Those skilled in the art can clearly understand that the descriptions of the embodiments provided in this application can refer to one another. For convenience and brevity of description, for example, for the functions of the devices and apparatuses and the steps they perform provided in the embodiments of this application, refer to the related descriptions of the method embodiments of this application; the method embodiments and the device embodiments can also refer to one another.
Those skilled in the art can understand that all or part of the steps for implementing the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs all or part of the steps of the above method embodiments; and the aforementioned storage medium includes various media that can store program code, such as a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
In the above embodiments, implementation may be entirely or partially by software, hardware, firmware, or any combination thereof. When software is used for implementation, it may be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are generated entirely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid state disk (SSD)).
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways without exceeding the scope of this application. For example, the embodiments described above are merely illustrative; for example, the division into the modules or units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
In addition, the described devices and methods and the schematic diagrams of the different embodiments can be combined or integrated with other systems, modules, technologies, or methods without exceeding the scope of this application. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or substitution that can be readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (24)
- A data prefetching method, characterized in that the method includes: a computing node obtains access information of a first application to a storage node within a preset time period; the computing node determines information of prefetch data based on the access information; the computing node determines, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generates a prefetch request for prefetching the prefetch data; the computing node sends the prefetch request to the cache node; the cache node performs a prefetch operation on the prefetch data in response to the prefetch request.
- The method according to claim 1, characterized in that the computing node determining information of prefetch data based on the access information includes: the computing node determines the information of the prefetch data based on the access information according to a prefetch recommendation model.
- The method according to claim 2, characterized in that the prefetch recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hotspot data identification algorithm.
- The method according to claim 2 or 3, characterized in that the access information includes access information of a first user, and the computing node determining the information of the prefetch data based on the access information according to the prefetch recommendation model includes: the prefetch recommendation model determines an access mode of the first user based on the access information of the first user, and determines the data to be prefetched according to the access mode.
- The method according to any one of claims 1 to 4, characterized in that the prefetch request is a prefetch request for a data block, file data, or object data, and the method further includes: after receiving the prefetch request for the prefetch data from the computing node, the cache node converts the prefetch request into a format and semantics set uniformly for data blocks, file data, and object data.
- The method according to claim 5, characterized in that the information of the prefetch data includes a first identifier of the prefetch data, and converting the prefetch request into the format and semantics set uniformly for data blocks, file data, and object data includes converting the first identifier in the prefetch request into a second identifier conforming to a preset format.
- The method according to claim 6, characterized in that converting the first identifier in the prefetch request into the second identifier conforming to the preset format includes converting the first identifier into the second identifier through a hash algorithm.
- The method according to claim 6 or 7, characterized in that the cache node includes a write cache and a read cache, and the cache node performing a prefetch operation on the prefetch data in response to the prefetch request includes: the cache node determines, based on the second identifier, whether the prefetch data is stored in the write cache, and when determining that the prefetch data is stored in the write cache, stores the prefetch data into the read cache in correspondence with the second identifier.
- The method according to claim 8, characterized in that the cache node performing a prefetch operation on the prefetch data in response to the prefetch request further includes: when determining that the prefetch data is not stored in the write cache, the cache node determines, based on the second identifier, whether the prefetch data is stored in the read cache, and when determining that the prefetch data is not stored in the read cache, generates a data read request based on the second identifier and sends the data read request to a storage node; the storage node reads the prefetch data according to the data read request and returns the prefetch data to the cache node; the cache node stores the prefetch data into the read cache in correspondence with the second identifier.
- A storage system, characterized by including a computing node, a cache node, and a storage node, where the computing node is configured to: obtain access information of a first application to the storage node within a preset time period; determine information of prefetch data based on the access information; determine, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generate a prefetch request for prefetching the prefetch data; and send the prefetch request to the cache node; the cache node is configured to perform a prefetch operation on the prefetch data in response to the prefetch request.
- The storage system according to claim 10, characterized in that the computing node being configured to determine information of prefetch data based on the access information specifically includes: the computing node is configured to determine the information of the prefetch data based on the access information according to a prefetch recommendation model.
- The storage system according to claim 11, characterized in that the prefetch recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hotspot data identification algorithm.
- The storage system according to claim 11 or 12, characterized in that the access information includes access information of a first user, and the computing node being configured to determine the information of the prefetch data based on the access information according to the prefetch recommendation model includes: the computing node is configured to determine, through the prefetch recommendation model and based on the access information of the first user, an access mode of the first user, and determine the data to be prefetched according to the access mode.
- The storage system according to any one of claims 10 to 13, characterized in that the prefetch request is a prefetch request for a data block, file data, or object data, and the cache node is further configured to: after receiving the prefetch request for the prefetch data from the computing node, convert the prefetch request into a format and semantics set uniformly for data blocks, file data, and object data.
- The storage system according to claim 14, characterized in that the information of the prefetch data includes a first identifier of the prefetch data, and the cache node being configured to convert the prefetch request into the format and semantics set uniformly for data blocks, file data, and object data includes: the cache node is configured to convert the first identifier in the prefetch request into a second identifier conforming to a preset format.
- The storage system according to claim 15, characterized in that the cache node being configured to convert the first identifier in the prefetch request into the second identifier conforming to the preset format includes: the cache node is configured to convert the first identifier into the second identifier through a hash algorithm.
- The storage system according to claim 15 or 16, characterized in that the cache node includes a write cache and a read cache, and the cache node being configured to perform a prefetch operation on the prefetch data in response to the prefetch request includes: the cache node is configured to determine, based on the second identifier, whether the prefetch data is stored in the write cache, and when determining that the prefetch data is stored in the write cache, store the prefetch data into the read cache in correspondence with the second identifier.
- The storage system according to claim 17, characterized in that the cache node being configured to perform a prefetch operation on the prefetch data in response to the prefetch request further includes: the cache node is configured to, when determining that the prefetch data is not stored in the write cache, determine, based on the second identifier, whether the prefetch data is stored in the read cache, and when determining that the prefetch data is not stored in the read cache, generate a data read request based on the second identifier and send the data read request to the storage node; the storage node is further configured to read the prefetch data according to the data read request and return the prefetch data to the cache node; the cache node is further configured to store the prefetch data into the read cache in correspondence with the second identifier.
- A data prefetching method, characterized in that the method is executed by a computing node and includes: obtaining access information of a first application to a storage node within a preset time period; determining information of prefetch data based on the access information; determining, according to the information of the prefetch data, a cache node for prefetching the prefetch data, and generating a prefetch request for prefetching the prefetch data; and sending the prefetch request to the cache node.
- The method according to claim 19, characterized in that the determining information of prefetch data based on the access information includes: determining the information of the prefetch data based on the access information according to a prefetch recommendation model.
- The method according to claim 20, characterized in that the prefetch recommendation model is based on at least one of the following algorithms: a clustering algorithm, a time series prediction algorithm, a frequent pattern mining algorithm, and a hotspot data identification algorithm.
- The method according to claim 20 or 21, characterized in that the access information includes access information of a first user, and the determining the information of the prefetch data based on the access information according to the prefetch recommendation model includes: the prefetch recommendation model determines an access mode of the first user based on the access information of the first user, and determines the data to be prefetched according to the access mode.
- A computing node, characterized by including a processor and a memory, where the memory stores executable computer program instructions, and the processor executes the executable computer program instructions to implement the method according to any one of claims 19 to 22.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions that, when executed in a computer or a processor, cause the computer or the processor to perform the method according to any one of claims 19 to 22.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22871553.8A EP4386567A1 (en) | 2021-09-23 | 2022-07-06 | Data pre-fetching method, and computing node and storage system |
US18/613,698 US20240264773A1 (en) | 2021-09-23 | 2024-03-22 | Data Prefetching Method, Computing Node, and Storage System |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111117681.6 | 2021-09-23 | ||
CN202111117681.6A | 2021-09-23 | 2021-09-23 | Data prefetching method, computing node and storage system |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/613,698 Continuation US20240264773A1 (en) | 2021-09-23 | 2024-03-22 | Data Prefetching Method, Computing Node, and Storage System |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023045492A1 | 2023-03-30 |
Family
ID=85652389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/104124 | 2022-07-06 | Data prefetching method, computing node and storage system |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240264773A1 (zh) |
EP (1) | EP4386567A1 (zh) |
CN (1) | CN115858409A (zh) |
WO (1) | WO2023045492A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116055429A (zh) * | 2023-01-17 | 2023-05-02 | Hangzhou Hongjun Microelectronics Technology Co., Ltd. | PCIe-based communication data processing method, apparatus, device, and storage medium |
CN117666963B (zh) * | 2023-12-13 | 2024-07-19 | Hunan Chengxi Technology Co., Ltd. | Data I/O acceleration method for a CPU cloud computing platform |
- 2021-09-23: CN application CN202111117681.6A filed; published as CN115858409A (status: active, pending)
- 2022-07-06: PCT application PCT/CN2022/104124 filed; published as WO2023045492A1 (status: active, application filing)
- 2022-07-06: EP application EP22871553.8A filed; published as EP4386567A1 (status: active, pending)
- 2024-03-22: US application US18/613,698 filed; published as US20240264773A1 (status: active, pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7716425B1 (en) * | 2006-09-27 | 2010-05-11 | Hewlett-Packard Development Company, L.P. | Prefetching data in distributed storage systems |
CN103795781A (zh) * | 2013-12-10 | 2014-05-14 | Xi'an University of Posts and Telecommunications | A distributed cache model based on file prediction |
Non-Patent Citations (2)
Title |
---|
LIAO JIANWEI, TRAHAY FRANCOIS, XIAO GUOQIANG, LI LI, ISHIKAWA YUTAKA: "Performing Initiative Data Prefetching in Distributed File Systems for Cloud Computing", IEEE TRANSACTIONS ON CLOUD COMPUTING, vol. 5, no. 3, 27 March 2017 (2017-03-27), pages 550 - 562, XP093052429, DOI: 10.1109/TCC.2015.2417560 * |
XIA YUAN , HE YING-SI: "Push Data Prefetching Scheme in a Distributed File System", JOURNAL OF SOUTHWEST CHINA NORMAL UNIVERSITY(NATURAL SCIENCE EDITION), vol. 43, no. 5, 20 May 2018 (2018-05-20), pages 101 - 105, XP093052426, ISSN: 1000-5471, DOI: 10.13718/j.cnki.xsxb.2018.05.017 * |
Also Published As
Publication number | Publication date |
---|---|
EP4386567A1 (en) | 2024-06-19 |
US20240264773A1 (en) | 2024-08-08 |
CN115858409A (zh) | 2023-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11467967B2 (en) | Managing a distributed cache in a cloud-based distributed computing environment | |
US10657101B2 (en) | Techniques for implementing hybrid flash/HDD-based virtual disk files | |
US8499121B2 (en) | Methods and apparatus to access data in non-volatile memory | |
US11562091B2 (en) | Low latency access to physical storage locations by implementing multiple levels of metadata | |
US9946569B1 (en) | Virtual machine bring-up with on-demand processing of storage requests | |
US9141529B2 (en) | Methods and apparatus for providing acceleration of virtual machines in virtual environments | |
US9348842B2 (en) | Virtualized data storage system optimizations | |
TWI549060B (zh) | Access methods and devices for virtual machine data | |
US8788628B1 (en) | Pre-fetching data for a distributed filesystem | |
CN114860163B (zh) | 一种存储系统、内存管理方法和管理节点 | |
WO2023045492A1 (zh) | 一种数据预取方法、计算节点和存储系统 | |
CN106775446B (zh) | 基于固态硬盘加速的分布式文件系统小文件访问方法 | |
CN107203411B (zh) | 一种基于远程ssd的虚拟机内存扩展方法及系统 | |
US11652883B2 (en) | Accessing a scale-out block interface in a cloud-based distributed computing environment | |
US20150067283A1 (en) | Image Deduplication of Guest Virtual Machines | |
WO2019085769A1 (zh) | 一种数据分层存储、分层查询方法及装置 | |
WO2019061352A1 (zh) | 数据加载方法及装置 | |
US11016676B2 (en) | Spot coalescing of distributed data concurrent with storage I/O operations | |
EP4016312B1 (en) | Data operations using a cache table in a file system | |
US11853574B1 (en) | Container flush ownership assignment | |
EP4318257A1 (en) | Method and apparatus for processing data, reduction server, and mapping server | |
US11586353B2 (en) | Optimized access to high-speed storage device | |
CN115495412A (zh) | 一种查询系统及装置 | |
US20240319876A1 (en) | Caching techniques using a unified cache of metadata leaf objects with mixed pointer types and lazy content resolution | |
CN116490847A (zh) | 支持分布式文件系统中的垃圾收集的虚拟数据复制 |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22871553; Country of ref document: EP; Kind code of ref document: A1
WWE | Wipo information: entry into national phase | Ref document number: 2022871553; Country of ref document: EP
ENP | Entry into the national phase | Ref document number: 2022871553; Country of ref document: EP; Effective date: 20240315
NENP | Non-entry into the national phase | Ref country code: DE