WO2012109879A1 - Method, device and system for data caching in a multi-node system - Google Patents
Method, device and system for data caching in a multi-node system
- Publication number
- WO2012109879A1 (application PCT/CN2011/077994)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- file
- area
- cache area
- thread
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0871—Allocation or management of cache space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/22—Employing cache memory using specific memory technology
- G06F2212/222—Non-volatile memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/26—Using a specific storage system architecture
- G06F2212/261—Storage comprising a plurality of storage devices
Definitions
- The invention belongs to the field of data processing, and in particular relates to a method, device and system for data caching in a multi-node system.
- Cache can speed up the read and write rate of data, thereby improving the input/output (I/O) performance of the system.
- Solid State Disk (SSD), also known as Solid State Drive
- In the existing approach, the SSD cache and the disk are mapped in a group-associative (set-associative) manner. As shown in Figure 1, the data of each disk area is cached to the corresponding SSD group (data in a disk area of a given color is cached to the SSD area of the same color). When the application needs to read the data of the white area in the disk, it first checks whether the data has already been cached in the white area of the SSD.
- the prior art retains some commonly used data through the SSD cache, reduces the number of disk seeks, and improves the efficiency of data reading.
- the four regions of different colors in the SSD cache represent nodes 1, 2, 3, 4, respectively, in order.
- Under this mapping mode between the SSD cache and the disk, the data in the white area of the disk can only be cached in the cache area of node 1. Even if the system recognizes that the data in the white area of the disk has affinity with node 2, the data cannot be cached in the cache area of node 2, which increases the system's remote access overhead and reduces the efficiency of the cache.
- Embodiments of the present invention provide a method for data caching in a multi-node system.
- an affinity node can be identified.
- Embodiments of the present invention also provide an apparatus for data caching in a multi-node system.
- Embodiments of the present invention also provide a system including a data cache device in the multi-node system.
- a method for data caching in a multi-node system comprising:
- Each of the sub-areas is divided into a thread cache area and a global cache area; the thread cache area is mapped to the disk array in a fully associative manner, and the global cache area is mapped to the disk array in a group-associative manner;
- a method for mapping data caches in a multi-node system comprising:
- Each of the sub-areas is divided into a thread cache area and a global cache area; the thread cache area is mapped to the disk array in a fully associative manner, and the global cache area is mapped to the disk array in a group-associative manner.
- the embodiment of the present invention further provides an apparatus for data caching in a multi-node system, where the system includes at least a cache medium and a disk array, and the apparatus includes:
- a first dividing unit, configured to divide a cache area in the cache medium into a plurality of sub-areas, each sub-area corresponding to one node in the system;
- a mapping unit, configured to divide each of the sub-areas into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array in a fully associative manner and the global cache area is mapped to the disk array in a group-associative manner;
- a detecting unit, configured to detect a read frequency of the file when the process reads the file;
- a control unit, configured to cache the file to the thread cache area if the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold, and to cache the file to the global cache area if the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
- a device for mapping data caches in a multi-node system comprising at least a cache medium and a disk array, the device comprising:
- a first dividing unit, configured to divide a cache area in the cache medium into a plurality of sub-areas, each sub-area corresponding to one node in the system;
- a mapping unit, configured to divide each of the sub-areas into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array in a fully associative manner and the global cache area is mapped to the disk array in a group-associative manner.
- Embodiments of the present invention also provide a system including a data cache device in the multi-node system.
- the embodiment of the present invention divides the cache area of each node into a thread cache area and a global cache area, and the thread cache area and the disk array adopt a fully associated mapping manner.
- the global cache area and the disk array adopt a group-associative mapping manner, thereby solving the problem that the existing single cache area cannot identify the affinity node, reducing the remote access overhead of the system, and improving the efficiency of data access.
- the utilization of cache space and the hit rate of file reading are improved.
- FIG. 1 is a schematic diagram of a data caching method in an existing multi-node system
- FIG. 2 is a flowchart of implementing a data caching method in a multi-node system according to Embodiment 1 of the present invention
- FIG. 3 is a schematic diagram of the fully associative mapping manner in Embodiment 1 of the present invention.
- FIG. 4 is a flowchart of implementing a data caching method in a multi-node system according to Embodiment 2 of the present invention.
- FIG. 5 is a schematic diagram of process-aware cache mapping provided by Embodiment 2 of the present invention.
- FIG. 6 is a flowchart of implementing a data caching method in a multi-node system according to Embodiment 3 of the present invention.
- FIG. 7 is a structural diagram of a data cache apparatus in a multi-node system according to Embodiment 4 of the present invention.
- FIG. 8 is a structural diagram of a data cache apparatus in a multi-node system according to Embodiment 5 of the present invention.
- the thread cache area and the disk array adopt a fully associative mapping manner, and the global cache area and the disk array adopt a group-associative mapping manner, thereby solving the problem that the existing single cache area cannot identify the affinity node, reducing the remote access overhead of the system, and improving the efficiency of data access.
- the utilization of cache space and the hit rate of file reading are improved.
- Embodiment 1:
- FIG. 2 is a flowchart showing an implementation process of a data caching method in a multi-node system according to Embodiment 1 of the present invention. The process is detailed as follows:
- In step S201, the cache area in the cache medium is divided into a plurality of sub-areas, each sub-area corresponding to one of the nodes in the system.
- The architecture of the multi-node system includes, but is not limited to, the Non-Uniform Memory Access (NUMA) architecture. The system includes at least a cache medium and a disk array, wherein the cache medium includes but is not limited to an SSD; any cache medium whose read/write speed is greater than that of a disk but smaller than that of memory may be used.
- the multi-node system includes a plurality of nodes, each of which has its own independent processor.
- the multi-node system divides the cache area in the cache medium into a plurality of sub-areas, each sub-area corresponding to one of the nodes in the system.
- The manner of division includes but is not limited to uniform division. For example, suppose that the multi-node system includes node 1 and node 2, where node 1 includes processor 1 and node 2 includes processor 2, and the multi-node system divides the cache area in the SSD cache medium evenly between node 1 and node 2. The time for processor 1 to access the cache area in node 1 is a and the time for it to access the cache area in node 2 is b; likewise, the time for processor 2 to access the cache area in node 2 is a and the time for it to access the cache area in node 1 is b, where b is greater than a. That is, a processor accesses the cache area in its own node faster than the cache area of a remote node, so node 1 is the affinity node of processor 1.
- The memory affinity between the node containing the processor and the one or more nodes on which the memory is mounted decreases as the level of hardware separation increases.
- When the cache area in the cache medium is divided among the nodes, it may also be divided according to the performance of each node's processor. For example, if the performance of the processor in node 1 is higher than that of the processor in node 2, the cache area allotted to node 1 will be larger than the cache area allotted to node 2.
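- The division in step S201 can be sketched as follows. This is only an illustration under stated assumptions: the function name `partition_cache`, the integer performance weights, and the block counts are hypothetical and do not come from the source. Equal weights give the uniform division; unequal weights give the performance-based division.

```python
def partition_cache(total_blocks, node_weights):
    """Split a cache of total_blocks blocks among nodes.

    node_weights maps a node id to a relative processor-performance weight
    (hypothetical values). Equal weights yield a uniform division.
    """
    total_weight = sum(node_weights.values())
    nodes = sorted(node_weights)
    sub_areas = {}
    start = 0
    for i, node in enumerate(nodes):
        if i == len(nodes) - 1:
            size = total_blocks - start  # last node takes the remainder
        else:
            size = total_blocks * node_weights[node] // total_weight
        sub_areas[node] = range(start, start + size)
        start += size
    return sub_areas


uniform = partition_cache(1000, {1: 1, 2: 1})   # node 1 and node 2 share evenly
weighted = partition_cache(1000, {1: 3, 2: 1})  # node 1 has the faster processor
```

Each node receives a contiguous block range here, which is a simplification; the source does not say how the sub-areas are laid out on the cache medium.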
- In step S202, each of the sub-areas is divided into a thread cache area and a global cache area; the thread cache area is mapped to the disk array in a fully associative manner, and the global cache area is mapped to the disk array in a group-associative manner.
- the sub-area divided by each node is further divided into two areas, one is a thread cache area, and the other is a global cache area.
- the manner of division is not limited to uniform division.
- the thread cache area is mapped with the disk array in a fully associated mapping manner
- the global cache area is mapped with the disk array in a group-associated mapping manner.
- The fully associative mapping mode is shown in Figure 3: if the number of blocks in the cache area is C and the number of blocks in the disk array is M, there are C × M possible mappings in total. Any block in the disk array can be mapped to any block in the cache area.
- When a block of the disk array is stored in the cache area, its disk-array block number is stored in the tag of the cache block.
- When data is accessed, its address in the disk array (block number + address within the block) is first looked up in the cache: the block number is compared with the tags of all the blocks in the cache area. On a hit, the matching block and the address within the block give the address used to access the cache area. On a miss, the data is read from the disk array according to the address.
- the use of a fully associative mapping method improves the utilization of cache space and the hit rate of access data.
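- The fully associative lookup described above can be sketched as follows. The class and method names are hypothetical, and the replacement policy is left out because the source does not specify one; the sketch only shows tag comparison against every cache block.

```python
class FullyAssociativeCache:
    """Toy fully associative cache: any disk block may occupy any slot."""

    def __init__(self, num_blocks):
        # Each slot is None (empty) or a (tag, data) pair, where the tag
        # is the disk-array block number of the cached block.
        self.slots = [None] * num_blocks

    def lookup(self, block_number, offset):
        """Compare block_number with the tag of every slot."""
        for index, slot in enumerate(self.slots):
            if slot is not None and slot[0] == block_number:
                return (index, offset)  # hit: cache slot + in-block offset
        return None  # miss: the caller reads from the disk array

    def fill(self, block_number, data):
        """Store a disk block in the first free slot (eviction not modelled)."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = (block_number, data)
                return index
        raise RuntimeError("cache full; replacement policy not modelled")


cache = FullyAssociativeCache(4)
slot_used = cache.fill(37, b"payload")
hit = cache.lookup(37, 5)
miss = cache.lookup(12, 0)
```

A linear scan stands in for the parallel tag comparison a real cache would perform.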
- The group-associative (set-associative) mapping is shown in Figure 1: the disk array and the cache area are divided into blocks of the same size, and the blocks are gathered into groups of the same size. The capacity of the disk array is an integer multiple of the cache capacity, so the disk array space is divided into zones the size of the cache area, and the number of groups in each zone of the disk array equals the number of groups in the cache area.
- When data in the disk array is brought into the cache area, the group numbers in the disk array and in the cache area must be equal; that is, a block in any zone can only be stored in the group of the cache area with the same group number, but it may be stored at any block position within that group. In other words, direct mapping is used from the groups of the disk array to the groups of the cache area, and fully associative mapping is used inside two corresponding groups.
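- The group-associative placement rule can be illustrated with a little index arithmetic. The function names and the sizes used below (4 blocks per group, 8 groups in the cache) are assumptions chosen for the example, not values from the source.

```python
def zone_of(disk_block, blocks_per_group, groups_in_cache):
    """Zone of the disk array (each zone is as large as the cache area)."""
    return disk_block // (blocks_per_group * groups_in_cache)


def cache_group_for(disk_block, blocks_per_group, groups_in_cache):
    """Group of the cache area in which the disk block must be stored.

    Groups are directly mapped: the group number inside the zone equals
    the group number in the cache area. Within that group the block may
    occupy any position (fully associative inside the group).
    """
    disk_group = disk_block // blocks_per_group
    return disk_group % groups_in_cache


# Disk block 70 with 4 blocks per group and 8 groups in the cache:
zone = zone_of(70, 4, 8)
group = cache_group_for(70, 4, 8)
```

Blocks from different zones that share a group number compete for the same cache group, which is why this mapping cannot place a block in an arbitrary node's area.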
- Because any block in the disk array can be mapped to any block of the cache area under the fully associative mapping, the system can analyze the hardware separation level between the node containing the processor and the cache areas, and map a data block in the disk array to a cache block of its corresponding affinity node. This effectively solves the problem that an existing cache using only the group-associative mapping cannot identify the affinity node, reduces the remote access overhead of the system, and improves the efficiency of data access.
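- A minimal sketch of the affinity selection, assuming the hardware separation level between the requesting processor and each candidate node is available as an integer. The `separation_level` mapping and the function name are hypothetical; the source does not say how the level is measured. Since affinity decreases as the separation level increases, the node with the lowest level is chosen.

```python
def pick_affinity_node(separation_level):
    """Return the node id with the smallest hardware separation level."""
    if not separation_level:
        raise ValueError("no candidate nodes")
    return min(separation_level, key=separation_level.get)


# Levels as seen from processor 1: its own node is closest.
levels_seen_by_processor_1 = {1: 0, 2: 1}
target = pick_affinity_node(levels_seen_by_processor_1)
```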
- In step S203, when the process reads the file, the read frequency of the file is detected.
- The system automatically records the read frequency of each file, so that the frequency can be detected when a process reads the file.
- In step S204, it is determined whether the read frequency of the file is greater than a preset first threshold. If the determination result is "Yes", step S206 is performed; if the determination result is "No", step S205 is performed.
- the first threshold may be preset according to the performance of the system processor and/or the size of the cache space. For example, when the system has high processing performance and large cache space, the threshold can be set lower to improve data reading efficiency.
- In step S205, when the read frequency of the file is less than or equal to the preset first threshold, the file is stored to the disk array.
- In step S206, when the read frequency of the file is greater than the preset first threshold, it is determined whether the size of the file exceeds a preset second threshold. If the determination result is "Yes", step S207 is performed; if the determination result is "No", step S208 is performed.
- When the read frequency of the file is greater than the preset first threshold, the file is a frequently accessed file. To use the cache space more fully and effectively, the size of the file is then compared with the preset second threshold (for example, 512K).
- In step S207, when the size of the file exceeds the preset second threshold, the file is cached to the global cache area.
- The global cache area and the disk array adopt the group-associative mapping mode, which reduces the collision probability of blocks and improves the utilization of blocks, but whose hit rate is relatively low; therefore, to improve the hit rate, large files are cached to the global cache area.
- In step S208, when the size of the file does not exceed (is less than or equal to) the preset second threshold, the file is cached to the thread cache area.
- The thread cache area and the disk array adopt the fully associative mapping mode. Because the hit rate of data access under fully associative mapping is high, smaller files are cached to the thread cache area. Moreover, the file is cached into a cache block of its corresponding affinity node, thereby improving the efficiency of file reading.
- the data blocks in the disk array are mapped into the cache blocks of the corresponding affinity nodes. Therefore, the problem that the existing cache cannot directly identify the affinity node by using the group connection mapping method is solved, the remote access overhead of the system is reduced, and the efficiency of data access is improved.
- In addition, the size of the read file is checked: when the size of the file exceeds the preset second threshold, the file is cached to the global cache area, which reduces the collision probability of blocks and improves the utilization of blocks and the hit rate.
- When the size of the file does not exceed the preset second threshold, the file is cached to a thread cache area of the file's affinity node, which improves the utilization of the cache space, the hit rate of the file, and the efficiency of file reading.
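- The decision flow of steps S204 to S208 can be sketched as a single function. The value of `FIRST_THRESHOLD` is a made-up placeholder (the source only says it is preset), and reading the "512K" example as 512 × 1024 bytes is an assumption.

```python
FIRST_THRESHOLD = 10            # hypothetical read-frequency threshold
SECOND_THRESHOLD = 512 * 1024   # "512K" example, assumed to mean bytes


def choose_placement(read_frequency, file_size):
    """Return where a file goes, following steps S204 to S208."""
    if read_frequency <= FIRST_THRESHOLD:
        return "disk_array"     # S205: not frequently read
    if file_size > SECOND_THRESHOLD:
        return "global_cache"   # S207: frequent and large
    return "thread_cache"       # S208: frequent and small


cold = choose_placement(read_frequency=3, file_size=4096)
hot_large = choose_placement(read_frequency=50, file_size=2 * 1024 * 1024)
hot_small = choose_placement(read_frequency=50, file_size=4096)
```

A file whose size equals the second threshold goes to the thread cache area, matching the "does not exceed (is less than or equal to)" wording of step S208.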
- Embodiment 2:
- FIG. 4 is a flowchart showing an implementation process of a data caching method in a multi-node system according to Embodiment 2 of the present invention. The process of the method is as follows:
- In step S401, the cache area in the cache medium is divided into a plurality of sub-areas, each sub-area corresponding to one of the nodes in the system.
- The specific implementation process is as described in step S201 of the first embodiment, and details are not described herein again.
- In step S402, the sub-area of each node is divided into a thread cache area and a global cache area; the thread cache area is mapped to the disk array in a fully associative manner, and the global cache area is mapped to the disk array in a group-associative manner.
- The specific implementation process is as described in step S202 of the first embodiment, and details are not described herein again.
- In step S403, the thread cache area is divided into a plurality of small areas; each process is allocated a corresponding small area in the thread cache area of its affinity node, and each small area is divided into a private cache area and a shared cache area.
- The thread cache area is divided into a plurality of small thread cache areas, and each small thread cache area is further divided into two areas, a private cache area and a shared cache area, as shown in FIG. 5.
- the private cache area is used to cache the private data of the process, and the data in the private cache area can only be accessed by the process, for example, parameters set in a game process.
- the shared cache area is used to cache shared data of a process, and data in the shared cache area can be accessed by other processes.
- each small thread cache area is assigned a unique number, and the shared cache area is connected in series by using a doubly linked list.
- The scheduler in the multi-node system follows the process scheduling principle (each process has the greatest affinity with its own node) and allocates to each process a corresponding small area in the thread cache area of its affinity node.
- The scheduler tries to keep a process running on the processor it previously ran on, which ensures cache space utilization.
- Migration of a process among multiple processors within the same node does not affect the node affinity of the process.
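- The per-process small areas of step S403 can be modelled roughly as below. All class, attribute, and method names are hypothetical. As a simplification, the doubly linked list here chains the small areas themselves, and a search walks the chain looking only at each area's shared part; the source states only that the shared cache areas are connected by a doubly linked list.

```python
class SmallArea:
    """One process's slice of the thread cache area."""

    def __init__(self, number, pid):
        self.number = number  # unique number of this small area
        self.pid = pid        # owning process
        self.private = {}     # file id -> data, visible to the owner only
        self.shared = {}      # file id -> data, visible to other processes
        self.prev = None      # doubly linked list pointers
        self.next = None


class ThreadCacheArea:
    """Thread cache area of one node, divided into small areas."""

    def __init__(self):
        self.head = None
        self.tail = None
        self._next_number = 0

    def allocate(self, pid):
        """Create a small area for pid and append it to the list."""
        area = SmallArea(self._next_number, pid)
        self._next_number += 1
        area.prev = self.tail
        if self.tail is None:
            self.head = area
        else:
            self.tail.next = area
        self.tail = area
        return area

    def find_shared(self, file_id):
        """Walk the list and search every shared cache area."""
        node = self.head
        while node is not None:
            if file_id in node.shared:
                return node.shared[file_id]
            node = node.next
        return None


thread_cache = ThreadCacheArea()
first = thread_cache.allocate(pid=100)
second = thread_cache.allocate(pid=200)
first.private["settings"] = b"private-bytes"
second.shared["common"] = b"shared-bytes"
found = thread_cache.find_shared("common")
```

Private entries are never returned by `find_shared`, which mirrors the rule that private data can be accessed only by its own process.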
- In step S404, when the process reads the file, the read frequency of the file is detected.
- The specific implementation process is as described in step S203 of the first embodiment, and details are not described herein again.
- In step S405, it is determined whether the read frequency of the file is greater than a preset first threshold. If the determination result is "Yes", step S407 is performed; if the determination result is "No", step S406 is performed. The specific implementation process is as described in step S204 of the first embodiment, and details are not described herein again.
- In step S406, when the read frequency of the file is less than or equal to the preset first threshold, the file is stored in the disk array. The specific implementation process is as described in step S205 of the first embodiment, and details are not described herein again.
- In step S407, when the read frequency of the file is greater than the preset first threshold, it is determined whether the size of the file exceeds a preset second threshold. If the determination result is "Yes", step S408 is performed; if the determination result is "No", step S409 is performed. The specific implementation process is as described in step S206 of the first embodiment, and details are not described herein again.
- In step S408, when the size of the file exceeds the preset second threshold, the file is cached to the global cache area. The specific implementation process is as described in step S207 of the first embodiment, and details are not described herein again.
- In step S409, when the size of the file does not exceed the preset second threshold, it is determined whether the file is a private file of the process. If the determination result is "Yes", step S410 is performed; if the determination result is "No", step S411 is performed.
- each file has a unique identifier, and it is determined by the unique identifier of the file whether the file is a private file of the current process.
- In step S410, when the file is a private file of the process, the file is cached to the private cache area of the small area corresponding to the process.
- In step S411, when the file is not a private file of the process, the file is cached to the shared cache area of the small area corresponding to the process.
- By caching private files to the private cache area and shared files to the shared cache area, file management is made easier and file search efficiency is improved.
- unnecessary frequent reading and writing operations of private files are reduced, the write overhead of the cache medium is reduced, and system I/O performance is improved.
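- The full decision flow of Embodiment 2 (steps S405 to S411) can be sketched as follows. The function name, the threshold parameters, and the representation of a process's private files as a set of file identifiers are assumptions for illustration; the source says only that a unique file identifier is used to decide whether a file is private.

```python
def place_file(read_frequency, file_size, file_id, private_file_ids,
               first_threshold, second_threshold):
    """Return the destination of a file, following steps S405 to S411."""
    if read_frequency <= first_threshold:
        return "disk_array"      # S406: not frequently read
    if file_size > second_threshold:
        return "global_cache"    # S408: frequent and large
    if file_id in private_file_ids:
        return "private_cache"   # S410: private file of this process
    return "shared_cache"        # S411: shareable with other processes


private_ids = {"game-settings"}
destination_private = place_file(40, 2048, "game-settings", private_ids, 10, 512 * 1024)
destination_shared = place_file(40, 2048, "common-lib", private_ids, 10, 512 * 1024)
```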
- Embodiment 3:
- FIG. 6 is a flowchart showing an implementation process of a data caching method in a multi-node system according to Embodiment 3 of the present invention. This embodiment adds Step S612 to Embodiment 2.
- In step S612, when the execution of the process ends, the cache area allocated to the process is released.
- In the prior art, the system allocates cache space of a uniform size to every process, which easily wastes cache space. Moreover, when a process finishes executing, the prior art releases the entire allocated block of cache space, whereas the present invention releases only the small cache space allocated to that process, thereby avoiding the overhead of frequent read and write operations on the other small cache spaces that were not allocated to it.
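- Step S612 can be sketched as below, assuming small areas are tracked per process id. The class name `NodeThreadCache` and its dictionary layout are hypothetical. The point illustrated is that releasing one process's small area leaves the areas of other processes untouched.

```python
class NodeThreadCache:
    """Toy model of one node's thread cache area (names are illustrative)."""

    def __init__(self):
        # pid -> small area, each with a private and a shared part
        self.small_areas = {}

    def allocate(self, pid):
        """Give the process its own small area."""
        self.small_areas[pid] = {"private": {}, "shared": {}}
        return self.small_areas[pid]

    def release(self, pid):
        """Step S612: free only the small area of the finished process."""
        return self.small_areas.pop(pid, None) is not None


node_cache = NodeThreadCache()
node_cache.allocate(100)["private"]["settings"] = b"a"
node_cache.allocate(200)["shared"]["lib"] = b"b"
released = node_cache.release(100)  # process 100 has finished
```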
- Embodiment 4:
- FIG. 7 is a block diagram showing the structure of a data cache device in a multi-node system according to Embodiment 4 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown.
- the data cache device in the multi-node system may be a software unit, a hardware unit or a combination of hardware and software running in each application system.
- the data buffering apparatus 700 in the multi-node system includes a first dividing unit 71, a mapping unit 72, a detecting unit 73, and a control unit 74.
- The specific functions of each unit are as follows:
- a first dividing unit 71 configured to divide a cache area in the cache medium into a plurality of sub-areas, each sub-area corresponding to one node in the system;
- the mapping unit 72 is configured to divide the sub-area of each node into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array in a fully associative manner and the global cache area is mapped to the disk array in a group-associative manner;
- the detecting unit 73 is configured to detect a reading frequency of the file when the process reads the file;
- the control unit 74 is configured to cache the file to the thread cache area when the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold, and to cache the file to the global cache area when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
- the data caching apparatus in the multi-node system provided in this embodiment may be used in the data caching method in the foregoing corresponding multi-node system.
- Embodiment 5:
- FIG. 8 is a block diagram showing the structure of a data cache device in a multi-node system according to Embodiment 5 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown.
- the data cache device in the multi-node system may be a software unit, a hardware unit or a combination of hardware and software running in each application system.
- the data caching apparatus 800 in the multi-node system includes a first dividing unit 81, a mapping unit 82, a second dividing unit 83, a detecting unit 84, a storage unit 85, and a control unit 86.
- the specific functions of each unit are as follows:
- a first dividing unit 81 configured to divide a cache area in the cache medium into a plurality of sub-areas, each sub-area corresponding to one node in the system;
- the mapping unit 82 is configured to divide the sub-area of each node into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array in a fully associative manner and the global cache area is mapped to the disk array in a group-associative manner;
- a second dividing unit 83, configured to divide the thread cache area into a plurality of small areas and to allocate, for each process, a corresponding small area in the thread cache area of its affinity node, each small area being divided into a private cache area and a shared cache area, where the private cache area is used to cache private data of the process and the shared cache area is used to cache shared data of the process;
- the detecting unit 84 is configured to detect a reading frequency of the file when the process reads the file;
- the storage unit 85 is configured to store the file to the disk array when a read frequency of the file is less than or equal to a preset first threshold;
- the control unit 86 is configured to cache the file to the thread cache area when the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold, and to cache the file to the global cache area when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
- the control unit 86 further includes a third determining module 861, a private cache module 862, and a shared cache module 863:
- the third determining module 861 is configured to determine, when the size of the file does not exceed a preset second threshold, whether the file is a private file of the process;
- the private cache module 862 is configured to cache the file to a private cache area corresponding to the small area of the process when the file is a private file of the process;
- the shared cache module 863 is configured to cache the file to a shared cache area corresponding to the small area of the process when the file is not a private file of the process.
- the data cache device in the multi-node system provided in this embodiment may be used in the data caching method in the corresponding multi-node system.
- For details, refer to the related descriptions of the second and third embodiments of the data caching method in the multi-node system; they are not repeated here.
- As another embodiment of the present invention, the data caching device in the multi-node system further includes a release unit 87, and the release unit 87 is configured to release the cache area allocated to the process when execution of the process ends.
- the data caching device in the multi-node system provided in this embodiment may be used in the corresponding data caching method in the multi-node system described above. For details, refer to the related description of Embodiment 3 of the data caching method in the multi-node system; it is not repeated here.
- In this embodiment of the present invention, the data blocks in the disk array are mapped into the cache blocks of their corresponding affinity nodes. This solves the problem that an existing cache using only set-associative mapping cannot identify the affinity node, reduces the remote access overhead of the system, and improves the efficiency of data access.
- In addition, the size of the file being read is checked: when the size of the file exceeds the preset second threshold, the file is cached to the global cache area, which reduces the block conflict probability and improves block utilization and the hit rate.
- When the size of the file does not exceed the preset second threshold, the file is cached to the thread cache area of the file's affinity node, which improves the utilization of the cache space, the hit rate of the file, and the efficiency of file reading.
- Furthermore, caching private files to the private cache area and shared files to the shared cache area facilitates file management and improves file search efficiency.
- All or some of the steps of the data caching method in the multi-node system provided by the embodiments of the present invention may be completed by hardware related to program instructions, for example, by a computer running a program.
- The program may be stored in a readable storage medium such as a random access memory, a magnetic disk, or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
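The present invention is applicable to the field of data processing and provides a method, device, and system for caching data in a multi-node system. The method includes: dividing the cache area of a cache medium into multiple sub-areas, each sub-area corresponding to one node of the system; dividing each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array in a fully associative manner and the global cache area is mapped to the disk array in a set-associative manner; detecting the read frequency of a file when a process reads the file; caching the file to the thread cache area when the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold; and caching the file to the global cache area when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold. The present invention effectively solves the problem that an existing single cache area cannot identify affinity nodes, reduces the remote access overhead of the system, and improves the I/O performance of the system.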
Description
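The present invention belongs to the field of data processing, and in particular relates to a method, device, and system for caching data in a multi-node system.
With the development of computer technology, computer systems have become increasingly complex, and an existing computer system may contain multiple modular nodes. In a multi-node computer system, the quality of the cache design is an important factor affecting system performance: a cache speeds up data reads and writes and thereby improves the input/output (I/O) performance of the system.
Because of its excellent read/write performance, the solid state disk (Solid State Disk, SSD) is widely used as a cache medium between memory and disk.
In existing applications, the SSD cache and the disk are mapped in a set-associative manner. As shown in Figure 1, the data of each disk region is cached into the corresponding SSD group (data in a region of a given color is cached into the SSD region of the same color). When an application needs to read data in the white region of the disk, it first checks whether that data is already cached in the white region of the SSD. The prior art uses the SSD cache to retain frequently used data, which reduces the number of disk seeks and improves the efficiency of data reading.
However, when the SSD cache is applied to a multi-node computer system, as in Figure 1, assume that the four differently colored regions of the SSD cache represent nodes 1, 2, 3, and 4 in order. Under set-associative mapping between the SSD cache and the disk, data in the white region of the disk can be cached only in the cache area of node 1. Even if the system recognizes that the data in the white region of the disk has affinity with node 2, the data cannot be cached in the cache area of node 2, which increases the remote access overhead of the system and reduces the efficiency of the cache.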
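An embodiment of the present invention provides a method for caching data in a multi-node system that can identify affinity nodes when data is cached in a multi-node computer system.
An embodiment of the present invention further provides a device for caching data in a multi-node system.
An embodiment of the present invention further provides a system that includes the device for caching data in a multi-node system.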
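A method for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, the method including:
dividing the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system;
dividing each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping;
detecting the read frequency of a file when a process reads the file;
caching the file to the thread cache area if the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold; and
caching the file to the global cache area if the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.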
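A mapping method for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, the method including:
dividing the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system; and
dividing each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping.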
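An embodiment of the present invention further provides a device for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, the device including:
a first dividing unit, configured to divide the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system;
a mapping unit, configured to divide each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping;
a detecting unit, configured to detect the read frequency of a file when a process reads the file; and
a control unit, configured to cache the file to the thread cache area if the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold,
and to cache the file to the global cache area if the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.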
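A mapping device for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, the device including:
a first dividing unit, configured to divide the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system; and
a mapping unit, configured to divide each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping.
An embodiment of the present invention further provides a system that includes the device for caching data in a multi-node system.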
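As can be seen from the above solutions, the embodiments of the present invention divide the cache area of each node into a thread cache area and a global cache area, where the thread cache area uses fully associative mapping with the disk array and the global cache area uses set-associative mapping with the disk array. This solves the problem that an existing single cache area cannot identify affinity nodes, reduces the remote access overhead of the system, and improves the efficiency of data access. In addition, caching files of different sizes in different cache areas improves the utilization of the cache space and the hit rate of file reads.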
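Figure 1 is a schematic diagram of an existing data caching method in a multi-node system;
Figure 2 is a flowchart of the data caching method in a multi-node system provided by Embodiment 1 of the present invention;
Figure 3 is a schematic diagram of the fully associative mapping in Embodiment 1 of the present invention;
Figure 4 is a flowchart of the data caching method in a multi-node system provided by Embodiment 2 of the present invention;
Figure 5 is a schematic diagram of the process-aware cache mapping provided by Embodiment 2 of the present invention;
Figure 6 is a flowchart of the data caching method in a multi-node system provided by Embodiment 3 of the present invention;
Figure 7 is a structural diagram of the data caching device in a multi-node system provided by Embodiment 4 of the present invention;
Figure 8 is a structural diagram of the data caching device in a multi-node system provided by Embodiment 5 of the present invention.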
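To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
In the embodiments of the present invention, the cache area of each node is divided into a thread cache area and a global cache area, where the thread cache area uses fully associative mapping with the disk array and the global cache area uses set-associative mapping with the disk array. This solves the problem that an existing single cache area cannot identify affinity nodes, reduces the remote access overhead of the system, and improves the efficiency of data access. In addition, caching files of different sizes in different cache areas improves the utilization of the cache space and the hit rate of file reads.
The technical solutions of the present invention are described below through specific embodiments.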
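Embodiment 1:
Figure 2 shows the flow of the data caching method in a multi-node system provided by Embodiment 1 of the present invention. The method is described in detail as follows:
In step S201, the cache area in the cache medium is divided into multiple sub-areas, each sub-area corresponding to one node in the system.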
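In this embodiment, the architecture of the multi-node system includes, but is not limited to, the Non-Uniform Memory Access (NUMA) architecture. The system includes at least a cache medium and a disk array, where the cache medium includes, but is not limited to, an SSD; any cache medium whose read/write speed is higher than that of a disk but lower than that of memory may be used.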
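In this embodiment, the multi-node system includes multiple nodes, each with its own independent processor. The multi-node system divides the cache area of the cache medium into multiple sub-areas, each corresponding to one node of the system. The division may be, but is not limited to, an even division. For example, assume the multi-node system includes node 1 and node 2, node 1 includes processor 1, and node 2 includes processor 2, and the system divides the cache area of its SSD cache medium evenly between node 1 and node 2. Processor 1 takes time a to access the cache area in node 1 and time b to access the cache area in node 2; processor 2 takes time a to access the cache area in node 2 and time b to access the cache area in node 1, where b is greater than a. That is, a processor accesses the cache area in its own node faster than the cache area of a remote node, so node 1 is the affinity node of processor 1. The memory affinity between a node containing a processor and the one or more nodes on which memory is installed decreases as the level of hardware separation increases.
As an embodiment of the present invention, the cache area of the cache medium may also be divided among the nodes according to the performance of the node processors. For example, if the processor in node 1 outperforms the processor in node 2, node 1 receives a larger cache area than node 2.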
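The division in step S201 can be sketched in a few lines. The following is a minimal illustration, not the patented implementation: the function name `partition_cache`, the integer "weights" standing in for processor performance, and the choice to give rounding leftovers to the last node are all assumptions made for the example.

```python
def partition_cache(total_blocks, node_weights):
    """Split total_blocks cache blocks among nodes in proportion to node_weights.

    node_weights maps a node id to a positive integer weight (hypothetical
    stand-in for processor performance; equal weights give an even split).
    Returns {node_id: (first_block, block_count)}.
    """
    total_weight = sum(node_weights.values())
    nodes = sorted(node_weights)
    regions = {}
    start = 0
    for i, node in enumerate(nodes):
        if i == len(nodes) - 1:
            # The last node takes whatever is left after integer rounding.
            count = total_blocks - start
        else:
            count = total_blocks * node_weights[node] // total_weight
        regions[node] = (start, count)
        start += count
    return regions


even = partition_cache(1024, {1: 1, 2: 1})      # even split between two nodes
weighted = partition_cache(1024, {1: 3, 2: 1})  # node 1 has the faster processor
```

With equal weights each of the two nodes receives 512 blocks; with weights 3 and 1, node 1 receives 768 blocks and node 2 receives 256.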
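In step S202, each sub-area is divided into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping.
In this embodiment, the sub-area obtained by each node is further divided into two areas: a thread cache area and a global cache area. The division is not limited to an even division. The thread cache area is mapped to the disk array using fully associative mapping, and the global cache area is mapped to the disk array using set-associative mapping.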
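The fully associative mapping is shown in Figure 3: the cache area has C blocks and the disk array has M blocks, so there are C×M possible block-to-block mappings in total. Any block in the disk array can be mapped to any block in the cache area.
If data in the disk array has been stored in a block of the cache area, the disk array block number is stored in the tag of that cache block. When the processor wants to access the data, it supplies the address of the data in the disk array (block number + address within the block). The cache is searched first by comparing the block number with the tags of all blocks in the cache area. On a hit, the number of the matching cache block and the address within the block form the address used to access the cache area. On a miss, the data is read from the disk array according to the supplied address. Fully associative mapping improves the utilization of the cache space and the hit rate of data access.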
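The tag comparison described above can be sketched as follows. This is an illustrative model only: the class name `FullyAssociativeCache`, the in-memory `disk` dictionary, and the placeholder eviction rule (overwrite cache block 0 when the cache is full) are assumptions; the source does not specify a replacement policy.

```python
class FullyAssociativeCache:
    """Any disk block may occupy any cache block; the tag is the disk block number."""

    def __init__(self, num_blocks):
        self.tags = [None] * num_blocks   # disk block number held by each cache block
        self.data = [None] * num_blocks   # cached contents of each cache block

    def lookup(self, disk_block):
        # Compare the requested block number against the tag of every cache block.
        for idx, tag in enumerate(self.tags):
            if tag == disk_block:
                return idx
        return None

    def insert(self, disk_block, payload):
        idx = self.lookup(disk_block)
        if idx is None:
            # Use any free block; if none is free, overwrite block 0
            # (placeholder policy, assumed for this sketch).
            idx = self.tags.index(None) if None in self.tags else 0
        self.tags[idx] = disk_block
        self.data[idx] = payload
        return idx


def read(cache, disk, disk_block, offset):
    """Return (value, hit) for the address (disk_block, offset)."""
    idx = cache.lookup(disk_block)
    if idx is not None:
        return cache.data[idx][offset], True   # hit: served from the cache block
    return disk[disk_block][offset], False     # miss: read from the disk array


disk = {7: "abcd", 9: "wxyz"}   # hypothetical disk array contents
cache = FullyAssociativeCache(4)
cache.insert(7, disk[7])
```

Reading block 7 now hits in the cache, while reading block 9 falls through to the disk array.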
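The set-associative mapping is shown in Figure 1:
The disk array and the cache area are divided into blocks of the same size, and the blocks are grouped into groups of the same size. The disk array capacity is an integer multiple of the cache capacity. The disk array space is divided into zones, each the size of the cache area, so that each zone of the disk array has the same number of groups as the cache area. When data from the disk array is loaded into the cache area, the group numbers in the disk array and the cache area must be equal; that is, a block in any zone can be stored only in the cache space with the same group number, but may be stored at any block address within that group. In other words, direct mapping is used from disk array groups to cache groups, and fully associative mapping is used within each pair of corresponding groups.
On a data access, the directory of the blocks in the group is first located in the directory table by group number; then the disk array zone number and in-group block number of the accessed data are compared simultaneously with the directory entries of all blocks in the group. If a comparison matches, the access hits. If none matches, the access misses: the requested data block has not yet entered the cache, and a replacement is performed within the group.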
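The zone/group/block decomposition and the in-group comparison can be sketched as below. The names `split_address` and `SetAssociativeCache`, the linear block numbering, and the rule of replacing slot 0 when a group is full are assumptions made for illustration; the source only says that a replacement happens within the group.

```python
def split_address(disk_block, num_groups, blocks_per_group):
    """Decompose a disk block number into (zone, group, block_in_group).

    One zone of the disk array is as large as the whole cache area,
    so it holds num_groups * blocks_per_group blocks.
    """
    blocks_per_zone = num_groups * blocks_per_group
    zone = disk_block // blocks_per_zone
    within_zone = disk_block % blocks_per_zone
    return zone, within_zone // blocks_per_group, within_zone % blocks_per_group


class SetAssociativeCache:
    def __init__(self, num_groups, blocks_per_group):
        self.num_groups = num_groups
        self.blocks_per_group = blocks_per_group
        # Directory: for each group, one tag slot per cache block in the group.
        self.groups = [[None] * blocks_per_group for _ in range(num_groups)]

    def lookup(self, disk_block):
        zone, group, blk = split_address(disk_block, self.num_groups, self.blocks_per_group)
        # Direct mapping picks the group; the tag (zone, blk) is then
        # compared against every slot in that group.
        return (zone, blk) in self.groups[group]

    def insert(self, disk_block):
        zone, group, blk = split_address(disk_block, self.num_groups, self.blocks_per_group)
        slots = self.groups[group]
        if (zone, blk) not in slots:
            # Free slot if available, else replace slot 0 (assumed policy).
            victim = slots.index(None) if None in slots else 0
            slots[victim] = (zone, blk)
        return group


cache = SetAssociativeCache(num_groups=4, blocks_per_group=2)
cache.insert(13)   # zone 1, group 2
cache.insert(5)    # zone 0, group 2: competes for the same group
```

Blocks 13 and 5 come from different zones but share group 2, so they can only ever occupy the two slots of cache group 2, regardless of which node would be the better home for them.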
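Set-associative mapping reduces the block conflict probability and improves block utilization. The memory affinity between a node containing a processor and the one or more nodes on which memory is installed decreases as the level of hardware separation increases.
In this embodiment, because any block in the disk array can be mapped to any block in the cache area under fully associative mapping, the hardware separation level between the nodes containing the processor and the cache areas is analyzed, and the data blocks in the disk array are mapped into the cache blocks of their corresponding affinity nodes. This effectively solves the problem that an existing cache using only set-associative mapping cannot identify affinity nodes, reduces the remote access overhead of the system, and improves the efficiency of data access.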
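One way to picture the affinity-aware placement is to choose, for the processor that requests a block, the node whose cache area is cheapest to reach. The sketch below assumes a hypothetical `access_cost` table (standing in for the hardware separation level, with the local time a = 1 and the remote time b = 2) and models each node's thread cache area as a plain set; neither is prescribed by the source.

```python
def affinity_node(processor_node, access_cost):
    """Return the node whose cache area is cheapest to access from processor_node."""
    costs = access_cost[processor_node]
    return min(costs, key=costs.get)


def place_block(disk_block, processor_node, access_cost, thread_cache):
    """Map a disk block into the thread cache area of the processor's affinity node.

    Fully associative mapping allows this because any disk block may be
    stored in any cache block, including one on the affinity node.
    """
    node = affinity_node(processor_node, access_cost)
    thread_cache[node].add(disk_block)
    return node


# Hypothetical two-node system: local access costs 1 (time a), remote costs 2 (time b).
access_cost = {1: {1: 1, 2: 2}, 2: {1: 2, 2: 1}}
thread_cache = {1: set(), 2: set()}
chosen = place_block(42, processor_node=2, access_cost=access_cost, thread_cache=thread_cache)
```

Here block 42 lands in node 2's thread cache area because processor 2 reaches its own node fastest.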
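In step S203, when a process reads a file, the read frequency of the file is detected.
In this embodiment, the system automatically records the read frequency of each file, and the read frequency of the file is detected when a process reads it.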
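The source does not say how the read frequency is measured. As a minimal sketch, assuming the frequency is simply the number of reads observed so far for each file, the recording in step S203 could look like this (the class name `ReadFrequencyDetector` and the file identifiers are hypothetical):

```python
from collections import Counter


class ReadFrequencyDetector:
    """Counts reads per file; the count is used as the read frequency (assumption)."""

    def __init__(self):
        self.reads = Counter()

    def record_read(self, file_id):
        # Called each time a process reads the file; returns the updated count.
        self.reads[file_id] += 1
        return self.reads[file_id]

    def frequency(self, file_id):
        # Files never read report a frequency of 0.
        return self.reads[file_id]


detector = ReadFrequencyDetector()
detector.record_read("report.dat")
detector.record_read("report.dat")
```

A real system would more likely use reads per time window; the per-file counter is only the simplest stand-in.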
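In step S204, it is determined whether the read frequency of the file is greater than the preset first threshold. If yes, step S206 is performed; if no, step S205 is performed.
In this embodiment, the first threshold may be preset according to the performance of the system processor and/or the size of the cache space. For example, when the system has high processing performance and a large cache space, the threshold can be set lower to improve data reading efficiency.
In step S205, when the read frequency of the file is less than or equal to the preset first threshold, the file is stored to the disk array.
In step S206, when the read frequency of the file is greater than the preset first threshold, it is determined whether the size of the file exceeds the preset second threshold. If yes, step S207 is performed; if no, step S208 is performed.
In this embodiment, a read frequency greater than the preset first threshold indicates that the file is frequently accessed. To use the cache space more fully and effectively, the size of the file is then checked: when the size of the file exceeds the preset second threshold (for example, 512K), step S207 is performed; otherwise step S208 is performed.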
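In step S207, when the size of the file exceeds the preset second threshold, the file is cached to the global cache area.
In this embodiment, the global cache area and the disk array use set-associative mapping. Set-associative mapping reduces the block conflict probability and improves block utilization, but its hit rate is relatively low; therefore, to improve the hit rate, larger files are cached to the global cache area.
In step S208, when the size of the file does not exceed (is less than or equal to) the preset second threshold, the file is cached to the thread cache area.
In this embodiment, the thread cache area and the disk array use fully associative mapping. Because fully associative mapping has a higher hit rate for data access, smaller files are cached to the thread cache area. Moreover, the file is cached in a cache block of its corresponding affinity node, which improves the efficiency of file reading.
In this embodiment of the present invention, because any block in the disk array can be mapped to any block in the cache area under fully associative mapping, the data blocks in the disk array are mapped into the cache blocks of their corresponding affinity nodes. This solves the problem that an existing cache using only set-associative mapping cannot identify affinity nodes, reduces the remote access overhead of the system, and improves the efficiency of data access. In addition, the size of the file being read is checked: when the size of the file exceeds the preset second threshold, the file is cached to the global cache area, which reduces the block conflict probability and improves block utilization and the hit rate. When the size of the file does not exceed the preset second threshold, the file is cached to the thread cache area of the file's affinity node, which improves the utilization of the cache space and the hit rate of the file as well as the efficiency of file reading.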
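The decision flow of steps S204 to S208 reduces to two comparisons. The sketch below is an illustration under assumed names (`Placement`, `choose_placement`); the threshold values in the example are arbitrary, with 512 KB taken from the example size mentioned for the second threshold.

```python
from enum import Enum


class Placement(Enum):
    DISK_ARRAY = "disk array"
    GLOBAL_CACHE = "global cache area"
    THREAD_CACHE = "thread cache area"


def choose_placement(read_frequency, file_size, first_threshold, second_threshold):
    """Decide where a file goes, following steps S204 to S208."""
    if read_frequency <= first_threshold:
        # S204 -> S205: not read often enough, keep the file on the disk array.
        return Placement.DISK_ARRAY
    if file_size > second_threshold:
        # S206 -> S207: frequently read and large, use the set-associative global area.
        return Placement.GLOBAL_CACHE
    # S208: frequently read and small, use the fully associative thread area.
    return Placement.THREAD_CACHE


KB = 1024
decision = choose_placement(read_frequency=9, file_size=100 * KB,
                            first_threshold=5, second_threshold=512 * KB)
```

Note the boundary cases: a read frequency equal to the first threshold stays on the disk array, and a file exactly at the second threshold goes to the thread cache area.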
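Embodiment 2:
Figure 4 shows the flow of the data caching method in a multi-node system provided by Embodiment 2 of the present invention. The method is described in detail as follows:
In step S401, the cache area in the cache medium is divided into multiple sub-areas, each sub-area corresponding to one node in the system. The specific implementation is as described in step S201 of Embodiment 1 and is not repeated here.
In step S402, the sub-area of each node is divided into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping. The specific implementation is as described in step S202 of Embodiment 1 and is not repeated here.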
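In step S403, the thread cache area is divided into multiple small areas, each process is allocated a corresponding small area in the thread cache area of its affinity node, and each small area is divided into a private cache area and a shared cache area.
In this embodiment, the thread cache area is divided into multiple small thread cache areas, and each small thread cache area is further divided into two areas, a private cache area and a shared cache area, as shown in Figure 5. The private cache area caches the private data of a process; data in the private cache area can be accessed only by that process, for example, parameters set in a game process. The shared cache area caches the shared data of a process; data in the shared cache area can be accessed by other processes.
In this embodiment, to simplify lookup and unified management, each small thread cache area is assigned a unique number, and the shared cache areas are connected in series by a doubly linked list.
In this embodiment, to save data access time and reduce the communication burden between nodes, the scheduler in the multi-node system follows the process scheduling principle (each process has the greatest affinity with its own node) and allocates each process a corresponding small area in the thread cache area of its affinity node. When a process is scheduled in again, the scheduler tries to run it on the processor it previously ran on, thereby maintaining the utilization of the cache space.
In this embodiment, migrating a process between multiple processors of the same node does not affect the node affinity of the process.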
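The layout of step S403 can be modeled as a doubly linked list of small areas, each with a unique number, a private part, and a shared part. The class names `SmallArea` and `ThreadCacheArea`, the dictionary-backed storage, and the `find_shared` walk are assumptions made to illustrate the structure, not the source's data layout.

```python
class SmallArea:
    """One process's slice of the thread cache area."""

    def __init__(self, area_id, process_id):
        self.area_id = area_id        # unique number used for lookup and management
        self.process_id = process_id
        self.private = {}             # readable only by the owning process
        self.shared = {}              # readable by other processes
        self.prev = None              # doubly linked list links between small areas
        self.next = None


class ThreadCacheArea:
    def __init__(self):
        self.head = None
        self.tail = None
        self.by_process = {}
        self._next_id = 0

    def allocate(self, process_id):
        """Give the process a small area and append it to the linked list."""
        area = SmallArea(self._next_id, process_id)
        self._next_id += 1
        area.prev = self.tail
        if self.tail is None:
            self.head = area
        else:
            self.tail.next = area
        self.tail = area
        self.by_process[process_id] = area
        return area

    def find_shared(self, file_id):
        """Walk the list and search every shared area; private areas are skipped."""
        node = self.head
        while node is not None:
            if file_id in node.shared:
                return node.shared[file_id]
            node = node.next
        return None


thread_cache = ThreadCacheArea()
game = thread_cache.allocate("game")
editor = thread_cache.allocate("editor")
game.private["settings"] = "difficulty=hard"
editor.shared["font.ttf"] = "font bytes"
```

Any process can find `font.ttf` by walking the list, while the game's `settings` entry is invisible to the shared lookup.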
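In step S404, when a process reads a file, the read frequency of the file is detected. The specific implementation is as described in step S203 of Embodiment 1 and is not repeated here.
In step S405, it is determined whether the read frequency of the file is greater than the preset first threshold. If yes, step S407 is performed; if no, step S406 is performed. The specific implementation is as described in step S204 of Embodiment 1 and is not repeated here.
In step S406, when the read frequency of the file is less than or equal to the preset first threshold, the file is stored to the disk array. The specific implementation is as described in step S205 of Embodiment 1 and is not repeated here.
In step S407, when the read frequency of the file is greater than the preset first threshold, it is determined whether the size of the file exceeds the preset second threshold. If yes, step S408 is performed; if no, step S409 is performed. The specific implementation is as described in step S206 of Embodiment 1 and is not repeated here.
In step S408, when the size of the file exceeds the preset second threshold, the file is cached to the global cache area. The specific implementation is as described in step S207 of Embodiment 1 and is not repeated here.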
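In step S409, when the size of the file does not exceed the preset second threshold, it is determined whether the file is a private file of the process. If yes, step S410 is performed; if no, step S411 is performed.
In this embodiment, every file has a unique identifier, and this identifier is used to determine whether the file is a private file of the current process.
In step S410, when the file is a private file of the process, the file is cached to the private cache area of the small area corresponding to the process.
In step S411, when the file is not a private file of the process, the file is cached to the shared cache area of the small area corresponding to the process.
In this embodiment of the present invention, caching private files to the private cache area and shared files to the shared cache area facilitates file management and improves file search efficiency. In addition, managing private files reduces unnecessary frequent reads and writes of private files, lowers the write overhead of the cache medium, and improves system I/O performance.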
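Steps S409 to S411 can be sketched as a single branch on the file identifier. How the identifier marks a file as private is not specified in the source, so the sketch assumes a hypothetical set `private_file_ids` that lists the identifiers of the current process's private files; the small area is modeled as a dictionary with `private` and `shared` parts.

```python
def cache_small_file(small_area, file_id, data, private_file_ids):
    """Cache a small, frequently read file into the process's small area.

    small_area: {"private": {...}, "shared": {...}} for the current process.
    private_file_ids: identifiers of the process's private files (assumed input).
    Returns the name of the part that received the file.
    """
    if file_id in private_file_ids:
        # S410: private file, visible only to the owning process.
        small_area["private"][file_id] = data
        return "private"
    # S411: not private, so other processes may read it from the shared part.
    small_area["shared"][file_id] = data
    return "shared"


area = {"private": {}, "shared": {}}
cache_small_file(area, "game.cfg", b"volume=7", private_file_ids={"game.cfg"})
cache_small_file(area, "textures.pak", b"...", private_file_ids={"game.cfg"})
```

After the two calls, `game.cfg` sits in the private part and `textures.pak` in the shared part of the same small area.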
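Embodiment 3:
Figure 6 shows the flow of the data caching method in a multi-node system provided by Embodiment 3 of the present invention. This embodiment adds step S612 to Embodiment 2.
In step S612, when execution of the process ends, the cache area allocated to the process is released.
In this embodiment, to save cache space, the cache area allocated to the process is released when execution of the process ends.
In this embodiment of the present invention, because the prior art neither subdivides the cache area nor uses multiple storage-space mapping methods, the system allocates a cache space of uniform size to every process, which easily wastes cache space. Moreover, when a process finishes, the prior art releases the entire allocated block of cache space, whereas the present invention releases only the small cache space allocated to the process, thereby avoiding the overhead of frequent read and write operations on the other, unallocated small cache spaces.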
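The release in step S612 touches only the small area owned by the finished process. A minimal sketch, assuming a hypothetical `SmallAreaAllocator` that tracks small areas by number and hands freed areas back to a free list:

```python
class SmallAreaAllocator:
    """Tracks which small area of a thread cache area belongs to which process."""

    def __init__(self, total_small_areas):
        self.free = list(range(total_small_areas))   # numbers of unallocated small areas
        self.owner = {}                              # process id -> small area number

    def allocate(self, process_id):
        if process_id in self.owner:
            return self.owner[process_id]
        if not self.free:
            raise RuntimeError("no free small area in this thread cache area")
        area = self.free.pop(0)
        self.owner[process_id] = area
        return area

    def release(self, process_id):
        """S612: on process exit, free only that process's small area."""
        area = self.owner.pop(process_id, None)
        if area is not None:
            self.free.append(area)
        return area


allocator = SmallAreaAllocator(total_small_areas=3)
allocator.allocate("p1")
allocator.allocate("p2")
allocator.release("p1")   # p2's small area is left untouched
```

Releasing `p1` returns small area 0 to the free list while `p2` keeps small area 1, which mirrors the point that other small areas are not disturbed.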
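Embodiment 4:
Figure 7 shows the structure of the data caching device in a multi-node system provided by Embodiment 4 of the present invention. For ease of description, only the parts related to this embodiment are shown.
The data caching device in the multi-node system may be a software unit, a hardware unit, or a combined software and hardware unit running in each application system.
The data caching device 700 in the multi-node system includes a first dividing unit 71, a mapping unit 72, a detecting unit 73, and a control unit 74. The specific functions of the units are as follows:
the first dividing unit 71 is configured to divide the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system;
the mapping unit 72 is configured to divide the sub-area of each node into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping;
the detecting unit 73 is configured to detect the read frequency of a file when a process reads the file;
the control unit 74 is configured to cache the file to the thread cache area when the read frequency of the file is greater than the first threshold and the size of the file does not exceed the second threshold, and to cache the file to the global cache area when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
The data caching device in the multi-node system provided in this embodiment may be used in the corresponding data caching method in the multi-node system described above. For details, refer to the related descriptions of Embodiments 1, 2, and 3 of the data caching method in the multi-node system; they are not repeated here.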
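Embodiment 5:
Figure 8 shows the structure of the data caching device in a multi-node system provided by Embodiment 5 of the present invention. For ease of description, only the parts related to this embodiment are shown.
The data caching device in the multi-node system may be a software unit, a hardware unit, or a combined software and hardware unit running in each application system.
The data caching device 800 in the multi-node system includes a first dividing unit 81, a mapping unit 82, a second dividing unit 83, a detecting unit 84, a storage unit 85, and a control unit 86. The specific functions of the units are as follows:
the first dividing unit 81 is configured to divide the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system;
the mapping unit 82 is configured to divide the sub-area of each node into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping;
the second dividing unit 83 is configured to divide the thread cache area into multiple small areas and to allocate to each process a corresponding small area in the thread cache area of its affinity node, where each small area is divided into a private cache area and a shared cache area, the private cache area being used to cache private data of the process and the shared cache area being used to cache shared data of the process;
the detecting unit 84 is configured to detect the read frequency of a file when a process reads the file;
the storage unit 85 is configured to store the file to the disk array when the read frequency of the file is less than or equal to the preset first threshold;
the control unit 86 is configured to cache the file to the thread cache area when the read frequency of the file is greater than the first threshold and the size of the file does not exceed the second threshold, and to cache the file to the global cache area when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
The control unit 86 further includes a first determining module 861, a private cache module 862, and a shared cache module 863:
the first determining module 861 is configured to determine, when the size of the file does not exceed the preset second threshold, whether the file is a private file of the process;
the private cache module 862 is configured to cache the file to the private cache area of the small area corresponding to the process when the file is a private file of the process;
the shared cache module 863 is configured to cache the file to the shared cache area of the small area corresponding to the process when the file is not a private file of the process.
The data caching device in the multi-node system provided in this embodiment may be used in the corresponding data caching method in the multi-node system described above. For details, refer to the related descriptions of Embodiments 2 and 3 of the data caching method in the multi-node system; they are not repeated here.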
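As another embodiment of the present invention, the data caching device in the multi-node system further includes a release unit 87, which is configured to release the cache area allocated to the process when execution of the process ends.
The data caching device in the multi-node system provided in this embodiment may be used in the corresponding data caching method in the multi-node system described above. For details, refer to the related description of Embodiment 3 of the data caching method in the multi-node system; it is not repeated here.
In this embodiment of the present invention, because any block in the disk array can be mapped to any block in the cache area under fully associative mapping, the data blocks in the disk array are mapped into the cache blocks of their corresponding affinity nodes. This solves the problem that an existing cache using only set-associative mapping cannot identify affinity nodes, reduces the remote access overhead of the system, and improves the efficiency of data access. In addition, the size of the file being read is checked: when the size of the file exceeds the preset second threshold, the file is cached to the global cache area, which reduces the block conflict probability and improves block utilization and the hit rate. When the size of the file does not exceed the preset second threshold, the file is cached to the thread cache area of the file's affinity node, which improves the utilization of the cache space and the hit rate of the file as well as the efficiency of file reading. Furthermore, caching private files to the private cache area and shared files to the shared cache area facilitates file management and improves file search efficiency. Managing private files reduces unnecessary frequent reads and writes of private files, lowers the write overhead of the cache medium, and improves system I/O performance.
All or some of the steps of the data caching method in a multi-node system provided by the embodiments of the present invention may be completed by hardware related to program instructions, for example, by a computer running a program. The program may be stored in a readable storage medium such as a random access memory, a magnetic disk, or an optical disk.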
Claims (18)
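- A method for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, characterized in that the method includes: dividing the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system; dividing each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping; detecting the read frequency of a file when a process reads the file; caching the file to the thread cache area if the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold; and caching the file to the global cache area if the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
- The method according to claim 1, characterized in that the method further includes: dividing the thread cache area into multiple small areas, and allocating to each process a corresponding small area in the thread cache area of its affinity node, where each small area is divided into a private cache area and a shared cache area, the private cache area being used to cache private data of the process and the shared cache area being used to cache shared data of the process.
- The method according to claim 2, characterized in that the shared cache areas are connected in series by a doubly linked list.
- The method according to claim 2, characterized in that, when the size of the file does not exceed the second threshold, the method further includes: determining whether the file is a private file of the process; when the file is a private file of the process, caching the file to the private cache area of the small area corresponding to the process; and when the file is not a private file of the process, caching the file to the shared cache area of the small area corresponding to the process.
- The method according to claim 1, characterized in that the method further includes: releasing the cache area allocated to the process when execution of the process ends.
- The method according to claim 1, characterized in that the method further includes: storing the file to the disk array when the read frequency of the file is less than or equal to the first threshold.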
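- A mapping method for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, characterized in that the method includes: dividing the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system; and dividing each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping.
- The method according to claim 7, characterized in that the method further includes: dividing the thread cache area into multiple small areas, and allocating to each process a corresponding small area in the thread cache area of its affinity node, where each small area is divided into a private cache area and a shared cache area, the private cache area being used to cache private data of the process and the shared cache area being used to cache shared data of the process.
- The method according to claim 8, characterized in that the shared cache areas are connected in series by a doubly linked list.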
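- A device for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, characterized in that the device includes: a first dividing unit, configured to divide the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system; a mapping unit, configured to divide each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping; a detecting unit, configured to detect the read frequency of a file when a process reads the file; and a control unit, configured to cache the file to the thread cache area when the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold, and to cache the file to the global cache area when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold.
- The device according to claim 10, characterized in that the device further includes: a storage unit, configured to store the file to the disk array when the read frequency of the file is less than or equal to the first threshold; and a second dividing unit, configured to divide the thread cache area into multiple small areas and to allocate to each process a corresponding small area in the thread cache area of its affinity node, where each small area is divided into a private cache area and a shared cache area, the private cache area being used to cache private data of the process and the shared cache area being used to cache shared data of the process.
- The device according to claim 11, characterized in that the shared cache areas are connected in series by a doubly linked list.
- The device according to claim 10, characterized in that the control unit further includes: a first determining module, configured to determine, when the size of the file does not exceed the second threshold, whether the file is a private file of the process; a private cache module, configured to cache the file to the private cache area of the small area corresponding to the process when the file is a private file of the process; and a shared cache module, configured to cache the file to the shared cache area of the small area corresponding to the process when the file is not a private file of the process.
- The device according to claim 10, characterized in that the device further includes: a release unit, configured to release the cache area allocated to the process when execution of the process ends.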
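- A mapping device for caching data in a multi-node system, where the system includes at least a cache medium and a disk array, characterized in that the device includes: a first dividing unit, configured to divide the cache area in the cache medium into multiple sub-areas, each sub-area corresponding to one node in the system; and a mapping unit, configured to divide each sub-area into a thread cache area and a global cache area, where the thread cache area is mapped to the disk array using fully associative mapping and the global cache area is mapped to the disk array using set-associative mapping.
- The device according to claim 15, characterized in that the device further includes: a second dividing unit, configured to divide the thread cache area into multiple small areas and to allocate to each process a corresponding small area in the thread cache area of its affinity node, where each small area is divided into a private cache area and a shared cache area, the private cache area being used to cache private data of the process and the shared cache area being used to cache shared data of the process.
- The device according to claim 16, characterized in that the shared cache areas are connected in series by a doubly linked list.
- A system including the device for caching data in a multi-node system according to any one of claims 10 to 14.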
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180001894.0A CN103038755B (zh) | 2011-08-04 | 2011-08-04 | 多节点系统中数据缓存的方法、装置及系统 |
EP11858742.7A EP2645259B1 (en) | 2011-08-04 | 2011-08-04 | Method, device and system for caching data in multi-node system |
PCT/CN2011/077994 WO2012109879A1 (zh) | 2011-08-04 | 2011-08-04 | 多节点系统中数据缓存的方法、装置及系统 |
US13/968,714 US9223712B2 (en) | 2011-08-04 | 2013-08-16 | Data cache method, device, and system in a multi-node system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2011/077994 WO2012109879A1 (zh) | 2011-08-04 | 2011-08-04 | 多节点系统中数据缓存的方法、装置及系统 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/968,714 Continuation US9223712B2 (en) | 2011-08-04 | 2013-08-16 | Data cache method, device, and system in a multi-node system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012109879A1 true WO2012109879A1 (zh) | 2012-08-23 |
Family
ID=46671940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2011/077994 WO2012109879A1 (zh) | 2011-08-04 | 2011-08-04 | 多节点系统中数据缓存的方法、装置及系统 |
Country Status (4)
Country | Link |
---|---|
US (1) | US9223712B2 (zh) |
EP (1) | EP2645259B1 (zh) |
CN (1) | CN103038755B (zh) |
WO (1) | WO2012109879A1 (zh) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9183017B2 (en) * | 2012-10-19 | 2015-11-10 | International Business Machines Corporation | Affinity of virtual processor dispatching |
US9858185B1 (en) * | 2014-12-23 | 2018-01-02 | Emc Corporation | Multi-tier data storage using inclusive/exclusive burst buffer caching based on reference counts |
CN106569728B (zh) * | 2015-10-09 | 2021-02-23 | 中兴通讯股份有限公司 | 多磁盘阵列raid共享写缓存的处理方法及装置 |
KR20170051563A (ko) * | 2015-10-29 | 2017-05-12 | 에스케이하이닉스 주식회사 | 데이터 저장 장치 및 그것의 동작 방법 |
CN106371916B (zh) * | 2016-08-22 | 2019-01-22 | 浪潮(北京)电子信息产业有限公司 | 一种存储系统io线程优化方法及其装置 |
CN108009019B (zh) * | 2016-10-29 | 2021-06-22 | 网宿科技股份有限公司 | 分布式数据定位实例的方法、客户端及分布式计算系统 |
US11138178B2 (en) | 2016-11-10 | 2021-10-05 | Futurewei Technologies, Inc. | Separation of computation from storage in database for better elasticity |
US20180292988A1 (en) * | 2017-04-07 | 2018-10-11 | GM Global Technology Operations LLC | System and method for data access in a multicore processing system to reduce accesses to external memory |
CN109508140B (zh) * | 2017-09-15 | 2022-04-05 | 阿里巴巴集团控股有限公司 | 存储资源管理方法、装置、电子设备及电子设备、系统 |
FR3076003B1 (fr) * | 2017-12-27 | 2021-01-22 | Bull Sas | Acces multiples a un fichier de donnees stocke dans un systeme de stockage de donnees associe a un espace memoire tampon |
CN111694768B (zh) * | 2019-03-15 | 2022-11-01 | 上海寒武纪信息科技有限公司 | 运算方法、装置及相关产品 |
WO2021066844A1 (en) * | 2019-10-04 | 2021-04-08 | Visa International Service Association | Techniques for multi-tiered data storage in multi-tenant caching systems |
CN110968271B (zh) * | 2019-11-25 | 2024-02-20 | 北京劲群科技有限公司 | 一种高性能数据存储方法、系统与装置 |
CN111371950A (zh) * | 2020-02-28 | 2020-07-03 | Oppo(重庆)智能科技有限公司 | 分屏响应方法及装置、终端、存储介质 |
CN111984197B (zh) * | 2020-08-24 | 2023-12-15 | 许昌学院 | 计算机缓存分配方法 |
CN112148225B (zh) * | 2020-09-23 | 2023-04-25 | 上海爱数信息技术股份有限公司 | 一种基于NVMe SSD的块存储缓存系统及其方法 |
CN117112268B (zh) * | 2023-10-23 | 2024-02-13 | 深圳市七彩虹禹贡科技发展有限公司 | 一种内存共享管理方法及系统 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1499382A (zh) * | 2002-11-05 | 2004-05-26 | 华为技术有限公司 | 廉价冗余磁盘阵列系统中高效高速缓存的实现方法 |
CN1620651A (zh) * | 2002-01-09 | 2005-05-25 | 国际商业机器公司 | 使用全局窥探向单个相干系统中的分布式计算机节点提供高速缓存一致性的方法和设备 |
US7743200B1 (en) * | 2007-05-24 | 2010-06-22 | Juniper Networks, Inc. | Instruction cache using perfect hash function |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7996644B2 (en) * | 2004-12-29 | 2011-08-09 | Intel Corporation | Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache |
US20090320036A1 (en) | 2008-06-19 | 2009-12-24 | Joan Marie Ries | File System Object Node Management |
2011
- 2011-08-04 CN CN201180001894.0A patent/CN103038755B/zh active Active
- 2011-08-04 WO PCT/CN2011/077994 patent/WO2012109879A1/zh active Application Filing
- 2011-08-04 EP EP11858742.7A patent/EP2645259B1/en active Active
2013
- 2013-08-16 US US13/968,714 patent/US9223712B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1620651A (zh) * | 2002-01-09 | 2005-05-25 | 国际商业机器公司 | 使用全局窥探向单个相干系统中的分布式计算机节点提供高速缓存一致性的方法和设备 |
CN1499382A (zh) * | 2002-11-05 | 2004-05-26 | 华为技术有限公司 | 廉价冗余磁盘阵列系统中高效高速缓存的实现方法 |
US7743200B1 (en) * | 2007-05-24 | 2010-06-22 | Juniper Networks, Inc. | Instruction cache using perfect hash function |
Non-Patent Citations (1)
Title |
---|
See also references of EP2645259A4 * |
Also Published As
Publication number | Publication date |
---|---|
US9223712B2 (en) | 2015-12-29 |
CN103038755A (zh) | 2013-04-10 |
EP2645259A4 (en) | 2014-02-19 |
CN103038755B (zh) | 2015-11-25 |
US20130346693A1 (en) | 2013-12-26 |
EP2645259B1 (en) | 2015-10-14 |
EP2645259A1 (en) | 2013-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2012109879A1 (zh) | 多节点系统中数据缓存的方法、装置及系统 | |
WO2013042880A2 (ko) | 다양한 블록 크기를 지원하는 주소 사상을 사용하여 플래시 메모리 내에 데이터를 저장하는 방법 및 장치 | |
WO2017196141A1 (en) | Autonomous prefetch engine | |
WO2014157817A1 (ko) | 플래시 메모리 기반의 페이지 주소 사상 방법 및 시스템 | |
WO2017065379A1 (ko) | 프로세싱-인-메모리를 이용한 명령어 처리 방법 및 그 장치 | |
WO2012111905A2 (ko) | 맵 리듀스를 이용한 분산 메모리 클러스터 제어 장치 및 방법 | |
US5727150A (en) | Apparatus and method for page migration in a non-uniform memory access (NUMA) system | |
WO2013024952A1 (ko) | 메모리 컨트롤러 및 이의 데이터 관리방법 | |
WO2012091234A1 (ko) | 비휘발성 메모리 및 휘발성 메모리를 포함하는 메모리 시스템 및 그 시스템을 이용한 처리 방법 | |
WO2014142553A1 (ko) | 워크 로드에 따라 동적 자원 할당 가능한 상호 연결 패브릭 스위칭 장치 및 방법 | |
CN1195817A (zh) | 在非包含的高速缓存存储器分级体系内使用的实现高速缓存一致性机制的方法和系统 | |
WO2018054035A1 (zh) | 一种基于 Spark 语义的数据重用方法及其系统 | |
WO2011023121A1 (zh) | 一种次级内存的分配方法和装置 | |
JP2024029007A (ja) | 記憶システムをメインメモリとして使用するための方法および装置 | |
WO2013032101A1 (ko) | 메모리 시스템 및 그 관리방법 | |
WO2012149815A1 (zh) | 磁盘缓存的管理方法及装置 | |
WO2014112831A1 (en) | Method and system for dynamically changing page allocator | |
WO2024111952A1 (ko) | 샤딩 기반의 키-값 캐싱 시스템 및 방법 | |
WO2018194237A1 (ko) | 하이브리드 트랜잭셔널 메모리 시스템에서의 트랜잭션 처리 방법 및 트랜잭션 처리 장치 | |
CN104321754A (zh) | 一种Cache工作模式的设置方法和装置 | |
WO2015016615A1 (ko) | 프로세서 및 메모리 제어 방법 | |
KR101102260B1 (ko) | 가상 어드레스 캐시 및 고유 태스크 식별자를 이용하는데이터를 공유하기 위한 방법 | |
WO2012119384A1 (zh) | 文件系统的挂载方法、装置及系统 | |
WO2009088194A2 (ko) | 컴퓨터 저장장치에서의 프리페칭 데이터 관리 방법 | |
Wang et al. | Fine-grained data management for dram/ssd hybrid main memory architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180001894.0 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11858742 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011858742 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |