US20090320036A1 - File System Object Node Management - Google Patents
- Publication number
- US20090320036A1 (U.S. application Ser. No. 12/142,391)
- Authority
- US
- United States
- Prior art keywords
- file system
- system object
- home node
- node
- thread
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5033—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering data affinity
Abstract
Embodiments of the invention provide a method for assigning a home node to a file system object and using information associated with file system objects to improve locality of reference during thread execution. Doing so may improve application performance on a computer system configured using a non-uniform memory access (NUMA) architecture. Thus, embodiments of the invention allow a computer system to create a nodal affinity between a given file system object and a given processing node.
Description
- 1. Field of the Invention
- Embodiments of the invention generally relate to managing access to shared resources on a computer system. More specifically, embodiments of the invention relate to techniques for managing thread access to objects in a file system on a multi-node computer system.
- 2. Description of the Related Art
- Computer systems typically include a memory for storing programs and one or more processors which execute programs stored in the memory. Typically, an operating system may be configured to schedule and execute multiple threads as separate units of execution. In a multithreaded computing environment, each thread may access resources, including files stored in a file system.
- NUMA (short for non-uniform memory access) refers to a computing architecture for a cluster of processors. Computer systems configured using NUMA architectures include multiple processing nodes, where each node includes one or more processors and local memory resources. Typically, NUMA systems are configured as “tightly-coupled,” “share everything” systems where the nodes are managed by a single operating system and may access each other's memory over a common bus. That is, a processor in one node may access memory in another. Nevertheless, in such architectures, it is faster for a processor to reference the memory local to that node. Thus, poor nodal affinity for data in memory results in poor performance, i.e., when a thread executing on one node frequently accesses data in memory on another node, system performance suffers.
- A general solution to this problem is to assign each thread to a home node (i.e., create a nodal “affinity” for each thread). Nodal affinity causes the system to allocate the thread's memory pages from the home node, if possible. A thread dispatcher, in turn, preferentially dispatches the thread for execution on its assigned home node. This increases the probability that memory references for the thread will be local (i.e., within the home node).
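The nodal-affinity policy just described can be pictured with a toy page allocator; everything here (function and variable names, the free-page model) is illustrative rather than from the patent:

```python
def allocate_page(thread_home_node, free_pages):
    """Allocate one page, preferring the thread's home node.

    free_pages maps node id -> number of free pages; this is a toy
    model of the home-node allocation preference described above.
    """
    if free_pages.get(thread_home_node, 0) > 0:
        free_pages[thread_home_node] -= 1
        return thread_home_node          # local allocation
    # fall back to any node that still has free pages
    for node, free in free_pages.items():
        if free > 0:
            free_pages[node] -= 1
            return node                  # remote allocation

free_pages = {1: 1, 2: 2}
first = allocate_page(1, free_pages)   # home node 1 still has a free page
second = allocate_page(1, free_pages)  # home node 1 is now exhausted
```

The first allocation lands on the home node; once the home node's memory is exhausted, the allocator degrades to a remote node, which is exactly the situation where locality (and performance) suffers.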
- One embodiment of the invention includes a method of improving locality of reference for thread access to a file system object on a computing system. The method may generally include identifying the file system object. The file system object is accessible by threads executing on a plurality of processing nodes of the computing system. The method may also include receiving, from a first thread executing on a first one of the plurality of processing nodes, a request to access the file system object and determining whether a current home node attribute of the file system object is set to identify one of the plurality of processing nodes. Upon determining the current home node attribute is not set for the file system object, a second one of the plurality of processing nodes may be selected to set as the current home node attribute of the file system object. The method may also include setting the current home node attribute of the file system object to identify the second processing node.
- Another embodiment of the invention includes a computer-readable storage medium containing a program which, when executed, performs an operation for improving locality of reference for thread access to a file system object on a computing system. The operation may generally include identifying the file system object. The file system object is accessible by threads executing on a plurality of processing nodes of the computing system. The operation may also include receiving, from a first thread executing on a first one of the plurality of processing nodes, a request to access the file system object and determining whether a current home node attribute of the file system object is set to identify one of the plurality of processing nodes. Upon determining the current home node attribute is not set for the file system object, a second one of the plurality of processing nodes may be selected to set as the current home node attribute of the file system object. The operation may also include setting the current home node attribute of the file system object to identify the second processing node.
- Still another embodiment of the invention includes a system having a plurality of processing nodes, each having a respective processor and a memory, wherein the plurality of processing nodes are communicatively coupled to a common bus and an operating system configured to manage a plurality of threads executing on the plurality of processing nodes, wherein the operating system is configured to perform an operation for improving locality of reference for thread access to a file system object. The operation may generally include identifying the file system object. The file system object is accessible by threads executing on a plurality of processing nodes of the computing system. The operation may also include receiving, from a first thread executing on a first one of the plurality of processing nodes, a request to access the file system object and determining whether a current home node attribute of the file system object is set to identify one of the plurality of processing nodes. Upon determining the current home node attribute is not set for the file system object, a second one of the plurality of processing nodes may be selected to set as the current home node attribute of the file system object. The operation may also include setting the current home node attribute of the file system object to identify the second processing node.
- So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
- It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
- FIG. 1 is a block diagram illustrating a computer system configured using a NUMA architecture, according to one embodiment of the invention.
- FIG. 2 is a block diagram further illustrating aspects of the computer system of FIG. 1, according to one embodiment of the invention.
- FIG. 3 illustrates a method for assigning a current home node to a file system object on a computer system configured using a NUMA architecture, according to one embodiment of the invention.
- FIG. 4 illustrates a method for evaluating and updating the node assigned as a home node for a file system object on a computer system configured using a NUMA architecture, according to one embodiment of the invention.
- FIG. 5 illustrates a method for adjusting thread execution on a NUMA based computer system, according to one embodiment of the invention.
- Embodiments of the invention provide a method for assigning a home node to a file system object and using information associated with file system objects to improve locality of reference during thread execution. Doing so may improve application performance on a computer system configured using a NUMA architecture. In one embodiment, each file system object may be assigned a home node. That is, embodiments of the invention allow a computer system to create a nodal affinity between a given file system object and a given processing node.
- In NUMA-based systems, a thread may be assigned a preferred home node and preferentially allocate memory resources from the local memory of the home node. While this approach frequently works well for threads that create objects stored in the memory of the home node, it is not always ideal when the thread accesses file system objects, since they often have a system-wide scope and may be accessed regularly by multiple threads which could themselves have different home nodes. For example, if a first thread frequently accesses a given file, and a second thread accesses the file only occasionally, then assigning the file to the home node of the first thread provides superior execution performance versus assigning the file to the home node of the second thread.
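The frequent-accessor example above amounts to assigning the file to the home node of the thread that touches it most often; a minimal sketch, with hypothetical names:

```python
from collections import Counter

def best_home_node_for_file(access_home_nodes):
    """Pick the file's home node as the most common home node among
    observed accesses. access_home_nodes holds one entry per access,
    namely the home node of the accessing thread; this is a toy model
    of the frequent-accessor preference described above."""
    counts = Counter(access_home_nodes)
    node, _ = counts.most_common(1)[0]
    return node

# Thread A (home node 1) accesses the file often; thread B (home node 2)
# accesses it only occasionally: the file gravitates to node 1.
observed = [1, 1, 1, 1, 2]
chosen = best_home_node_for_file(observed)
```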
- Thus, determining the appropriate nodal affinity for file system objects can result in improved performance for applications that perform a significant amount of file system activity. Accordingly, in one embodiment, when a thread accesses a file system object, a current home node may be assigned to that object. The particular node may be selected based on a variety of factors associated with the file system object (e.g., a preferred home node attribute, historical usage patterns, access type details, etc.), as well as by taking into account the home node of the thread requesting access to the file system object. For example, if a given thread has an assigned nodal affinity for a particular node, the current home node of the file system object may be set to the same node. Further, during thread execution, a thread control block may note when file system object data is about to be accessed. If the thread gets re-dispatched to another node during the access (e.g., if the thread ends up waiting on a mutex lock or an I/O request), then the node to which the thread is re-dispatched may be based on both the home node of the thread and the home node of the file system object. By re-dispatching the thread to the processing node associated with the file, locality of reference, and thus system performance, may be improved.
- In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
- One embodiment of the invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive) on which information is permanently stored; (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Other media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks. The latter embodiment specifically includes transmitting information to/from the Internet and other networks. Such communications media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Broadly, computer-readable storage media and communications media may be referred to herein as computer-readable media.
- In general, the routines executed to implement the embodiments of the invention, may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described hereinafter may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
-
FIG. 1 is a block diagram illustrating a computer system 100 configured using a NUMA architecture, according to one embodiment of the invention. As shown, the computer system 100 includes four processing nodes 102, 104, 106, and 108. Processing node 102 includes a CPU 112 and a memory 122, processing node 104 includes a CPU 114 and a memory 124, processing node 106 includes a CPU 116 and a memory 126, and processing node 108 includes a CPU 118 and a memory 128. CPUs 112, 114, 116, and 118 are included to be representative of a variety of processors used in computer system 100. For example, CPU 112 may represent a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like. Memory 122, 124, 126, and 128 each represent the local memory of the respective processing node.
- Also as shown, the processing nodes 102, 104, 106, and 108 are connected by a system bus 110, and each of the CPUs may access any memory over the system bus 110. Of course, the time required for a processing node to access its own local memory (e.g., processing node 102 to access memory 122) may be substantially faster than the time required to access memory on another one of the processing nodes (e.g., processing node 102 to access memory 128). Further, each node may include a memory cache (not shown) used to further improve the speed at which the processor may access information. -
Storage device 120 stores application programs and data on a file system 125 for use by computer system 100. Storage 120 may be one or more hard-disk drives, flash memory devices, optical media and the like. Additionally, computer system 100 may be connected to a data communications network (e.g., a local area network, which itself may be connected to other networks such as the internet), and may include input/output devices such as a mouse, keyboard and monitor.
- Illustratively, the memory 122 of processing node 102 includes an operating system 135 and memory 124 includes an application 130. As is known, operating system 135 may be configured to manage the computer hardware provided by processing nodes 102, 104, 106, and 108.
- For example, as described in greater detail below,
application 130 may be a multithreaded application running on computer system 100 in parallel with other applications, and operating system 135 may include a thread dispatcher configured to select a thread ready for execution and to dispatch it to one of the processing nodes 102, 104, 106, and 108. Once dispatched, the thread may be scheduled for execution on the dispatched node. Further, operating system 135 may be configured to assign a preferred home node to a given thread, as well as assign a home node to a given file from file system 125. That is, the operating system may select to create nodal affinities between the processing nodes, threads, and file system objects, as appropriate in a particular case. -
FIG. 2 is a block diagram 200 further illustrating aspects of the computer system of FIG. 1, according to one embodiment of the invention. As shown, diagram 200 further illustrates aspects of file system 125, operating system 135, and applications 130.
- Illustratively, file system 125 includes a plurality of file system objects 222 as well as metadata 230 indicating a default home node to assign to file system objects 222. Each of the file system objects 222 represents a file which may store data accessed by application programs 130 and operating system 135. For example, application programs 130 may read portions of a file system object 222 into memory as well as write data to the file. As is known, file system 125 provides a system for organizing files on a storage device (e.g., a disk hard drive).
- As shown, each
file system object 222 may include a set of file data 225 as well as file metadata 220 used by the operating system 135 to manage the file system objects 222. In one embodiment, file metadata 220 may include a current home node assigned to the file 222, a preferred home node assigned to the file 222, a history of what processing nodes the file 222 has been assigned, access control information, and an indication of what threads are currently accessing the file 222. Of course, the particular file metadata 220 defined for files 222 of file system 125 may be tailored to suit the needs of a particular case. File data 225 represents the substantive content of a file system object 222.
- In one embodiment, file system objects 222 may be assigned a home node determined, at least in part, from
metadata 220. For example, a file system object 222 may have a preferred home node attribute which may be manually set much like other file system attributes. When set, the system may increase the relative weight of the preferred home node when determining the home node assigned to a given file system object 222. Similarly, the system may track the most recent home nodes assigned to a file system object 222 in the home node history of file system metadata 220. In such a case, the system may increase the relative weight of the node(s) with the highest usage when determining the actual home node to assign to the file system object 222. If there is no node with a highest usage (which could occur with commonly used file system objects, e.g., a root directory object), the system may increase the relative weight of the default home node 230. As the name implies, the default home node 230 provides a default node to assign as the home node for file system objects 222.
- Further, in one embodiment, access control information may be used to assign a home node to a given file system object in the appropriate case. For example, if a thread requests exclusive or cached access to a file system object 222 (e.g., a request to open a file with no sharing or to set a current working directory, etc.), then the home node of that thread would be the logical choice to assign as the home node of the requested file.
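The per-object metadata described above (current home node, preferred home node, home-node history) can be pictured as a small record; the field names below are hypothetical, as the patent does not prescribe a concrete layout:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FileHomeNodeMetadata:
    """Hypothetical layout for the per-object metadata 220 described
    above; names are illustrative, not from the patent."""
    current_home_node: Optional[int] = None    # node currently assigned
    preferred_home_node: Optional[int] = None  # manually settable attribute
    home_node_history: List[int] = field(default_factory=list)  # past assignments

# A file whose preferred home node attribute has been set: when assigned,
# the current home node mirrors the preference and the history records it.
meta = FileHomeNodeMetadata(preferred_home_node=104)
meta.current_home_node = meta.preferred_home_node
meta.home_node_history.append(meta.current_home_node)
```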
- As stated,
applications 130 may include multiple threads 235. Each thread 235 provides a unit of program execution that may be executed independently of the others. Each thread may include a code segment 245 and thread metadata 240 used by the operating system 135 to manage the thread 235. In one embodiment, the thread metadata 240 may include a home node assigned to the thread 235, node dispatch statistics for the thread 235, and a list of files accessed by the thread 235. The thread dispatch statistics may indicate to which processing nodes the thread 235 has been dispatched while application 130 executes on a computer system.
- Also as shown, operating system 135 includes a thread dispatcher 210 and metadata 215. Of course, one of ordinary skill in the art will recognize that operating system 135 is expected to include a variety of additional components used to manage the execution of applications 130 on a given computer system. In this example, metadata 215 is used to indicate a default home node to assign a given thread 235. In one embodiment, the default home node may be static, e.g., if a thread lacks an assigned home node, always assign the thread to a particular processing node, but may also be dynamic, e.g., if a thread lacks an assigned home node, assign the thread to a particular processing node using a round-robin scheduling protocol.
- In one embodiment, the thread dispatcher 210 may provide a component of the
operating system 135 configured to select a thread 235 ready for execution and to dispatch the selected thread to a processing node where the thread may then be scheduled for execution. For example, in a NUMA based system, the thread dispatcher 210 may dispatch a thread to execute on a processing node to which the thread has been assigned a nodal affinity, i.e., a thread home node. -
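The static and round-robin default home-node policies mentioned above might be sketched as follows; the class and method names are illustrative assumptions:

```python
import itertools

class DefaultHomeNodePolicy:
    """Sketch of the static vs. dynamic (round-robin) default home-node
    policies described above; names are illustrative."""
    def __init__(self, node_ids, static_node=None):
        self._static_node = static_node            # static policy, if set
        self._cycle = itertools.cycle(node_ids)    # round-robin otherwise

    def assign(self):
        if self._static_node is not None:
            return self._static_node
        return next(self._cycle)

round_robin = DefaultHomeNodePolicy([102, 104, 106, 108])
assigned = [round_robin.assign() for _ in range(5)]  # wraps after four nodes
static = DefaultHomeNodePolicy([102, 104], static_node=106)
```

The round-robin variant spreads otherwise-unassigned threads evenly across nodes, while the static variant concentrates them on one node; which is preferable depends on the workload, as the text notes.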
FIG. 3 illustrates a method 300 for determining a current home node for a file system object on a computer system configured using a NUMA architecture, according to one embodiment of the invention. As shown, the method 300 begins at step 305, where the storage manager (or other operating system component) determines whether a file system object has a current home node. For example, when a thread requests to open a file and load a portion of the file in memory, the operating system may determine that the file already has a current home node. If so, then the method 300 proceeds to steps discussed below in conjunction with FIG. 4.
- Otherwise, where the requested file does not have a current home node, the
method 300 proceeds to step 310. Atstep 310, the operating system may determine whether the requested file has a preferred home node. In one embodiment, a file may include a preferred home as an attribute that may be set by users, applications, and/or the operating system. For example, a programmer may develop a computer program with a thread configured to access data in a particular file. In such a case, the programmer may set a preferred home node attribute of such a file to a home node assigned to that thread by the operating system. That is, the program may be configured to set the nodal affinity of the file to mirror the nodal affinity of the thread itself. This result is reflected instep 330 where the operating system sets the current home node of the file to the preferred home node of the file. In such a case, when the file is then accessed (e.g., by a requesting thread), portions of the file may be read into the memory associated with the preferred home node. - If the file does not have a preferred home node, then at
step 315, the operating system may determine whether the file is accessed globally, i.e., whether the file is often accessed by other threads or applications running on the computer system, as is the case for file system root directories. If so, then at step 325, the operating system may set the current home node of the file to the default home node. The default home node may be set as a parameter of the file system or operating system and provide a node accessible by each node of the computer system. In one embodiment, the default home node may be determined by evaluating a home node assigned to threads accessing the file, i.e., the nodal affinity of the file may be determined from the nodal affinity of threads accessing the file. This approach allows the file to “gravitate” toward the node from which it is most frequently accessed. Otherwise, at step 320, the current home node of the file may be set to the home node of the thread accessing the file. This may occur, for example, when a thread requests exclusive access to a given file, when this is the first access to a given file, or when the given file is not usually accessed globally, as is the case for a user's personal files and directories.
- After setting the current home node of the requested file (
steps 320, 325, or 330), at step 340, the storage manager may determine whether the current home node is available for use by the file system object. For example, because the memory on any given node is finite, depending on the size of the file system object, there may simply not be enough memory available to store the object on the assigned current home node. In such a case, the file's current home node may be set to the next best available compute node (step 345).
- At
step 335, the operating system may update a history of home nodes assigned to the file system object. Once a current home node is assigned, when the file is then accessed (e.g., by a requesting thread), portions of the file may be read into the memory associated with the current home node. -
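The FIG. 3 flow described above (steps 310 through 345) can be summarized in a short sketch; the function signature and dictionary keys are assumptions, and the "next best" fallback of step 345 is simplified to picking any node with memory available:

```python
def assign_current_home_node(object_meta, thread_home_node, default_home_node,
                             accessed_globally, nodes_with_memory):
    """Sketch of the FIG. 3 flow described above; names are illustrative.
    nodes_with_memory models step 340's check that the chosen node has
    memory available for the file system object."""
    if object_meta.get("preferred_home_node") is not None:  # steps 310, 330
        chosen = object_meta["preferred_home_node"]
    elif accessed_globally:                                 # steps 315, 325
        chosen = default_home_node
    else:                                                   # step 320
        chosen = thread_home_node
    if chosen not in nodes_with_memory:                     # steps 340, 345
        chosen = min(nodes_with_memory)  # stand-in for the "next best" node
    object_meta["current_home_node"] = chosen
    object_meta.setdefault("home_node_history", []).append(chosen)  # step 335
    return chosen

# A non-global file with no preferred home node: it is assigned the
# home node of the requesting thread.
object_meta = {"preferred_home_node": None}
node = assign_current_home_node(object_meta, thread_home_node=104,
                                default_home_node=102,
                                accessed_globally=False,
                                nodes_with_memory={102, 104, 106})
```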
FIG. 4 illustrates a method 400 for evaluating and updating the node assigned as a home node for a file system object on a computer system configured using a NUMA architecture, according to one embodiment of the invention. Illustratively, method 400 represents a sequence of steps that may be performed following step 305 of FIG. 3 when the operating system determines that a file has a current home node when a thread requests access to that file.
- As shown, the
method 400 begins at step 405, where the operating system determines whether a thread requesting access to a file has requested exclusive or cached access to the file. If so, at step 410, the operating system may set a selected home node for the file to the home node associated with the requesting thread. The selected home node is then evaluated relative to the current home node of the file to determine whether the current home node of the file should be updated to the selected one. Specifically, in one embodiment, the operating system may determine whether the selected home node and the current home node of the file are the same (step 445). If so, at step 460, the storage manager may determine whether the current home node is available for use by the file system object. For example, because the memory on any given node is finite, depending on the size of the file system object, there may simply not be enough memory available to store the object on the assigned current home node. In such a case, the file's current home node may be set to the next best available compute node (step 465).
- Otherwise, if the operating system determines that the selected home node and the current home node of the file are not the same, then at
step 450, the current home node of the file is set to the selected home node. That is, the current home node of the file is set to the home node of the thread requesting exclusive access to the file. Once set, the method proceeds to step 460 to determine whether the newly assigned current home node is available to store the file system object, as discussed above. At step 455, the operating system may update the file history of home nodes to reflect the compute node assigned as the current home node of the file at step 465.
- Returning to step 405, if the operating system determines that the thread is not requesting exclusive access to the file, then at
step 415, the operating system may evaluate nodes of the computing system which may be set as the current home node of the requested file. That is, the operating system may score the available nodes of the computing system relative to one another and set the current home node of the file to the one having the highest score. For example, in one embodiment, the operating system may calculate a relative weight for the current home node of the file, for other nodes which had previously been set as the current home node of the file, and for the home node of each thread requesting access to the file. This approach allows the file to “gravitate” toward the node from which it is most frequently accessed. The relative weights are calculated by assigning weight factors A, B, and C to the thread home nodes, the history of home nodes, and the current home node, respectively. The weight factors are applied to each node to determine the total weight for that node. For example, if node 1 is the home node of two threads that are accessing the file, node 1 is found in the history of home nodes three times, and node 1 is the file's current home node, then the relative weight of node 1 is 2*A+3*B+C. The values of A, B and C could be static, or could be dynamic and configurable to adjust for different operating system and file system configurations.
- At
step 420, if the file is accessed globally, i.e., if the file is frequently accessed by threads on multiple nodes, then at step 425, the relative weight of the default home node of the file may be increased. Doing so allows the file to be assigned a current home node different from the node requesting access in cases where overall system performance may be improved. At step 430, if the requested file has a preferred home node, then at step 435 the relative weight of the preferred home node of the file may be increased. Of course, one of skill in the art will recognize that the relative weights assigned at step 415 may be adjusted to account for a variety of circumstances in addition to the ones reflected in steps 425 and 430.
- At
step 440, the operating system may select the home node having the highest relative weight. The node selected at step 440 is then evaluated relative to the current home node of the file to determine whether the current home node of the file should be updated to the selected one. Specifically, at step 445, the operating system may determine whether the home node selected at step 440 is the same as the current home node of the file. If so, the method 400 ends, leaving the current home node of the file unchanged. Otherwise, at step 450, the current home node of the file is set to the home node selected at step 440. At step 455, the operating system may update the file's history of home nodes to reflect the home node assigned as the current home node at step 450. - In one embodiment, during thread execution, a thread control block may be configured to note when the thread is about to request access to a file system object. In such a case, if the thread is interrupted prior to completing the file access (e.g., if the thread ends up waiting on a mutex lock or an I/O request), then the node to which the thread is subsequently dispatched may be based on both the home node of the thread and the home node of the file system object. For example, assume thread X is executing on its currently assigned home node (node 1) and is about to read data from file system object Y, and that thread X is interrupted (or preempted) prior to accessing object Y. Assume further that the home node of object Y is
node 2. In such a case, thread X may be re-dispatched on node 2 instead of node 1 during this opportune moment. After thread X has accessed data from file Y (while executing on node 2), thread X could be re-dispatched back to its home node. Additionally, if a thread is frequently dispatched to nodes different from its assigned home node for reasons such as those in the above example, then the home node assigned to that thread may be adjusted accordingly.
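The scoring and update flow of method 400 can be condensed into a short sketch. The weight factors, boost amounts, and dictionary-based file representation below are illustrative assumptions for clarity, not details taken from the disclosure:

```python
# Illustrative sketch of method 400's home-node scoring (steps 415-455).
# Weight factors A, B, C, the boost amounts, and the dict-based file
# representation are assumptions made for clarity.
from collections import Counter

def score_nodes(file_obj, accessing_thread_home_nodes, globally_accessed,
                A=3, B=2, C=1, boost=2):
    """Step 415: weight each candidate node, then apply the
    adjustments of steps 420-435."""
    weights = Counter()
    for node in accessing_thread_home_nodes:         # A per accessing thread
        weights[node] += A
    for node in file_obj["home_node_history"]:       # B per history entry
        weights[node] += B
    weights[file_obj["current_home_node"]] += C      # C for the current home
    if globally_accessed:                            # steps 420-425
        weights[file_obj["default_home_node"]] += boost
    preferred = file_obj.get("preferred_home_node")  # steps 430-435
    if preferred is not None:
        weights[preferred] += boost
    return weights

def update_home_node(file_obj, weights):
    """Steps 440-455: adopt the highest-weighted node if it differs from
    the current home node, and record the change in the history."""
    selected = max(weights, key=weights.get)         # step 440
    if selected == file_obj["current_home_node"]:    # step 445: unchanged
        return False
    file_obj["current_home_node"] = selected         # step 450
    file_obj["home_node_history"].append(selected)   # step 455
    return True

# Example mirroring the text: node 1 is home to two accessing threads,
# appears three times in the history, and is the current home node,
# so its relative weight is 2*A + 3*B + C.
f = {"current_home_node": 1, "home_node_history": [1, 1, 1],
     "default_home_node": 1}
w = score_nodes(f, [1, 1], globally_accessed=False)
assert w[1] == 2*3 + 3*2 + 1
```

Keeping A, B, and C as parameters mirrors the observation above that the factors could be static or dynamically configured for a particular operating system and file system configuration.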
- FIG. 5 illustrates a method 500 for adjusting thread execution on a NUMA-based computer system, according to one embodiment of the invention. As shown, the method 500 begins at step 502, where the operating system determines whether a thread is accessing a file. For example, the thread may have been accessing a file on one of the compute nodes, at some point been interrupted, and is now being dispatched for further execution. In such a case, method 500 specifies that the operating system should preferentially dispatch the thread to the home node of that thread. Thus, if the thread is not accessing a file system object, and if the thread has a nodal affinity for a given processing node, the operating system first determines whether that node is available to execute the thread (step 505). If so, at step 510, a thread dispatcher (or other operating system component) may dispatch the thread to the home node assigned to the thread. Once dispatched, the thread may be scheduled for execution on the CPU of the processing node to which the thread was dispatched. At step 515, the operating system may update a set of thread dispatch statistics to reflect that the thread was dispatched to the home node of that thread. At step 520, the operating system may evaluate the thread dispatch statistics to determine whether performance may be improved by changing the home node of the thread and, if so, at step 525, the home node assigned to the thread may be updated. - Otherwise, if the thread dispatcher determines that the home node for the thread is not available (step 505), then the thread dispatcher may dispatch the thread to a default home node (step 540), update the thread dispatch statistics (step 515), and evaluate whether to change the home node of the thread just dispatched (
steps 520 and 525). - Returning to step 502, if the thread is currently accessing a file, then at
step 530, the thread dispatcher may retrieve the current home node of the file being accessed by the thread, and at step 535, the thread dispatcher may determine whether the thread may be dispatched to the current home node of the file. That is, if the file system object has a particular nodal affinity for a given node, the operating system may determine whether the thread may be dispatched to that node. Doing so may improve system performance, as it is frequently cheaper to dispatch a thread to a given node than to load a file (i.e., its data) into the memory of another node. Thus, if the object is being accessed from a given node, it may be beneficial to dispatch the thread to that node (i.e., to send the thread to the file system object) rather than requiring the thread to load the object on some other node. Nevertheless, if the home node of the file is not currently available for the thread to be dispatched to, then another node may be selected (steps 505, 510). - However, if the current home node of the file is available, then the thread may be dispatched to that node (step 545). Once dispatched (i.e., after one of
steps 540 and 545), the thread may be scheduled for execution on the CPU of the processing node to which the thread was dispatched. At step 515, the operating system may update a set of thread dispatch statistics to reflect which node the thread was dispatched to, and, as described above, the operating system may evaluate the thread dispatch statistics to determine whether performance may be improved by changing the home node of the thread (steps 520-525). - Advantageously, by intelligently assigning a home node to file system objects and using the assigned node during thread execution, embodiments of the invention improve locality of reference for a thread, and thus performance for applications and operations that perform a significant number of file system object accesses.
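Method 500's dispatch decision can likewise be sketched in a few lines. The field names, dict-based thread and file objects, and set-based availability check are assumptions made for illustration only:

```python
# A condensed sketch of method 500's dispatch choice. Thread and file
# objects are plain dicts and node availability is a set -- assumptions
# for illustration only.
def dispatch(thread, available_nodes, default_node, stats):
    """Prefer the home node of the file being accessed (steps 530-545);
    otherwise fall back to the thread's home node (steps 505-510) or a
    default node (step 540). Always update dispatch statistics (step 515)."""
    node = None
    file_obj = thread.get("accessing_file")            # step 502
    if file_obj is not None:
        file_home = file_obj["current_home_node"]      # step 530
        if file_home in available_nodes:               # step 535
            node = file_home                           # step 545
    if node is None:
        if thread["home_node"] in available_nodes:     # step 505
            node = thread["home_node"]                 # step 510
        else:
            node = default_node                        # step 540
    stats[node] = stats.get(node, 0) + 1               # step 515
    return node

# Thread X from the example above: homed on node 1 but about to access a
# file whose home node is node 2 -- it is dispatched to node 2.
stats = {}
assert dispatch({"home_node": 1,
                 "accessing_file": {"current_home_node": 2}},
                {1, 2}, 0, stats) == 2
```

The sketch reflects the rationale above: sending the thread to the data is usually cheaper than loading the data onto the thread's node, so the file's home node is tried first and the thread's home node serves as the fallback.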
- While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (23)
1. A method of improving locality of reference for thread access to a file system object on a computing system, comprising:
identifying the file system object, wherein the file system object is accessible by threads executing on a plurality of processing nodes of the computing system;
receiving, from a first thread executing on a first one of the plurality of processing nodes, a request to access the file system object;
determining whether a current home node attribute of the file system object is set to identify one of the plurality of processing nodes;
upon determining the current home node attribute is not set for the file system object, selecting a second one of the plurality of processing nodes to set as the current home node attribute of the file system object; and
setting the current home node attribute of the file system object to identify the second processing node.
2. The method of claim 1, further comprising loading the file system object into a memory associated with the second processing node, wherein the thread accesses the memory of the second processing node to access data stored by the file system object.
3. The method of claim 1, further comprising updating a history of home nodes assigned to the file system object.
4. The method of claim 1, wherein the file system object includes a preferred home node attribute, and wherein the current home node attribute is set to a processing node identified by the preferred home node attribute.
5. The method of claim 1, wherein the request to access the file system object is a request for exclusive access to the file system object, and wherein the selected home node is the first home node.
6. The method of claim 1, further comprising:
upon determining the current home node attribute is set for the file system object, determining that the current home node attribute identifies the first processing node; and
loading the file system object into a memory associated with the first processing node.
7. The method of claim 1, wherein the first processing node and the second processing node are the same node.
8. The method of claim 1, wherein the plurality of processing nodes is configured according to a Non-Uniform Memory Access (NUMA) architecture.
9. A computer-readable storage medium containing a program which, when executed, performs an operation for improving locality of reference for thread access to a file system object on a computing system, the operation comprising:
identifying the file system object, wherein the file system object is accessible by threads executing on a plurality of processing nodes of the computing system;
receiving, from a first thread executing on a first one of the plurality of processing nodes, a request to access the file system object;
determining whether a current home node attribute of the file system object is set to identify one of the plurality of processing nodes;
upon determining the current home node attribute is not set for the file system object, selecting a second one of the plurality of processing nodes to set as the current home node attribute of the file system object; and
setting the current home node attribute of the file system object to identify the second processing node.
10. The computer-readable storage medium of claim 9, wherein the operation further comprises loading the file system object into a memory associated with the second processing node, wherein the thread accesses the memory of the second processing node to access data stored by the file system object.
11. The computer-readable storage medium of claim 9, wherein the operation further comprises updating a history of home nodes assigned to the file system object.
12. The computer-readable storage medium of claim 9, wherein the file system object includes a preferred home node attribute, and wherein the current home node attribute is set to a processing node identified by the preferred home node attribute.
13. The computer-readable storage medium of claim 9, wherein the request to access the file system object is a request for exclusive access to the file system object, and wherein the selected home node is the first home node.
14. The computer-readable storage medium of claim 9, wherein the operation further comprises:
upon determining the current home node attribute is set for the file system object, determining that the current home node attribute identifies the first processing node; and
loading the file system object into a memory associated with the first processing node.
15. The computer-readable storage medium of claim 9, wherein the first processing node and the second processing node are the same node.
16. The computer-readable storage medium of claim 9, wherein the plurality of processing nodes is configured according to a Non-Uniform Memory Access (NUMA) architecture.
17. A system, comprising:
a plurality of processing nodes, each having a respective processor and a memory, wherein the plurality of processing nodes are communicatively coupled to a common bus; and
an operating system configured to manage a plurality of threads executing on the plurality of processing nodes, wherein the operating system is configured to perform an operation for improving locality of reference for thread access to a file system object, the operation comprising:
identifying a file system object, wherein the file system object is accessible by threads executing on the plurality of processing nodes of the computing system,
receiving, from a first thread executing on a first one of the plurality of processing nodes, a request to access the file system object,
determining whether a current home node attribute of the file system object is set to identify one of the plurality of processing nodes,
upon determining the current home node attribute is not set for the file system object, selecting a second one of the plurality of processing nodes to set as the current home node attribute of the file system object, and
setting the current home node attribute of the file system object to identify the second processing node.
18. The system of claim 17, wherein the operation further comprises loading the file system object into a memory associated with the second processing node, wherein the thread accesses the memory of the second processing node to access data stored by the file system object.
19. The system of claim 17, wherein the operation further comprises updating a history of home nodes assigned to the file system object.
20. The system of claim 17, wherein the file system object includes a preferred home node attribute, and wherein the current home node attribute is set to a processing node identified by the preferred home node attribute.
21. The system of claim 17, wherein the request to access the file system object is a request for exclusive access to the file system object, and wherein the selected home node is the first home node.
22. The system of claim 17, wherein the operation further comprises:
upon determining the current home node attribute is set for the file system object, determining that the current home node attribute identifies the first processing node; and
loading the file system object into a memory associated with the first processing node.
23. The system of claim 17, wherein the first processing node and the second processing node are the same node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/142,391 US20090320036A1 (en) | 2008-06-19 | 2008-06-19 | File System Object Node Management |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090320036A1 true US20090320036A1 (en) | 2009-12-24 |
Family
ID=41432650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/142,391 Abandoned US20090320036A1 (en) | 2008-06-19 | 2008-06-19 | File System Object Node Management |
Country Status (1)
Country | Link |
---|---|
US (1) | US20090320036A1 (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6049853A (en) * | 1997-08-29 | 2000-04-11 | Sequent Computer Systems, Inc. | Data replication across nodes of a multiprocessor computer system |
US20030028819A1 (en) * | 2001-05-07 | 2003-02-06 | International Business Machines Corporation | Method and apparatus for a global cache directory in a storage cluster |
US20030079087A1 (en) * | 2001-10-19 | 2003-04-24 | Nec Corporation | Cache memory control unit and method |
US20030217115A1 (en) * | 2002-05-15 | 2003-11-20 | Broadcom Corporation | Load-linked/store conditional mechanism in a CC-NUMA system |
US20070005614A1 (en) * | 2005-07-01 | 2007-01-04 | Dan Dodge | File system having deferred verification of data integrity |
US20070073974A1 (en) * | 2005-09-29 | 2007-03-29 | International Business Machines Corporation | Eviction algorithm for inclusive lower level cache based upon state of higher level cache |
US20070078972A1 (en) * | 2005-10-04 | 2007-04-05 | Fujitsu Limited | Computer-readable recording medium with system managing program recorded therein, system managing method and system managing apparatus |
US20090030868A1 (en) * | 2007-07-24 | 2009-01-29 | Dell Products L.P. | Method And System For Optimal File System Performance |
US7558930B2 (en) * | 2005-07-25 | 2009-07-07 | Hitachi, Ltd. | Write protection in a storage system allowing both file-level access and volume-level access |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9223712B2 (en) | 2011-08-04 | 2015-12-29 | Huawei Technologies Co., Ltd. | Data cache method, device, and system in a multi-node system |
US20150347509A1 (en) * | 2014-05-27 | 2015-12-03 | Ibrahim Ahmed | Optimizing performance in cep systems via cpu affinity |
US9921881B2 (en) * | 2014-05-27 | 2018-03-20 | Sybase, Inc. | Optimizing performance in CEP systems via CPU affinity |
US20180196701A1 (en) * | 2014-05-27 | 2018-07-12 | Sybase, Inc. | Optimizing Performance in CEP Systems via CPU Affinity |
US10503556B2 (en) * | 2014-05-27 | 2019-12-10 | Sybase, Inc. | Optimizing performance in CEP systems via CPU affinity |
EP2975520A1 (en) * | 2014-07-18 | 2016-01-20 | Fujitsu Limited | Information processing device, control method of information processing device and control program of information processing device |
US20160019150A1 (en) * | 2014-07-18 | 2016-01-21 | Fujitsu Limited | Information processing device, control method of information processing device and control program of information processing device |
JP2016024578A (en) * | 2014-07-18 | 2016-02-08 | 富士通株式会社 | Information processing apparatus, control method thereof, and control program thereof |
US9697123B2 (en) * | 2014-07-18 | 2017-07-04 | Fujitsu Limited | Information processing device, control method of information processing device and control program of information processing device |
US11210263B1 (en) * | 2017-09-27 | 2021-12-28 | EMC IP Holding Company LLC | Using persistent memory technology as a host-side storage tier for clustered/distributed file systems, managed by cluster file system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RIES, JOAN MARIE;THEIS, RICHARD MICHAEL;REEL/FRAME:021122/0047;SIGNING DATES FROM 20080611 TO 20080612 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |