CN102844734A - Optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity - Google Patents


Publication number
CN102844734A
Authority
CN
China
Prior art keywords
data
file system
node
storage
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011800186359A
Other languages
Chinese (zh)
Other versions
CN102844734B (en
Inventor
P·萨卡尔
P·邦德伊
H·普查
M·A·沙赫
R·特瓦利
K·古普塔
R·阿南塔纳拉亚南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of CN102844734A publication Critical patent/CN102844734A/en
Application granted granted Critical
Publication of CN102844734B publication Critical patent/CN102844734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

Embodiments of the invention relate to optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity. An exemplary embodiment includes reserving a predetermined number of storage allocation regions for each node in a cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; using a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the cluster; and using a single data block of a shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the cluster.

Description

Optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity
Technical field
Embodiments of the invention relate to the field of data storage, and in particular to optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity.
Background
A file system is a method for storing and organizing files and data. A file system uses a storage subsystem to hold files and data. A file system is a management structure that imposes a logical structure on a storage subsystem so that client computers can create, store, and access files of data on the storage subsystem. A distributed file system is a file system that supports sharing of files and storage resources among multiple clients over a network. A cluster file system is a type of distributed file system that allows multiple compute nodes in a cluster to simultaneously access the same data stored on a shared storage subsystem.
A compute cluster is a system with multiple nodes that interact with each other to provide a client system with data, applications, and other system resources as a single entity.
A compute cluster provides scalability and reliability by allowing nodes and shared storage to be added to the cluster. A file system is used to manage the storage of data within the compute cluster. The file system in the compute cluster allocates storage by assigning particular regions of storage to the data to be stored. Compute nodes in the cluster view the file system as a local resource with direct access to the cluster file system's shared storage subsystem.
Cloud computing is a computing model that provides remote virtualized computing resources to clients as a service. Cloud computing provides software and hardware resources to clients by hosting the resources and delivering them as services remotely and on demand over a network. End users can thereby use computing resources on demand without investing in infrastructure and management. The underlying infrastructure for cloud computing typically comprises large distributed clusters of servers that work in concert.
Summary of the invention
Accordingly, in a first aspect, the invention provides a method of optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, the method comprising: reserving a predetermined number of storage allocation regions for each node in a cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; using a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the cluster; and using a single data block of a shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the cluster.
The method may further comprise allocating data for a compute operation to at least one of the reserved storage allocation regions. The method may further comprise, when the total number of reserved storage allocation regions for a node in the cluster is less than the predetermined number, reserving at least one additional storage allocation region until the total number of reserved storage allocation regions for the node equals the predetermined number. The method may further comprise sending a compute operation to a node in the compute cluster, the data for the compute operation being allocated to the locally attached storage of that node. The method may further comprise replicating each contiguous set of data blocks allocated to a node's locally attached storage to the locally attached storage of a second node in the cluster. The method may further comprise allocating data for a data access operation to the shared storage subsystem. The method may further comprise tracking the location of each data block allocated to the shared storage subsystem and to the locally attached storage in the file system. Preferably, the data access operation is selected from the group consisting of: a bookkeeping operation, a data transfer operation, a cache management operation, and a pre-fetching operation. The method may further comprise, if a compute operation on a node fails, restarting the compute operation on the second node to which the data associated with the compute operation was replicated. Preferably, the cluster hosts computing services for remote clients.
In a second aspect, there is provided a system for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, the system comprising: a compute cluster comprising a plurality of nodes, wherein each of the plurality of nodes comprises locally attached storage; a shared storage subsystem coupled to each of the plurality of nodes; and a file system manager coupled to the shared storage subsystem and to each of the plurality of nodes, wherein the file system manager: reserves a predetermined number of storage allocation regions for each of the plurality of nodes in the compute cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; uses a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the compute cluster; and uses a single data block of the shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the compute cluster.
Preferably, the file system manager sends a compute operation to a node in the compute cluster, the data for the compute operation being allocated to the locally attached storage of that node.
Preferably, when the total number of reserved storage allocation regions for the node is less than the predetermined number, the file system manager reserves at least one additional storage allocation region until the total number of reserved storage allocation regions for the node equals the predetermined number. Preferably, the file system manager: replicates each contiguous set of data blocks allocated to a node's locally attached storage to the locally attached storage of a second node in the compute cluster; and, if a compute operation on the node fails, restarts the compute operation on the second node to which the data associated with the compute operation was replicated.
In a third aspect, there is provided a computer program comprising computer program code stored on a computer-readable medium to, when loaded into a computer system and executed thereon, cause the computer system to perform all the steps of a method according to the first aspect.
The computer program may take the form of a computer program product for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, the computer program product comprising a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising: computer-readable program code configured to reserve a predetermined number of storage allocation regions for each node in a cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; computer-readable program code configured to use a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the cluster; and computer-readable program code configured to use a single data block of a shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the cluster.
Preferably, the computer-readable program code further comprises computer-readable program code configured to send a compute operation to a node in the compute cluster, the data for the compute operation being allocated to the locally attached storage of that node. Preferably, the computer-readable program code further comprises computer-readable program code configured to reserve, when the total number of reserved storage allocation regions for the node is less than the predetermined number, at least one additional storage allocation region until the total number of reserved storage allocation regions for the node equals the predetermined number. Preferably, the computer-readable program code further comprises computer-readable program code configured to track the location of each data block allocated to the shared storage subsystem and to the locally attached storage in the file system. Preferably, the computer-readable program code further comprises computer-readable program code configured to replicate each contiguous set of data blocks allocated to a node's locally attached storage to the locally attached storage of a second node in the cluster. Preferably, the computer-readable program code further comprises computer-readable program code configured to restart, if a compute operation on the node fails, the compute operation on the second node to which the data associated with the compute operation was replicated.
Embodiments of the invention relate to optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity. An aspect of the invention includes a method for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity. The method may include reserving a predetermined number of storage allocation regions for each node in a cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; using a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the cluster; and using a single data block of a shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the cluster.
Another embodiment of the invention includes a system for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity. The system may include a compute cluster comprising: a plurality of nodes, wherein each of the plurality of nodes comprises locally attached storage; a shared storage subsystem coupled to each of the plurality of nodes; and a file system manager coupled to the shared storage subsystem and to each of the plurality of nodes, wherein the file system manager: reserves a predetermined number of storage allocation regions for each of the plurality of nodes in the compute cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; uses a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the compute cluster; and uses a single data block of the shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the compute cluster.
Another embodiment of the invention includes a computer program product for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, comprising a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising: computer-readable program code configured to reserve a predetermined number of storage allocation regions for each node in a cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage; computer-readable program code configured to use a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the cluster; and computer-readable program code configured to use a single data block of a shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the cluster.
Description of drawings
Preferred embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Fig. 1 illustrates a compute cluster with a file system optimized for different types of applications using dynamic block size granularity, according to one embodiment;
Fig. 2 illustrates a flowchart of a method for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, according to one embodiment;
Fig. 3 illustrates a block allocation scheme for striping data to be stored in a file system for a compute operation in a compute cluster, according to one embodiment;
Fig. 4 illustrates a block allocation scheme for striping data to be stored in a file system for a data access operation in a compute cluster, according to one embodiment; and
Fig. 5 illustrates a block diagram of a system in which a process for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity may be implemented, according to one embodiment.
Detailed description
The following description is made only to illustrate the general principles of the invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations. Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation, including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, and the like.
The description may disclose several preferred embodiments for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, as well as operation and/or component parts thereof. While the following description is given in terms of storage allocation processes and storage devices, to place the invention in the context of an exemplary embodiment, it should be kept in mind that the teachings herein, including the claims, may have broad application to other types of systems, devices, and applications, including systems, devices, and applications in cloud computing environments.
Embodiments of the invention relate to optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity. Cluster file systems that use shared storage do not support shipping computation to data, a feature needed to support data-intensive applications (e.g., data analytics applications) that process large data sets. In addition, the default block sizes of cluster file systems that use shared storage are small, which leads to a high task overhead for data-intensive applications that schedule one task per data block. The underlying storage architecture for running data-intensive applications is based on Internet-scale file systems that do not provide a standard Portable Operating System Interface for Unix (POSIX) interface. Internet-scale file systems are specialized file systems that suit data-intensive applications but do not support the performance requirements of traditional applications.
In one embodiment, modifications are made to a file system's block allocation scheme to support both traditional applications and data-intensive applications in a single compute cluster. For example, the file system's data allocation is modified so that a POSIX storage stack can support cloud analytics built on a traditional POSIX-based cluster file system. In one embodiment, the file system of a compute cluster can select a large block granularity for data associated with compute operations of data-intensive applications and a small block granularity for data associated with data access operations of traditional applications. In an exemplary embodiment, the file system's round-robin block allocation scheme is modified to use a contiguous set of blocks (a large block) as the allocation granularity for striping data for a compute operation. In another exemplary embodiment, the file system uses the default block size granularity, a small block size, internally for all data access operations used by traditional applications.
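The selection between the two granularities can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `OpType` and `allocation_granularity` are hypothetical, and the sizes (a 1 MB single block, 64 blocks per large block) are assumptions borrowed from the exemplary embodiments given later in this description.

```python
from enum import Enum

MIB = 1024 * 1024

# Assumed values from the exemplary embodiments; real deployments may differ.
SINGLE_BLOCK_SIZE = 1 * MIB   # default (small) block size
BLOCKS_PER_REGION = 64        # contiguous blocks forming one large block


class OpType(Enum):
    """Hypothetical classification of file system operations."""
    COMPUTE = "compute"          # issued by data-intensive applications
    DATA_ACCESS = "data_access"  # issued by traditional applications


def allocation_granularity(op: OpType) -> int:
    """Return the block allocation granularity, in bytes, for an operation."""
    if op is OpType.COMPUTE:
        # Large block: a contiguous set of single blocks on one local disk.
        return SINGLE_BLOCK_SIZE * BLOCKS_PER_REGION
    # Small block: a single data block of the shared storage subsystem.
    return SINGLE_BLOCK_SIZE
```

Under these assumptions a compute operation is striped in 64 MB units and a data access operation in 1 MB units.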
Fig. 1 illustrates a compute cluster with a file system optimized for different types of applications using dynamic block size granularity, according to one embodiment. The compute cluster 102 includes a plurality of compute nodes 104a, 104b...104n, also known as application nodes. In an exemplary embodiment, the compute cluster 102 includes a hardware architecture that can scale to thousands of compute nodes 104a, 104b...104n.
Each compute node 104a, 104b...104n is coupled to locally attached storage 106a, 106b...106n. For example, the locally attached storage 106a, 106b...106n may be physically internal to a compute node 104a, 104b...104n and/or physically external and directly attached using a disk array device. In one embodiment, the locally attached storage 106a, 106b...106n includes a storage device directly attached to the compute nodes 104a, 104b...104n through an interface standard. For example, interface standards include, but are not limited to, Fibre Channel (FC), Small Computer System Interface (SCSI), or Integrated Drive Electronics (IDE). In an exemplary embodiment, each compute node 104a, 104b...104n includes four 750 GB Serial Advanced Technology Attachment (SATA) drives of locally attached storage 106a, 106b...106n.
The compute cluster 102 includes a file system manager 108 configured to manage a file system of the compute cluster 102. For example, the file system of the compute cluster 102 may include, but is not limited to, the IBM General Parallel File System™ (GPFS™). In an exemplary embodiment, the file system manager 108 is embodied in software and can be run from any standalone node in the compute cluster 102.
The compute cluster 102 further includes a shared storage subsystem 114. For example, the shared storage subsystem 114 may include, but is not limited to, a storage area network (SAN) device. The shared storage subsystem 114 is coupled to a storage switch 112. The compute nodes 104a, 104b...104n are coupled to the storage switch 112 for access to the shared storage subsystem 114. The file system manager 108 is coupled to the storage switch 112 for managing the file system of the compute cluster 102 using the shared storage subsystem 114. The shared storage subsystem 114 is configured to provide the compute nodes 104a, 104b...104n with concurrent access to the same data. The shared storage subsystem 114 allows write bandwidth to be shared across the compute nodes 104a, 104b...104n. In one embodiment, the shared storage subsystem 114 is designed to use underlying data protection techniques to circumvent hardware failures. For example, the shared storage subsystem 114 may use Redundant Array of Inexpensive Disks (RAID) techniques to provide data protection.
The compute cluster 102 further includes a switch network 110. The switch network 110 is configured to provide interconnection for components within the file system. In one embodiment, the switch network 110 is configured to provide interconnection for the compute nodes 104a, 104b...104n and the file system manager 108. In an exemplary embodiment, the switch network 110 is a Gigabit Ethernet switch for each node rack, with a 1 gigabit per second (Gbps) inter-switch link, run with Linux software. In another embodiment, the switch network 110 is further configured to provide access to a client node 118 over a network 116. For example, the network 116 includes, but is not limited to, a wide area network (WAN).
In one embodiment, the compute cluster 102 hosts data and computing services remotely for a client 118. For example, the compute cluster 102 enables cloud computing services for hosting data and computing services at a data site for remote clients. In an exemplary embodiment, the compute cluster 102 is configured to host the running of data analytics applications and to store the data associated with the data analytics applications remotely for a client 118 over the network 116. Accordingly, the compute cluster 102 enables parallelism and scalability in a cloud to run data-intensive applications with large data sets. For example, data-intensive applications include data analytics applications that decompose large compute tasks into a set of smaller parallelized computations.
Fig. 2 illustrates a flowchart of a method 200 for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity, according to one embodiment. At 202, the file system manager 108 reserves, for each compute node 104a, 104b...104n in the cluster 102, a predetermined number of storage allocation regions, each comprising a set of contiguous data blocks on a single storage disk of locally attached storage 106a, 106b...106n. In one embodiment, the file system manager 108 pre-fetches a pool of contiguous storage allocation regions ahead of time for each compute node 104a, 104b...104n in the compute cluster 102. For example, pre-fetching a pool of contiguous storage allocation regions ahead of time avoids network latency and its effect on application performance. Accordingly, each compute node 104a, 104b...104n has a pool of contiguous storage allocation regions ready and does not incur network latency in the path of an input/output (I/O) request.
In an exemplary embodiment, the predetermined number of storage allocation regions reserved for each compute node 104a, 104b...104n in the compute cluster 102 is ten. For example, the file system pre-fetches, for each compute node 104a, 104b...104n in the compute cluster 102, a pool of ten storage allocation regions, each comprising a contiguous set of data blocks. In other embodiments, the predetermined number of storage allocation regions reserved for each compute node 104a, 104b...104n in the compute cluster 102 may range from, but is not limited to, 5 to 200 storage allocation regions. The predetermined number of storage allocation regions reserved for each compute node 104a, 104b...104n in the cluster 102 may be based on the level of contiguity employed for data blocks, the types of applications supported in the compute cluster 102, the file system employed, and the performance requirements and range of the applications in the compute cluster 102.
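The reservation step can be sketched as a per-node pool that is topped up whenever it falls below the predetermined number. This is an illustrative sketch only: `RegionPool` and its fields are hypothetical names, and the integer counter stands in for whatever mechanism the file system manager actually uses to find free contiguous regions on a local disk.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RegionPool:
    """Hypothetical per-node pool of reserved contiguous allocation regions."""
    target: int = 10                                 # predetermined number (exemplary value)
    reserved: deque = field(default_factory=deque)   # region ids ready for use
    next_region_id: int = 0                          # stand-in for a real disk allocator

    def replenish(self) -> None:
        """Reserve regions until the pool again holds `target` regions."""
        while len(self.reserved) < self.target:
            self.reserved.append(self.next_region_id)
            self.next_region_id += 1

    def allocate(self) -> int:
        """Hand out one reserved region for a compute operation, then refill."""
        self.replenish()
        region = self.reserved.popleft()
        self.replenish()
        return region
```

Because the pool is refilled outside the request path, an allocation in this sketch never waits on a reservation round trip, which is the latency benefit the text describes.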
204; File system manager 108 uses local attached storer 106a; 106b...106n single memory disc on continuous data block set; Be used as the piece partition size of file system, from the calculating computing node 104a 102 that troops, 104b...104n carries out the data of calculating operation so that striping will be stored in the file system.In one embodiment; File system manager 108 uses local attached storer 106a; 106b...106n single memory disc on continuous data block set; Big data block or bulk size granularity will be the computing node 104a that troops from calculating 102, the partition size of the data of the calculating operation storage of 104b...104n as distributing.For example, file system manager 108 use the bulk size granularity to distribute will be as the data of calculating operation storage, because block sizes can cause the high task expense of data-intensive application.Correspondingly, the data allocations of file system and calculating 102 the data layout information of trooping is modified to use the bulk size granularity, to support the requirement of data intensive applications.
Single block size granularity can change.Depend on performance requirement and compromise and employed file system, single default tile size can change.For example, single block size granularity can but be not limited only to, change between the 16MB at 8KB.In an exemplary embodiment, single block size granularity is 1MB.For example, the data block of 1MB fixed size prevents the segmentation in the file system, keeps optimal ordering to read and write performance, and allows other locality of node level.In other exemplary embodiments, single block size granularity is 256KB and 512KB.
In one embodiment, the storage allocation regions used for striping data to be stored in the file system for compute operations are contiguous to a predetermined size. For example, the storage allocation regions may be contiguous from, but not limited to, 8 MB to 256 MB. In an exemplary embodiment, 64 blocks of 1 MB each are grouped into a 64 MB storage allocation region used for striping data to be stored in the file system for compute operations. For example, the large block granularity may be a 64 MB chunk comprising 64 consecutive single 1 MB data blocks. In other embodiments, the block size granularity used for striping data to be stored in the file system for compute operations may vary with the file system employed and the application performance requirements.
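The relationship between the single block size and the storage allocation region size can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for illustration; the 1 MB block and 64 MB region sizes are taken from the exemplary embodiment.

```python
KIB = 1024
MIB = 1024 * KIB

def blocks_per_region(region_bytes: int, block_bytes: int) -> int:
    """Number of single data blocks grouped into one storage allocation region."""
    if block_bytes <= 0 or region_bytes % block_bytes != 0:
        raise ValueError("region size must be a positive multiple of block size")
    return region_bytes // block_bytes

def locate_block(block_index: int, per_region: int) -> tuple[int, int]:
    """Map a file-relative block index to (region index, offset within region)."""
    return divmod(block_index, per_region)

# Exemplary embodiment: 1 MB blocks grouped into 64 MB regions.
PER_REGION = blocks_per_region(64 * MIB, 1 * MIB)
```

Under these assumptions, the 131st block of a file (index 130) falls in the third region at offset 2, since 130 = 2 x 64 + 2.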
The file system manager 108 uses the locally attached storage 106a, 106b...106n of each compute node 104a, 104b...104n in the compute cluster 102, rather than the shared storage subsystem 114, to store data for compute operations. For example, the locally attached storage 106a, 106b...106n is used for storing data for compute operations because of its low cost and the bandwidth limitations of the shared storage subsystem 114. For example, the storage layer for data-intensive applications is built on commodity components, namely the locally attached storage 106a, 106b...106n, to minimize storage cost and to allow the compute cluster 102 to scale to thousands of compute nodes 104a, 104b...104n so that highly parallel applications can process large amounts of data. In addition, the storage layer for data-intensive applications is built using the locally attached storage 106a, 106b...106n to efficiently support the large files common in data-intensive applications by shipping compute operations to the data rather than shipping the data to the compute operations.
In an exemplary embodiment, the compute nodes 104a, 104b...104n run and support the data-intensive class of applications from which compute operations are initiated. In one embodiment, the data-intensive class of applications includes, but is not limited to, applications that decompose computations into sets of smaller parallelizable computations. A common characteristic of data-intensive applications is that they are parallel and that their data access bandwidth requirements dominate their other resource requirements. For example, data-intensive applications support computations that are decomposed into smaller parallel computations over a partitioned data set using map and reduce functions over key/value pairs, which can be parallelized and executed on the compute nodes 104a, 104b...104n in the compute cluster 102.
In an exemplary embodiment, the data-intensive applications include cloud computing based analytics applications. For example, cloud computing based analytics applications include, but are not limited to, scientific applications that process vast amounts of continually changing data, including satellite image pattern matching applications, applications for discovering biological functions from genomic sequences, applications for deriving astronomical data from telescope imagery, and brain pattern applications that use magnetic resonance imaging (MRI) data. In another embodiment, the data-intensive applications further include Internet-scale data processing applications such as web search applications and data indexing and mining applications.
At 206, the file system manager 108 allocates data for a compute operation to at least one of the reserved storage allocation regions. In one embodiment, the file system manager 108 implements striping of data across the file system, in which large files are divided into equal-size blocks and consecutive blocks are placed on different disks in round-robin manner. In an exemplary embodiment, the file system uses wide striping to stripe contiguous sets of data blocks across the locally attached storage 106a, 106b...106n in round-robin manner. For example, striping techniques include, but are not limited to, wide striping, narrow striping, and no striping.
At 208, when the total number of storage allocation regions reserved for a compute node 104a, 104b...104n is less than a predetermined threshold, the file system manager 108 reserves at least one additional storage allocation region until the total number of storage allocation regions reserved for the compute node 104a, 104b...104n equals the predetermined threshold. In one embodiment, the file system manager 108 reserves for the compute node 104a, 104b...104n at least one additional storage allocation region comprising a contiguous set of data blocks on a single storage disk of the locally attached storage 106a, 106b...106n until the total number of storage allocation regions reserved for the node 104a, 104b...104n equals the predetermined threshold. For example, when the cardinality of the pool of a compute node 104a, 104b...104n is less than 10, the file system manager 108 reserves an additional storage allocation region comprising a contiguous set of data blocks for that compute node 104a, 104b...104n in the compute cluster 102.
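The replenishment of a node's pool of reserved regions can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Region`, `NodePool`, and `Reserver` classes and the `replenish` function are hypothetical, and the reserver simply hands out consecutive 64-block regions on a given disk.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Region:
    disk: int          # index of the single storage disk holding the region
    first_block: int   # first block of the contiguous set
    num_blocks: int    # length of the contiguous set

@dataclass
class NodePool:
    threshold: int = 10                          # exemplary predetermined threshold
    regions: list = field(default_factory=list)  # regions currently reserved

class Reserver:
    """Hypothetical reserver handing out non-overlapping contiguous regions per disk."""

    def __init__(self, blocks_per_region: int = 64):
        self.blocks_per_region = blocks_per_region
        self._next_free = {}                     # disk -> next unreserved block

    def reserve(self, disk: int) -> Region:
        start = self._next_free.get(disk, 0)
        self._next_free[disk] = start + self.blocks_per_region
        return Region(disk, start, self.blocks_per_region)

def replenish(pool: NodePool, reserver: Reserver, disk: int) -> int:
    """Reserve regions until the pool is back at its threshold; return how many were added."""
    added = 0
    while len(pool.regions) < pool.threshold:
        pool.regions.append(reserver.reserve(disk))
        added += 1
    return added
```

Calling `replenish` after regions have been consumed for allocations tops the pool back up to the threshold and is a no-op when the pool is already full.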
At 210, the file system manager 108 sends a compute operation to the compute node 104a, 104b...104n in the compute cluster 102 whose locally attached storage 106a, 106b...106n has been allocated the data for the compute operation. In one embodiment, compute operations and tasks are sent to the compute node 104a, 104b...104n on which the data for the compute operation resides. For example, shipping compute tasks to the data reduces network overhead and allows computations to be processed quickly. In an exemplary embodiment, compute tasks are co-located with data in the file system by exposing the file system's block location information to applications through the file system's ioctl. Accordingly, the file system manager 108 uses the block location information to ship a compute task to the compute node 104a, 104b...104n whose locally attached storage 106a, 106b...106n holds the data associated with the compute task.
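Shipping a compute task to the node that holds its data can be sketched in a few lines of Python. The sketch assumes that block location information has already been obtained and is available as a plain dictionary mapping a chunk identifier to the nodes holding it; that dictionary, the least-loaded tie-break, and the function names are illustrative assumptions and do not model the ioctl interface itself.

```python
def pick_node(chunk_id, block_locations, load):
    """Choose a node that already holds the chunk, preferring the least-loaded one."""
    holders = block_locations.get(chunk_id)
    if not holders:
        raise LookupError(f"no known location for chunk {chunk_id!r}")
    # Ties on load are broken by node name so the choice is deterministic.
    return min(holders, key=lambda node: (load.get(node, 0), node))

def dispatch(tasks, block_locations):
    """Assign each (task, chunk) pair to a node whose local storage holds the chunk."""
    load = {}
    placement = {}
    for task, chunk_id in tasks:
        node = pick_node(chunk_id, block_locations, load)
        placement[task] = node
        load[node] = load.get(node, 0) + 1
    return placement
```

Because every task is placed only on a node listed as a holder of its chunk, no data needs to cross the network before the task starts.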
At 212, the file system manager 108 replicates each contiguous set of data blocks allocated to the locally attached storage 106a, 106b...106n of a compute node 104a, 104b...104n to the locally attached storage 106a, 106b...106n of at least one additional compute node 104a, 104b...104n in the compute cluster 102. For example, data-intensive applications need to be able to recover from failures in the underlying commodity components. Accordingly, data-intensive applications need to be able to recover and make progress in the event of multiple node and disk failures, which requires data to be replicated across multiple nodes so that, in the event of a node or disk failure, the computation can be restarted on a different node. In one embodiment, if a node performing a compute operation fails, the compute operation is restarted on a second node to which the data associated with the compute operation has been replicated.
The mechanism of replication may vary based on the file system and cluster components employed. In an exemplary embodiment, the file system uses a single-source replication model in which the writer forwards copies to all replicas. In another exemplary embodiment, the file system uses pipeline replication, in which the outbound bandwidth at the writer is not shared across multiple streams and the write data is pipelined in sequence from one node to the next node in the pipeline as the data is being written to a node.
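The difference between the two replication models can be made concrete by listing the transfers each one performs. The following Python sketch is illustrative only; the function names and the representation of a transfer as a `(source, destination)` pair are assumptions.

```python
def single_source_transfers(writer, replicas):
    """Single-source model: the writer forwards a copy to every replica itself."""
    return [(writer, replica) for replica in replicas]

def pipeline_transfers(writer, replicas):
    """Pipeline model: each node forwards the data to the next node in the chain."""
    chain = [writer, *replicas]
    return list(zip(chain, chain[1:]))

def outbound_streams(transfers, node):
    """Number of outgoing streams a node must sustain for the given transfers."""
    return sum(1 for source, _ in transfers if source == node)
```

With three replicas, the writer sustains three outbound streams in the single-source model but only one in the pipeline model, which is why its outbound bandwidth is not shared across multiple streams.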
At 214, the file system manager 108 uses a single data block of the shared storage subsystem 114 as the file system's block allocation granularity for striping data to be stored in the file system for data access operations from the compute nodes 104a, 104b...104n in the compute cluster 102. In one embodiment, the file system manager 108 uses a single data block of the shared storage subsystem 114, that is, a small block, as the block allocation granularity for allocating data for traditional applications. In an exemplary embodiment, the file system manager 108 uses concurrent writers to the same file, allowing write bandwidth to be shared across multiple nodes for traditional applications.
In an exemplary embodiment, the file system manager 108 uses a single data block of the shared storage subsystem 114 for data access operations of traditional applications to enable effective cache management and to reduce pre-fetch overhead, because application records can span multiple blocks on different disks. For example, internal data access operations may include, but are not limited to, book-keeping operations, data transfer operations, cache management operations, and pre-fetching operations. Accordingly, the file system manager 108 uses the small block granularity for disk accesses and pre-fetch operations optimized for traditional applications. At 216, the file system manager 108 allocates data for data access operations from the compute nodes 104a, 104b...104n to the shared storage subsystem 114.
At 218, the file system manager 108 tracks the location of each data block allocated to the shared storage subsystem 114 and to the locally attached storage 106a, 106b...106n of each compute node 104a, 104b...104n in the compute cluster 102. In one embodiment, the file system manager 108 uses an allocation map to track the location of each data block allocated to the shared storage subsystem 114 and to the locally attached storage 106a, 106b...106n of each compute node 104a, 104b...104n in the compute cluster 102. In one embodiment, the file system manager 108 provides each compute node 104a, 104b...104n in the compute cluster 102 with access to the allocation map. For example, a compute node 104a, 104b...104n uses the allocation map to determine the location of each block allocated to the shared storage subsystem 114 and to the locally attached storage 106a, 106b...106n.
In one embodiment, the allocation map is divided into a large number of lockable allocation regions, with n regions for an n-node compute cluster 102, to enable parallel updates to the allocation bitmap. In an exemplary embodiment, each region in the allocation map contains the allocation status of 1/n of the disk blocks on every disk in the compute cluster 102; at any given point in time, each compute node 104a, 104b...104n has ownership of x regions and tries to satisfy all allocation requests using those regions. For example, the bitmap layout allows the file system to allocate disk space properly striped across all disks by accessing only a single allocation region at a time. Accordingly, lock conflicts are minimized because the compute nodes 104a, 104b...104n can allocate space from different regions.
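A region-partitioned allocation map of this kind might look like the Python sketch below. It is a simplified model under stated assumptions: each region covers a contiguous 1/n slice of the blocks on every disk, each region carries its own lock, and the class and function names are hypothetical.

```python
import threading

class AllocationRegion:
    """One lockable region: allocation status of a slice of blocks on every disk."""

    def __init__(self, num_disks, first_block, num_blocks):
        self.lock = threading.Lock()
        self.first_block = first_block
        # free[disk][i] is True while block first_block + i on that disk is unallocated.
        self.free = [[True] * num_blocks for _ in range(num_disks)]

    def allocate_stripe(self):
        """Allocate one block on every disk; return [(disk, block), ...] or None if full."""
        with self.lock:
            picks = []
            for disk, bits in enumerate(self.free):
                try:
                    picks.append((disk, bits.index(True)))
                except ValueError:
                    return None          # some disk has no free block in this region
            for disk, offset in picks:
                self.free[disk][offset] = False
            return [(disk, self.first_block + offset) for disk, offset in picks]

def build_allocation_map(num_nodes, num_disks, blocks_per_disk):
    """Split the map into num_nodes regions, each covering 1/n of every disk's blocks."""
    per_region = blocks_per_disk // num_nodes
    return [AllocationRegion(num_disks, r * per_region, per_region)
            for r in range(num_nodes)]
```

Because a stripe across all disks is satisfied from a single region, two nodes that own different regions never contend for the same lock.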
Fig. 3 illustrates a block allocation scheme for striping data to be stored in a file system for compute operations in a compute cluster, according to one embodiment. In one embodiment, the block allocation scheme uses a contiguous set of data blocks on a single storage disk of the locally attached storage 106a, 106b...106n as the block allocation granularity for allocating data to be stored for compute operations from the compute nodes 104a, 104b...104n in the compute cluster 102.
A plurality of storage disks 302a, 302b...302n of the locally attached storage 106a, 106b...106n are shown. A file 304 is divided into a plurality of fixed-size data blocks. In an exemplary embodiment, the file 304 is divided into a plurality of 1 MB data blocks. The single 1 MB data blocks are grouped into a plurality of contiguous data block sets. For example, for a 64 MB level of contiguity, 64 consecutive fixed-size 1 MB data blocks are grouped into each of a plurality of 64 MB data block sets. Each single 64 MB data block set is allocated to a single disk of the locally attached storage 106a, 106b...106n in round-robin manner.
For example, contiguous data block sets 306a, 306b, 306c...306n are allocated to single storage disks 302a, 302b, 302c...302n, respectively, of the locally attached storage 106a, 106b...106n.
Contiguous data block sets 308a, 308b, 308c...308n are allocated to single storage disks 302a, 302b, 302c...302n, respectively, of the locally attached storage 106a, 106b...106n.
Contiguous data block sets 310a, 310b, 310c...310n are allocated to single storage disks 302a, 302b, 302c...302n, respectively, of the locally attached storage 106a, 106b...106n.
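The round-robin placement of contiguous data block sets described for Fig. 3 reduces to integer arithmetic, sketched below in Python. The function names are hypothetical, disks are identified by a zero-based index rather than by reference numeral, and the default of 64 blocks per set follows the exemplary 64 MB level of contiguity.

```python
def disk_for_block(block_index, num_disks, blocks_per_chunk=64):
    """Disk index for a file block when whole chunks are placed round-robin."""
    if num_disks <= 0 or blocks_per_chunk <= 0:
        raise ValueError("num_disks and blocks_per_chunk must be positive")
    return (block_index // blocks_per_chunk) % num_disks

def layout(num_blocks, num_disks, blocks_per_chunk=64):
    """Return, for each disk, the list of chunk indices placed on it."""
    disks = [[] for _ in range(num_disks)]
    num_chunks = -(-num_blocks // blocks_per_chunk)   # ceiling division
    for chunk in range(num_chunks):
        disks[chunk % num_disks].append(chunk)
    return disks
```

With four disks, blocks 0 through 63 all land on the first disk and block 64 starts the next set on the second disk. Setting `blocks_per_chunk=1` gives per-block round-robin placement, in which consecutive blocks land on different disks.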
Fig. 4 illustrates a block allocation scheme for striping data to be stored in a file system for data access operations in the compute cluster 102, according to one embodiment. In an exemplary embodiment, the file system manager 108 uses a single block on a single storage disk of the shared storage subsystem 114 as the block allocation granularity for allocating data for data access operations used for traditional applications. For example, the data access operations include internal book-keeping operations, data transfer operations, cache management operations, and pre-fetching operations optimized for traditional applications. A file 404 is divided into a plurality of fixed-size data blocks. In an exemplary embodiment, the file 404 is divided into a plurality of 1 MB data blocks. Each single 1 MB data block is allocated to a single disk of the shared storage subsystem 114 in round-robin manner. For example, data blocks 406a, 406b, 406c...406n are allocated to storage disks 402a, 402b, 402c...402n, respectively, of the shared storage subsystem 114. Data blocks 408a, 408b, 408c...408n are allocated to storage disks 402a, 402b, 402c...402n, respectively. Data blocks 410a, 410b, 410c...410n are allocated to storage disks 402a, 402b, 402c...402n, respectively. Data blocks 412a, 412b, 412c...412n are allocated to storage disks 402a, 402b, 402c...402n, respectively. Data blocks 414a, 414b, 414c...414n are allocated to storage disks 402a, 402b, 402c...402n, respectively. Data blocks 416a, 416b, 416c...416n are allocated to storage disks 402a, 402b, 402c...402n, respectively.
Fig. 5 illustrates a block diagram of a system in which a process for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity may be implemented, according to one embodiment. The system 500 includes one or more client devices 501 connected to one or more server computing systems 530. The server 530 includes a bus 502 or other communication mechanism for communicating information, and a processor (CPU) 504 coupled with the bus 502 for processing information. The server 530 also includes a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 502 for storing information and instructions to be executed by the processor 504. The main memory 506 may also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 504.
The server computer system 530 further includes a read-only memory (ROM) 508 or other static storage device coupled to the bus 502 for storing static information and instructions for the processor 504. A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to the bus 502 for storing information and instructions. The bus 502 may contain, for example, thirty-two address lines for addressing video memory or the main memory 506. The bus 502 may also include, for example, a 32-bit data bus for transferring data between and among components such as the CPU 504, the main memory 506, video memory, and the storage device 510. Alternatively, multiplexed data/address lines may be used instead of separate data and address lines.
The server 530 may be coupled via the bus 502 to a display 512 for displaying information to a computer user. An input device 514, including alphanumeric and other keys, is coupled to the bus 502 for communicating information and command selections to the processor 504. Another type of user input device comprises a cursor control 516, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 504 and for controlling cursor movement on the display 512.
The functions of the invention are performed by the server 530 in response to the processor 504 executing one or more sequences of one or more instructions contained in the main memory 506. Such instructions may be read into the main memory 506 from another computer-readable medium, such as the storage device 510. Execution of the sequences of instructions contained in the main memory 506 causes the processor 504 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in the main memory 506. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the server 530 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to the bus 502 can receive the data carried in the infrared signal and place the data on the bus 502. The bus 502 carries the data to the main memory 506, from which the processor 504 retrieves and executes the instructions. The instructions received from the main memory 506 may optionally be stored on the storage device 510 either before or after execution by the processor 504.
The server 530 also includes a communication interface 518 coupled to the bus 502. The communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to the worldwide packet data communication network now commonly referred to as the Internet 528. The Internet 528 uses electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link 520 and through the communication interface 518, which carry the digital data to and from the server 530, are exemplary forms of carrier waves transporting the information.
In another embodiment of the server 530, the interface 518 is connected to a network 522 via the communication link 520. For example, the communication interface 518 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line, which can comprise part of the network link 520. As another example, the communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, the communication interface 518 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
The network link 520 typically provides data communication through one or more networks to other data devices. For example, the network link 520 may provide a connection through the local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. The ISP 526 in turn provides data communication services through the Internet 528. The local network 522 and the Internet 528 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link 520 and through the communication interface 518, which carry the digital data to and from the server 530, are exemplary forms of carrier waves transporting the information.
The server 530 can send and receive messages and data, including e-mail and program code, through the network, the network link 520, and the communication interface 518. Further, the communication interface 518 can comprise a USB/tuner, and the network link 520 may be an antenna or cable for connecting the server 530 to a cable provider, satellite provider, or other terrestrial transmission system for receiving messages, data, and program code from another source.
The example versions of the invention described herein may be implemented as logical operations in a distributed processing system such as the system 500 including the servers 530. The logical operations of the present invention can be implemented as a sequence of steps executing in the server 530 and as interconnected machine modules within the system 500. The implementation is a matter of choice and can depend on the performance requirements of the system 500 implementing the invention. As such, the logical operations constituting said example versions of the invention are referred to as, for example, operations, steps, or modules.
Similar to the server 530 described above, a client device 501 can include a processor, memory, storage device, display, input device, and communication interface (e.g., an e-mail interface) for connecting the client device to the Internet 528, the ISP 526, or the LAN 522 for communication with the servers 530.
The system 500 can further include computers (e.g., personal computers, compute nodes) 505 operating in the same manner as the client devices 501, wherein a user can utilize one or more computers 505 to manage data in the server 530.
Generally, the term "computer-readable medium", as used herein, refers to any medium that participates in providing instructions to the processor 504 for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as the storage device 510. Volatile media include dynamic memory, such as the main memory 506. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise the bus 502.
Thus, optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity is disclosed. As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or computer program product. One aspect of the invention includes a computer program product for optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity. The computer program product includes a computer-readable storage medium having computer-readable program code embodied therewith.
The computer-readable storage medium includes computer-readable program code configured to reserve a predetermined number of storage allocation regions for each node in a cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage. The computer-readable storage medium further includes computer-readable program code configured to use a contiguous set of data blocks on a single storage disk of locally attached storage as the file system's block allocation granularity for striping data to be stored in the file system for a compute operation in the cluster. The computer-readable storage medium further includes computer-readable program code configured to use a single data block of a shared storage subsystem as the file system's block allocation granularity for striping data to be stored in the file system for a data access operation in the cluster.
Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit", "module", or "system". Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be utilized. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Claims (15)

1. A method of using a dynamic block size granularity to optimize a file system for different types of applications in a compute cluster, comprising:
reserving a predetermined number of storage allocation regions for each node in the cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage;
using a set of contiguous data blocks on a single storage disk of locally attached storage as the block allocation granularity of the file system for striping data to be stored in the file system for a compute operation in the cluster; and
using a single data block of a shared storage subsystem as the block allocation granularity of the file system for striping data to be stored in the file system for a data access operation in the cluster.
2. The method of claim 1, further comprising:
allocating data for a compute operation to at least one of the reserved storage allocation regions.
3. The method of claim 1 or claim 2, further comprising:
when the total number of reserved storage allocation regions of a node in the cluster is less than the predetermined number, reserving at least one additional storage allocation region until the total number of reserved storage allocation regions of the node equals the predetermined number.
4. The method of any one of the preceding claims, further comprising:
sending a compute operation to a node in the cluster, the data of the compute operation being allocated to the locally attached storage of the node.
5. The method of any one of the preceding claims, further comprising:
replicating each set of contiguous data blocks allocated to the locally attached storage of a node to the locally attached storage of a second node in the cluster.
6. The method of any one of the preceding claims, further comprising:
allocating data for a data access operation to the shared storage subsystem.
7. The method of any one of the preceding claims, further comprising:
tracking the location of each data block allocated to the shared storage subsystem and to the locally attached storage in the file system.
8. The method of any one of the preceding claims, wherein the data access operation is selected from the group consisting of: a bookkeeping operation, a data transfer operation, a cache management operation, and a pre-fetching operation.
9. The method of claim 6, further comprising:
if a compute operation of a node fails, restarting the compute operation on the second node to which the data associated with the compute operation has been replicated.
10. The method of any one of the preceding claims, wherein the cluster hosts computing services for remote clients.
11. A system for using a dynamic block size granularity to optimize a file system for different types of applications in a compute cluster, comprising:
a compute cluster, the compute cluster comprising:
a plurality of nodes, wherein each of the plurality of nodes comprises locally attached storage,
a shared storage subsystem coupled to each of the plurality of nodes, and
a file system manager coupled to the shared storage subsystem and to each of the plurality of nodes, wherein the file system manager:
reserves a predetermined number of storage allocation regions for each of the plurality of nodes in the compute cluster, wherein each storage allocation region comprises a set of contiguous data blocks on a single storage disk of locally attached storage,
uses a set of contiguous data blocks on a single storage disk of locally attached storage as the block allocation granularity of the file system for striping data to be stored in the file system for a compute operation in the compute cluster, and
uses a single data block of the shared storage subsystem as the block allocation granularity of the file system for striping data to be stored in the file system for a data access operation in the compute cluster.
12. The system of claim 11, wherein the file system manager sends a compute operation to a node in the compute cluster, the data of the compute operation being allocated to the locally attached storage of the node.
13. The system of claim 11 or 12, wherein, when the total number of reserved storage allocation regions of a node is less than the predetermined number, the file system manager reserves at least one additional storage allocation region until the total number of reserved storage allocation regions of the node equals the predetermined number.
14. The system of any one of claims 11 to 13, wherein the file system manager:
replicates each set of contiguous data blocks allocated to the locally attached storage of a node to the locally attached storage of a second node in the compute cluster, and
if a compute operation of a node fails, restarts the compute operation on the second node to which the data associated with the compute operation has been replicated.
15. A computer program comprising computer program code stored on a computer-readable medium which, when loaded into a computer system and executed thereon, causes the computer system to perform all the steps of the method according to any one of claims 1 to 10.
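
As an informal illustration of the allocation behavior recited in claims 1 to 3, the following minimal Python sketch models a node that keeps a predetermined number of reserved storage allocation regions and a helper that picks the block allocation granularity by operation type. It is a sketch under stated assumptions, not the patented implementation: the names `OpType`, `Region`, `Node`, `replenish`, `allocate_region` and `allocation_granularity`, the single-disk layout, and the sample sizes (64 blocks per region, 4 regions per node) are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from enum import Enum


class OpType(Enum):
    """Hypothetical classification of file system operations."""
    COMPUTE = "compute"          # work shipped to a node that holds the data
    DATA_ACCESS = "data_access"  # e.g. bookkeeping, transfer, caching, prefetch


@dataclass
class Region:
    """A storage allocation region: contiguous blocks on one local disk."""
    disk: int
    start_block: int
    num_blocks: int


@dataclass
class Node:
    name: str
    blocks_per_region: int   # assumed region size, in blocks
    target_regions: int      # the "predetermined number" of reserved regions
    reserved: list = field(default_factory=list)
    next_free: int = 0       # next unreserved block (single disk assumed)

    def replenish(self) -> None:
        """Reserve regions until the node again holds target_regions."""
        while len(self.reserved) < self.target_regions:
            self.reserved.append(
                Region(disk=0, start_block=self.next_free,
                       num_blocks=self.blocks_per_region))
            self.next_free += self.blocks_per_region

    def allocate_region(self) -> Region:
        """Hand out one reserved region for compute data, then top up."""
        self.replenish()
        region = self.reserved.pop(0)
        self.replenish()
        return region


def allocation_granularity(op: OpType, blocks_per_region: int) -> int:
    """Blocks per allocation unit: a whole region for compute operations,
    a single block for data access operations."""
    return blocks_per_region if op is OpType.COMPUTE else 1


node = Node("node-1", blocks_per_region=64, target_regions=4)
first = node.allocate_region()
```

Allocating a region consumes one reservation and immediately reserves another, so the node's pool stays at the predetermined number, while data access operations fall back to single-block granularity on the shared storage subsystem.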
CN201180018635.9A 2010-04-14 2011-04-08 Method and system for optimizing a file system for different types of applications in a compute cluster Active CN102844734B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/760,476 2010-04-14
US12/760,476 US9021229B2 (en) 2010-04-14 2010-04-14 Optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity
PCT/EP2011/055496 WO2011128257A1 (en) 2010-04-14 2011-04-08 Optimizing a file system for different types of applications in a compute cluster using dynamic block size granularity

Publications (2)

Publication Number Publication Date
CN102844734A true CN102844734A (en) 2012-12-26
CN102844734B CN102844734B (en) 2016-02-10

Family

ID=44140786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180018635.9A Active CN102844734B (en) 2010-04-14 2011-04-08 Method and system for optimizing a file system for different types of applications in a compute cluster

Country Status (6)

Country Link
US (1) US9021229B2 (en)
JP (1) JP5643421B2 (en)
CN (1) CN102844734B (en)
DE (1) DE112011101317T5 (en)
GB (1) GB2492870B (en)
WO (1) WO2011128257A1 (en)

Also Published As

Publication number Publication date
CN102844734B (en) 2016-02-10
DE112011101317T5 (en) 2013-01-31
JP2013527524A (en) 2013-06-27
JP5643421B2 (en) 2014-12-17
GB201210249D0 (en) 2012-07-25
US9021229B2 (en) 2015-04-28
US20110258378A1 (en) 2011-10-20
GB2492870B (en) 2018-05-16
WO2011128257A1 (en) 2011-10-20
GB2492870A (en) 2013-01-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant