US20200311035A1 - Hybrid file system architecture, file storage, dynamic migration, and application thereof - Google Patents
- Publication number
- US20200311035A1 (application US16/831,964)
- Authority
- US
- United States
- Prior art keywords
- file
- file system
- distributed
- distributed file
- migration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/183—Provision of network file services by network file servers, e.g. by using NFS, CIFS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/119—Details of migration of file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Definitions
- the present disclosure relates to a technical field of distributed file systems, and more particularly, to a hybrid file system architecture having a plurality of distributed file systems hybridized therein, file storage, dynamic migration, and application thereof.
- HDFS has high read and write performance with respect to large files.
- Experimental analysis shows that HDFS has better read and write performance when files are larger than 8 MB, while GlusterFS has better I/O performance for files smaller than 8 MB; and so on.
- One of the technical problems to be solved by the present disclosure is: where a variety of high-performance file systems coexist, how to integrate them and make full use of their respective performance advantages, so as to improve storage efficiency, handle various situations comprehensively, and achieve optimal overall performance of the file systems.
- a file storage processing method applied in a hybrid file system architecture including a plurality of different types of distributed file systems, for determining in which distributed file system a file to be stored is stored, the file storage processing method comprising: acquiring storage attributes of the file to be stored, wherein the storage attributes at least include a size of the file; determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored; and storing the file to be stored in the determined distributed file system.
- the storage rule is an intelligent storage model obtained through learning by using an artificial intelligence learning algorithm based on a training sample set; and features of each training sample of the training sample set include the storage attributes of the file and a label of the file system to which the file has been determined to be assigned.
- the storage attributes of the file further include: access mode type, access permission level, and associated users of the file, wherein, the access mode type is selected from one of: read-only, write-only, read-write, and executable.
- the hybrid file system architecture includes a metadata manage server, wherein the storage rule is stored in a non-volatile storage medium and is also maintained in the memory of the metadata manage server; and the storage rule is dynamically updated, wherein the determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored includes: reading the storage rule from the metadata manage server, and determining, according to the read storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored.
- the storage rule is further maintained in a remote standby node.
- the artificial intelligence learning algorithm is a decision tree, and the intelligent storage model is a decision tree model constructed based on training data.
- optimization processing including pruning and cross-validation is performed in construction of the decision tree model.
- the file storage processing method further comprises: receiving, by the metadata manage server, from a client a request to read a file from the hybrid file system architecture or update a file therein; acquiring, by the metadata manage server, path information of the file to be read or updated, to further obtain storage location information of the file; returning, by the metadata manage server, the storage location of the file to be read or updated to the client; and communicating, by the client, with a corresponding distributed file system according to the returned storage location, to perform actual read operation or update operation.
- I/O performance of the file on each of the distributed file systems is determined experimentally as follows: acquiring a read throughput rate $F_{irt}$ and a write throughput rate $F_{iwt}$ of the file on each distributed file system through experiments, the read throughput rate $F_{irt}$ being a data size of the file read per second, and the write throughput rate $F_{iwt}$ being a data size of the file written per second; and calculating a sum of the read throughput rate $F_{irt}$ and the write throughput rate $F_{iwt}$ of the file in each distributed file system as the I/O performance of the file on each of the distributed file systems.
- the file storage processing method further comprises: determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- the determining a distributed file system that needs file migration includes: calculating a difference in usage rate between any two distributed file systems; and determining that a distributed file system with a higher usage rate needs file migration, when the difference in usage rate is greater than a predetermined threshold.
- the determining a file to be migrated on the distributed file system, for the distributed file system that needs file migration includes: calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems; and determining the file to be migrated and the migration destination of the file based on sorting of migration gains of migrating respective files to other distributed file systems.
- the calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems includes: referring to the distributed file system that needs file migration as a distributed file system i, referring to any one of the other distributed file systems as a distributed file system j, and referring to the file on the distributed file system i as a file x; obtaining read throughput and write throughput of the file x on the distributed file system i, and predicting read throughput and write throughput of the file x on the distributed file system j; obtaining a read frequency and a write frequency of the file x on the distributed file system i; and calculating a migration gain of migrating the file x from the distributed file system i to the distributed file system j, at least based on the size of the file x, the read frequency and the write frequency of the file x on the distributed file system i, the read throughput and the write throughput of the file x on the distributed file system i, as well as the read through
- the migration gain of migrating the file x from the distributed file system i to the distributed file system j is calculated based on a formula below:
- $DFS_i$ and $DFS_j$ represent the distributed file systems i and j;
- $F_{xrt}(DFS_i)$ and $F_{xrt}(DFS_j)$ are respectively the read throughput rates of the file x in the distributed file systems i and j;
- $F_{xwt}(DFS_i)$ and $F_{xwt}(DFS_j)$ are respectively the write throughput rates of the file x in the distributed file systems i and j;
- a throughput rate is the amount of file data read or written per second; the read throughput rate and the write throughput rate are functions of the file size;
- $F_{xrf}$ and $F_{xwf}$ are respectively the read frequency and the write frequency of the file x in the distributed file system i; and
- $s_x$ is the size of the file x to be migrated in the file system.
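The gain formula itself is not reproduced in this text, so the following is only a minimal sketch of one plausible form built from the quantities just defined: the I/O time saved per unit time when file x moves from system i to system j. The function name `migration_gain`, the `FileStats` container, and the formula are assumptions for illustration, not the formula of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    size: float        # s_x, size of file x (for example in MB)
    read_freq: float   # read frequency of x on the source system i
    write_freq: float  # write frequency of x on the source system i


def migration_gain(f, rt_i, wt_i, rt_j, wt_j):
    """Hypothetical gain: I/O time saved per unit time by moving x from i to j.

    rt_* and wt_* are the read and write throughput rates of the file on the
    source (i) and destination (j) systems. Assumed form only.
    """
    read_saving = f.read_freq * (f.size / rt_i - f.size / rt_j)
    write_saving = f.write_freq * (f.size / wt_i - f.size / wt_j)
    return read_saving + write_saving


# Illustrative numbers, not measurements.
x = FileStats(size=64.0, read_freq=10.0, write_freq=2.0)
gain = migration_gain(x, rt_i=32.0, wt_i=16.0, rt_j=64.0, wt_j=32.0)
print(gain)
```

Under this assumed form, a positive value means the destination serves the file's observed read and write pattern faster than the source does.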
- the predicting read throughput and write throughput of the file x on the distributed file system j includes: predicting by using a predetermined regression model, the regression model being selected from one of:
- the predetermined regression model is determined through a fitting process and a selecting process below: inputting file training data to different types of regression models; calculating unknown parameters by using a least square method; fitting to obtain the different types of regression models after the fitting; and selecting a regression model with a best fitting effect from the different types of regression models after the fitting as the predetermined regression model.
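The fitting and selecting process can be sketched as follows. The list of candidate model types is not reproduced in this text, so the two families used here (throughput linear in file size, and throughput linear in the logarithm of file size) are assumptions, as are the function names and the synthetic data. Each candidate is fitted by ordinary least squares, and the one with the smallest sum of squared residuals is kept.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def sse(xs, ys, a, b):
    """Sum of squared residuals of the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def select_model(sizes, throughputs):
    """Fit every candidate and return (name, error, a, b) of the best fit."""
    # Assumed candidate families; each maps file size to a regressor value.
    candidates = {
        "linear": lambda s: s,
        "logarithmic": math.log,
    }
    best = None
    for name, transform in candidates.items():
        ts = [transform(s) for s in sizes]
        a, b = fit_line(ts, throughputs)
        err = sse(ts, throughputs, a, b)
        if best is None or err < best[1]:
            best = (name, err, a, b)
    return best


# Synthetic training data generated from a logarithmic law, not measurements.
sizes = [1, 2, 4, 8, 16, 32]
throughputs = [10 + 5 * math.log(s) for s in sizes]
name, err, a, b = select_model(sizes, throughputs)
print(name)
```

The selected model can then predict the throughput of a file of a given size on a system where the file has never been stored.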
- the obtaining a read frequency and a write frequency of the file x on the distributed file system i includes: obtaining the read frequency and the write frequency of the file x on the distributed file system i by querying the metadata manage server.
- a file dynamic migration method applied in a hybrid file system architecture including a plurality of different types of distributed file systems comprising: determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- the determining a distributed file system that needs file migration includes: calculating a difference in usage rate between any two distributed file systems; and determining that a distributed file system with a higher usage rate needs file migration, when the difference in usage rate is greater than a predetermined threshold.
- the determining a file to be migrated on the distributed file system, for the distributed file system that needs file migration includes: calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems; and determining the file to be migrated and the migration destination of the file based on sorting of migration gains of migrating respective files to other distributed file systems.
- the calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems includes: referring to the distributed file system that needs file migration as a distributed file system i, referring to any one of the other distributed file systems as a distributed file system j, and referring to the file on the distributed file system i as a file x; obtaining read throughput and write throughput of the file x on the distributed file system i, and predicting read throughput and write throughput of the file x on the distributed file system j; obtaining a read frequency and a write frequency of the file x on the distributed file system i; and calculating a migration gain of migrating the file x from the distributed file system i to the distributed file system j, at least based on the size of the file x, the read frequency and the write frequency of the file x on the distributed file system i, the read throughput and the write throughput of the file x on the distributed file system i, as well as the read through
- the migration gain of migrating the file x from the distributed file system i to the distributed file system j is calculated based on a formula below:
- $DFS_i$ and $DFS_j$ represent the distributed file systems i and j;
- $F_{xrt}(DFS_i)$ and $F_{xrt}(DFS_j)$ are respectively the read throughput rates of the file x in the distributed file systems i and j;
- $F_{xwt}(DFS_i)$ and $F_{xwt}(DFS_j)$ are respectively the write throughput rates of the file x in the distributed file systems i and j;
- a throughput rate is the amount of file data read or written per second; the read throughput rate and the write throughput rate are functions of the file size;
- $F_{xrf}$ and $F_{xwf}$ are respectively the read frequency and the write frequency of the file x in the distributed file system i; and
- $s_x$ is the size of the file x to be migrated in the file system.
- the predicting read throughput and write throughput of the file x on the distributed file system j includes: predicting by using a predetermined regression model, the regression model being selected from one of:
- the predetermined regression model is determined through a fitting process below: inputting file training data to different regression models; calculating unknown parameters by using a least square method; and obtaining a curve with a best fitting effect as the predetermined regression model.
- the obtaining a read frequency and a write frequency of the file x on the distributed file system i includes: obtaining the read frequency and the write frequency of the file x on the distributed file system i by querying the metadata manage server.
- a file storage processing device comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file storage processing method.
- a file migration processing system comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- a computer-readable storage medium having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file storage processing method.
- a computer-readable storage medium having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- a metadata manage server in a hybrid file system architecture system, which interacts with a client and a plurality of distributed file systems, the metadata manage server maintaining a pre-configured storage rule below, and being configured to perform a method below: acquiring storage attributes of a file to be stored, wherein, the storage attributes at least include a size of the file; determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored; determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- a hybrid file system architecture system comprising a metadata manage server and a plurality of different types of distributed file systems.
- the file intelligent storage policy according to the embodiment of the present disclosure is adopted to make full use of storage features of a variety of file systems, integrate a variety of file systems, and intelligently select the file underlying storage policy according to the file feature attributes, to optimize file read and write performances.
- the intelligent storage policy is the decision tree model; training data is acquired through prior experiments and used to train the decision tree model; the attributes of a file to be stored are then used as input to the model, and the output is the file storage location that gives the best file read and write performance.
- file dynamic migration policy is adopted.
- file system load equalization is used as an evaluation index of the file system, and it is decided whether to migrate the file and to which file system the file is migrated, according to storage space usage rates of different underlying file systems, read and write I/O of different files in different file systems, as well as different read and write frequencies of different files, so as to satisfy usage equalization of different file systems and also minimize performance degradation.
- the high-performance hybrid file system architecture, the file storage processing method, the file dynamic migration method and the metadata manage server make comprehensive use of the performance advantages of a variety of distributed file systems; aiming at a universal high-performance file system, they can handle the storage of files of various types in various complex environments while maintaining high performance.
- FIG. 1 shows a structural schematic diagram of a hybrid file system architecture according to an embodiment of the present disclosure
- FIG. 2 shows a flow chart of an applied file storage processing method in a hybrid file system architecture according to an embodiment of the present disclosure
- FIGS. 3A to 3E show schematic diagrams of an exemplary process of constructing an intelligent storage policy decision tree
- FIG. 4 shows a sequence chart of writing a file in a hybrid file system architecture according to an embodiment of the present disclosure
- FIG. 5 shows a sequence chart of corresponding operations caused by a file read request or update request from a client after a file has been stored in a hybrid file system architecture
- FIG. 6 shows an overall flow chart of a file dynamic migration method according to an embodiment of the present disclosure.
- FIG. 7 shows a schematic diagram of comparison between a throughput fit curve and an actual curve of respective distributed file systems obtained through experiments according to an embodiment of the present disclosure.
- FIG. 1 shows a structural schematic diagram of a hybrid file system architecture 1000 according to an embodiment of the present disclosure, mainly comprising three parts: an underlying storage system 1100, a metadata manage server 1200, and a client 1300.
- the diagram shows that the underlying storage system 1100 includes various types of distributed file systems DFS-1, DFS-2, ..., DFS-n, such as Ceph, HDFS, GlusterFS, etc.
- the client 1300 is for users to read and write data, and provides a variety of frequently-used file system universal interfaces
- the metadata manage server 1200 is a core module of the hybrid file system architecture; according to one embodiment, the metadata manage server 1200 stores an intelligent storage decision policy 1210 and a dynamic migration policy 1230 , and at a same time, may store a part of metadata 1220 ; the metadata manage server 1200 , in response to the client's file write request, determines a file storage location according to the file intelligent storage decision policy 1210 , and feeds back the same to the client; and the metadata manage server 1200 monitors usage situation of respective distributed file systems DFS- 1 , . . . , DFS-n, and performs file migration between distributed file systems according to the file dynamic migration policy when severe dis
- FIG. 2 shows a flow chart of an applied file storage processing method 200 in a hybrid file system architecture according to an embodiment of the present disclosure.
- Step S210: acquiring storage attributes of a file to be stored, wherein the storage attributes at least include a size of the file.
- the storage attributes of the file further include: access mode type, access permission level, and associated users of the file, wherein, the access mode type is selected from one of: read-only, write-only, read-write, and executable.
- a metadata manage server obtains the storage attributes of the file to be stored from a client, stores and maintains the same as metadata in its own memory, as shown in FIG. 1 .
- Step S220: determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored.
- the storage rule is an intelligent storage model obtained through learning by using an artificial intelligence learning algorithm based on a training sample set; and features of each training sample of the training sample set include the storage attributes of the file and a label of the file system to which the file has been determined to be assigned.
- the label of the file system to which the file has been determined to be assigned is determined based on experimentally determined I/O performance of the file on each of the distributed file systems, and the I/O performance includes a read throughput rate and/or a write throughput rate.
- the storage rule for example, may be stored in a non-volatile storage medium such as a hard disk while the decision tree model is maintained and stored in the memory.
- the storage rule is simultaneously sent to a remote standby node.
- the storage rule is dynamically updated, for example, according to a certain period; through learning by using the artificial intelligence learning algorithm again, a newly learned storage rule is updated to the metadata manage server; and the storage rule stored in the hard disk and/or the remote node is updated synchronously.
- the determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored includes: reading the storage rule from the metadata manage server, and determining, according to the read storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored.
- the artificial intelligence learning algorithm is a decision tree, and the intelligent storage model is a decision tree model constructed based on training data. Subsequently, an example of a process of constructing the decision tree model will be described in detail with reference to the drawings.
- the metadata manage server 1200 determines in which distributed file system the file is stored, by using the intelligent storage model 1210 , based on the storage attributes of the file obtained from the client, and returns the same to the client 1300 .
- Step S230: storing the file to be stored in the determined distributed file system.
- the client 1300 communicates directly with the underlying storage system 1100, which stores the file in the determined specific distributed file system.
- the specific distributed file system is selected according to the attributes of the file based on the predetermined storage rule, so as to, for example, improve storage performance and efficiency, and solve the technical problem of how to use different file systems for storage to improve storage efficiency.
- a variety of distributed file systems are integrated, and system performance is comprehensively improved, by acquiring performance characteristics of various types of distributed file systems for various files through, for example, machine learning in advance, and by comprehensively utilizing advantages of different distributed file systems in a file access process.
- processing attributes of files with different attributes when stored on these distributed file systems are obtained in advance, for example, I/O performances of files of different sizes on different distributed file systems may be obtained; rules may be established according to the knowledge obtained in advance; and these rules are used when a file is stored subsequently.
- files of different sizes are selected as experimental data and tested in a variety of distributed file systems, to acquire the read throughput rate $F_{irt}$ and the write throughput rate $F_{iwt}$ of different files in different distributed file systems; the distributed file system with the maximum result is then selected as the training data label according to a formula below.
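The labeling formula is not reproduced in this text. A minimal sketch, assuming the "result" is the sum of read and write throughput rates (the I/O performance measure defined earlier in this document), picks the system with the largest sum. The function name `label_for` and the sample values are illustrative assumptions.

```python
def label_for(measurements):
    """Return the DFS whose read + write throughput is largest.

    measurements maps a DFS name to (read_throughput, write_throughput)
    measured for one test file. Assumes I/O performance is the sum of both.
    """
    return max(measurements, key=lambda dfs: sum(measurements[dfs]))


# Illustrative throughput values for one test file, not measurements.
sample = {"HDFS": (120.0, 80.0), "GlusterFS": (90.0, 70.0)}
label = label_for(sample)
print(label)
```

Repeating this for every test file yields the labeled samples from which the storage model is trained.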
- the storage attributes of the file are extracted, including file size, access mode, access permission, and owner; a training data label of each file determined through the above-described experiment is obtained; and data shown in Table 1 is acquired as the training data.
- a simplified training data form is used, giving a 3-tuple dataset of size, permission, and target DFS; each sample consists of the features size and permission together with the target DFS label; and the training dataset is as shown in FIG. 3A.
- optimization processing including pruning and cross-validation, etc. is performed in construction of the decision tree model.
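To make the idea of a learned storage rule concrete, here is a deliberately small sketch: a one-level decision tree (a decision stump) that learns a single file-size threshold from labeled samples. It uses only the size feature, and it omits pruning and cross-validation. The function names and the training values are assumptions for illustration, and the training set needs at least two distinct sizes.

```python
def train_stump(samples):
    """Learn (threshold, label_below, label_above) from (size_mb, label) pairs.

    Tries the midpoint between each pair of consecutive distinct sizes and
    keeps the split with the fewest misclassified training samples.
    """
    labels = sorted({lab for _, lab in samples})
    sizes = sorted({s for s, _ in samples})
    thresholds = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]
    best = None
    for t in thresholds:
        for below in labels:
            for above in labels:
                errors = sum(
                    (lab != below) if s < t else (lab != above)
                    for s, lab in samples
                )
                if best is None or errors < best[0]:
                    best = (errors, t, below, above)
    _, t, below, above = best
    return t, below, above


def predict(stump, size_mb):
    """Apply the learned rule to a new file size."""
    t, below, above = stump
    return below if size_mb < t else above


# Made-up training data: small files labeled GlusterFS, large ones HDFS.
training = [(1, "GlusterFS"), (2, "GlusterFS"), (4, "GlusterFS"),
            (16, "HDFS"), (64, "HDFS"), (256, "HDFS")]
stump = train_stump(training)
print(stump, predict(stump, 3), predict(stump, 100))
```

A full decision tree applies the same split search recursively over several attributes, which is where pruning and cross-validation help to limit overfitting.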
- the decision tree is provided as a preferred example, not as a limitation; other artificial intelligence learning algorithms may also be selected, for example, a deep neural network, a support vector machine, nearest neighbor learning, etc.
- File operations on the file system include initial storage operation (write operation), and subsequent read and possible update operations.
- FIG. 4 shows a sequence chart of writing a file in a hybrid file system architecture according to an embodiment of the present disclosure.
- Step S410: a client sends a file write access request to a metadata manage server.
- Step S420: the metadata manage server acquires file attribute information.
- Step S430: the metadata manage server acquires a decision tree model maintained by the metadata manage server.
- Step S440: the metadata manage server obtains a storage location of the file to be written, based on the file storage attribute information and the decision tree model.
- Step S450: the metadata manage server returns the storage location of the file to the client.
- Step S460: the client communicates with a corresponding distributed file system according to the returned storage location, to perform an actual file write operation.
- FIG. 5 shows a sequence chart of corresponding operations caused by a file read request or update request from a client after a file has been stored in a hybrid file system architecture.
- Step S510: the client sends a file read request or update request to a metadata manage server.
- Step S520: the metadata manage server acquires a file path from the read request or the update request.
- Step S530: the metadata manage server queries a metadata database, to acquire a storage location of the file to be read or updated.
- Step S540: the metadata manage server feeds back the storage location of the file to the client.
- Step S550: the client communicates with a corresponding distributed file system according to the returned storage location, and performs actual file read or update operations.
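The write sequence of FIG. 4 and the read sequence of FIG. 5 share one pattern: the metadata manage server only decides or looks up a location, and the client then talks to the chosen distributed file system itself. The sketch below illustrates this with in-memory dictionaries standing in for real file systems. The class names, method names, and the fixed 8 MB size rule (a stand-in for the decision tree model) are all assumptions, as is recording the location at write time.

```python
class MetadataServer:
    """Holds the storage rule and the path-to-location metadata."""

    def __init__(self, rule):
        self.rule = rule        # callable: file attributes -> DFS name
        self.locations = {}     # path -> DFS name

    def place(self, path, attributes):
        """Write path: apply the rule, record and return the location."""
        dfs = self.rule(attributes)
        self.locations[path] = dfs
        return dfs

    def locate(self, path):
        """Read or update path: look up where the file was stored."""
        return self.locations[path]


class Client:
    def __init__(self, server, backends):
        self.server = server
        self.backends = backends  # DFS name -> dict standing in for a real DFS

    def write(self, path, data):
        dfs = self.server.place(path, {"size": len(data)})
        self.backends[dfs][path] = data   # client writes to the DFS directly
        return dfs

    def read(self, path):
        dfs = self.server.locate(path)
        return self.backends[dfs][path]   # client reads from the DFS directly


def size_rule(attributes):
    # Placeholder rule; the disclosure uses a trained decision tree instead.
    return "HDFS" if attributes["size"] > 8 * 1024 * 1024 else "GlusterFS"


backends = {"HDFS": {}, "GlusterFS": {}}
client = Client(MetadataServer(size_rule), backends)
where = client.write("/logs/a.txt", b"hello")
print(where, client.read("/logs/a.txt"))
```

Keeping file data off the metadata server means the server handles only small lookups, while bulk transfer goes straight between the client and the selected file system.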
- file migration may also be performed, that is, a file stored in one distributed file system is migrated to another distributed file system, so that storage capacity of the system may be further improved through migration, to promote load equalization between respective distributed file systems.
- Step S610: determining a distributed file system that needs file migration.
- usage situation of respective distributed file systems may also be continuously monitored, to judge whether file migration is needed.
- Usage rates of the respective distributed file systems may be examined to determine the state of load equalization, that is, usage equalization, between the respective distributed file systems; where a severe imbalance in usage rate occurs, file migration, specifically file emigration, is performed on the distributed file system with an excessively high usage rate.
- the determining a distributed file system that needs file migration includes: calculating a difference in usage rate between any two distributed file systems; and determining that a distributed file system with a higher usage rate needs file migration, when the difference in usage rate is greater than a predetermined threshold.
- For example, if the usage rate of a distributed file system A is 90% while the usage rate of a distributed file system B is only 10%, severe load disequilibrium has clearly occurred, and a file migration operation may be performed on the distributed file system A.
- The usage rate of a distributed file system is the ratio of the capacity actually in use on the file system to its original capacity.
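The usage-rate check described above can be sketched as follows. The function names and the threshold value of 0.5 are assumptions for illustration; the disclosure only requires a predetermined threshold.

```python
from itertools import combinations


def usage_rate(used, capacity):
    """Ratio of the capacity actually in use to the original capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def systems_needing_migration(usage, threshold):
    """Return the names of file systems that need file emigration.

    `usage` maps a file system name to its usage rate. For every pair whose
    usage rates differ by more than `threshold`, the one with the higher
    usage rate is reported.
    """
    overloaded = set()
    for a, b in combinations(usage, 2):
        diff = usage[a] - usage[b]
        if abs(diff) > threshold:
            overloaded.add(a if diff > 0 else b)
    return overloaded


# The example from the text: A is 90% full, B is 10% full.
result = systems_needing_migration({"A": 0.9, "B": 0.1}, threshold=0.5)
```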
- Step S620: determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration.
- the determining a file to be migrated on the distributed file system, for the distributed file system that needs file migration includes: calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems; and determining the file to be migrated and the migration destination of the file based on sorting of migration gains of migrating respective files to other distributed file systems.
- the calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems may be performed as follows:
- The migration gain of migrating the file x from the distributed file system i to the distributed file system j is calculated based on Formula (1) below:

$$\mathrm{diff}_x(DFS_i, DFS_j) = \left(\frac{s_x}{F_{xrt}(DFS_i)} - \frac{s_x}{F_{xrt}(DFS_j)}\right) F_{xrf} + \left(\frac{s_x}{F_{xwt}(DFS_i)} - \frac{s_x}{F_{xwt}(DFS_j)}\right) F_{xwf} \qquad (1)$$

- $DFS_i$ and $DFS_j$ represent the distributed file systems i and j;
- $F_{xrt}(DFS_i)$ and $F_{xrt}(DFS_j)$ are respectively the read throughput rates of the file x in the distributed file systems i and j;
- $F_{xwt}(DFS_i)$ and $F_{xwt}(DFS_j)$ are respectively the write throughput rates of the file x in the distributed file systems i and j;
- a throughput rate is the amount of file data read or written per second; the read throughput rate and the write throughput rate are functions of the file size;
- $F_{xrf}$ and $F_{xwf}$ are respectively the read frequency and the write frequency of the file x in the distributed file system i; and
- $s_x$ is the size of the file x to be migrated in the file system.
- The first term on the right-hand side of Formula (1) represents the overall improvement in read performance obtained by migrating the file x from the distributed file system i to the distributed file system j, taking into account the file size (a factor in the file system usage rate), the read throughput rates, and the read frequency; the second term represents the corresponding improvement in write performance, taking into account the file size, the write throughput rates, and the write frequency.
- Formula (1) indicates that the larger the file, the higher its read and write frequencies, and the greater its throughput rate on the distributed file system j relative to the distributed file system i, the higher the migration gain of migrating the file to the distributed file system j.
- the read frequency and the write frequency of the file x in the distributed file system i may be obtained by querying the metadata manage server.
- Formula (1) is a preferred example of calculating a migration gain of a file, but it is not a limitation; and other calculation formulas may also be designed according to needs.
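Formula (1) translates directly into a small function. The sketch below is only an illustration; the function name, parameter names, and the sample numbers are assumptions, with the size in MB, throughput rates in MB/s, and frequencies in accesses per unit time.

```python
def migration_gain(size, read_freq, write_freq, rt_i, wt_i, rt_j, wt_j):
    """Migration gain of moving one file from file system i to file system j.

    size        -- file size s_x
    read_freq   -- read frequency F_xrf of the file on file system i
    write_freq  -- write frequency F_xwf of the file on file system i
    rt_i, wt_i  -- read and write throughput rates of the file on i
    rt_j, wt_j  -- (predicted) read and write throughput rates on j
    """
    read_gain = (size / rt_i - size / rt_j) * read_freq
    write_gain = (size / wt_i - size / wt_j) * write_freq
    return read_gain + write_gain


# A 100 MB file read 10 times and written twice per unit time; file system j
# is assumed to be twice as fast as i for both reads and writes.
gain = migration_gain(size=100, read_freq=10, write_freq=2,
                      rt_i=50, wt_i=25, rt_j=100, wt_j=50)
# read term: (2 s - 1 s) * 10 = 10; write term: (4 s - 2 s) * 2 = 4; total 14.0
```

With these units each term is the transfer time saved per access multiplied by the number of accesses, so a negative result means that the file would be served more slowly on the destination.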
- the read throughput and the write throughput of the file x on the distributed file system i may be obtained by, for example, actual observation, or may also be obtained by prediction; while the read throughput and the write throughput of the file x on the distributed file system j may only be obtained by prediction.
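- Predicting the read throughput and the write throughput of the file x on a distributed file system may be performed, for example, by using a predetermined regression model in which k is the file size and y(k) is the throughput rate, the regression model being selected from one of:
- First-order model: $y(k) = a_0 + a_1 e^{-pk}$
- Second-order model: $y(k) = a_0 + a_1 e^{-pk} + a_2 e^{-p_2 k}$
- Third-order model: $y(k) = a_0 + a_1 e^{-pk} + b e^{-\delta w k} \cos\left(w\sqrt{1-\delta^2}\,k\right) + c e^{-\delta w k} \sin\left(w\sqrt{1-\delta^2}\,k\right)$
- Fourth-order model: $y(k) = a_0 + b_1 e^{-\delta_1 w_1 k} \cos\left(w_1\sqrt{1-\delta_1^2}\,k\right) + c_1 e^{-\delta_1 w_1 k} \sin\left(w_1\sqrt{1-\delta_1^2}\,k\right) + b_2 e^{-\delta_2 w_2 k} \cos\left(w_2\sqrt{1-\delta_2^2}\,k\right) + c_2 e^{-\delta_2 w_2 k} \sin\left(w_2\sqrt{1-\delta_2^2}\,k\right)$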
- The predetermined regression model may be determined through the following fitting and selecting process: inputting file training data to the different types of regression model formulas; calculating the unknown parameters by using the least squares method, thereby obtaining a fitted regression model of each type; and selecting, from the fitted regression models, the one with the best fitting effect as the predetermined regression model.
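As an illustration of the fitting step, the sketch below fits the first-order model y(k) = a0 + a1·e^(−pk) by least squares using only the standard library. The approach is an assumption made for the example: for each candidate p the model is linear in a0 and a1, so those are solved in closed form, and p is chosen by a simple grid search. The function name, the grid, and the synthetic data are hypothetical; fitted models of other orders would be compared through the same sum of squared errors.

```python
import math


def fit_first_order(ks, ys, p_grid):
    """Least-squares fit of y(k) = a0 + a1 * exp(-p * k).

    Returns (sse, a0, a1, p) for the candidate p with the smallest sum of
    squared errors.
    """
    n = len(ks)
    y_mean = sum(ys) / n
    best = None
    for p in p_grid:
        zs = [math.exp(-p * k) for k in ks]      # regressor for this p
        z_mean = sum(zs) / n
        var_z = sum((z - z_mean) ** 2 for z in zs)
        if var_z == 0.0:
            continue                             # degenerate: regressor is constant
        a1 = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys)) / var_z
        a0 = y_mean - a1 * z_mean
        sse = sum((a0 + a1 * z - y) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a0, a1, p)
    if best is None:
        raise ValueError("no usable candidate in p_grid")
    return best


# Synthetic training data (not measured values): file sizes and throughput
# generated from a0 = 10, a1 = -8, p = 0.5.
sizes = [1, 2, 4, 8, 16, 32]
throughput = [10 - 8 * math.exp(-0.5 * k) for k in sizes]
sse, a0, a1, p = fit_first_order(sizes, throughput, [0.1 * i for i in range(1, 21)])
```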
- FIG. 7 shows a schematic diagram of comparison between a throughput fit curve and an actual curve of respective distributed file systems obtained through experiments according to an embodiment of the present disclosure.
- In FIG. 7, the abscissa represents file size and the ordinate represents throughput rate.
- The target distributed file systems used as experimental objects are Ceph, HDFS, and GlusterFS.
- The file sizes are substituted into the respective regression model formulas shown in Table 2, and the error is calculated by using the least squares method; the curve fitting effect is optimal when the overall error is minimal. The read and write curves of the several types of distributed file systems are fitted separately, and it can be seen from FIG. 7 that first-order fitting is sufficient to achieve an optimal effect for HDFS write, Ceph write, and Ceph read, while the other curves require higher-order fitting.
- Table 3 shows throughput rate fit curves of different distributed file systems based on experiments and fitting calculations.
- GlusterFS write curve: $y(k) = 8.43731 + 0.10894e^{-0.04518k}\cos(-38.07854k) - 1.89347e^{-0.04518k}\sin(-38.07854k) + 1.49443e^{-0.61613k}\cos(33.75146k) - 0.05625e^{-0.61613k}\sin(33.75146k)$
- Table 4 shows an example physical environment configuration for the high-performance hybrid file system architecture experiment. To meet the architecture requirements, the physical environment of the experiment comprises one client node, six underlying storage server nodes, and one metadata manage server node, wherein the underlying physical storage nodes may be expanded and are hidden from the client; all nodes run Ubuntu 14.04 and have 1 TB of capacity.
- Based on the sorted migration gains, a file to be migrated may be determined; because the migration gain is the expected gain of migrating the file from the file system where it is located to a particular distributed file system, the destination distributed file system to which the file is to be migrated is determined at the same time.
- Step S630: migrating the file that has been determined to be migrated.
- File migration can be performed in descending order of migration gain, starting from the file with the largest migration gain, until the usage rate difference between file systems meets the requirement, at which point the migration is complete.
- The migration process is a C-D process, that is, copying and then deleting, wherein mandatory locks are applied during the file operation.
- In the migration procedure, a first “for” loop determines the difference in usage rate between any two file systems; when the difference in usage rate between two file systems is greater than the threshold p0, that is, when load disequilibrium occurs in the file system architecture, the migration procedure is enabled; line 14 calculates the migration degree (migration gain) of migrating each file of the file system that needs migration to each of the other file systems; and line 15 sorts the files according to the calculated migration degree.
- Lines 16 to 23 perform the migration: each file is first copied to the target file system and then deleted from the original file system, until the difference in usage rate between file systems meets the condition.
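The procedure described above can be sketched as follows. This is not the original listing, so its line numbers do not correspond to the lines cited in the text. The `FileSystem` class, the `rebalance` function, and the `gain` callback are hypothetical names; the mandatory locks mentioned for the copy-then-delete step are omitted, and files are modelled only by their path and size.

```python
from dataclasses import dataclass, field


@dataclass
class FileSystem:
    name: str
    capacity: int
    files: dict = field(default_factory=dict)  # path -> size

    @property
    def usage(self):
        return sum(self.files.values()) / self.capacity


def rebalance(src, others, gain, p0):
    """Migrate files away from `src` until its usage rate exceeds no other
    file system's usage rate by more than p0.

    `gain(path, size, src, dst)` returns the migration gain of one move.
    Returns the list of (path, destination name) moves that were performed.
    """
    # Migration gain of every file towards every other file system, largest first.
    candidates = sorted(
        ((gain(path, size, src, dst), path, dst)
         for path, size in src.files.items() for dst in others),
        key=lambda c: c[0], reverse=True)
    moved = []
    for _, path, dst in candidates:
        if all(src.usage - o.usage <= p0 for o in others):
            break                               # usage difference is acceptable
        if path not in src.files:
            continue                            # already moved to a better destination
        dst.files[path] = src.files[path]       # copy ...
        del src.files[path]                     # ... then delete (C-D process)
        moved.append((path, dst.name))
    return moved


a = FileSystem("A", 100, {"/a": 40, "/b": 30, "/c": 20})   # 90% used
b = FileSystem("B", 100, {"/z": 10})                       # 10% used
# Assumed gain for the demo: proportional to file size only.
moves = rebalance(a, [b], gain=lambda path, size, s, d: size, p0=0.2)
```

In this demo only `/a` is moved, after which both file systems are at 50% usage and the loop stops.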
- a file storage processing system comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file storage processing method.
- a file migration processing system comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- a computer-readable storage medium having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file storage processing method.
- a computer-readable storage medium having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- a metadata manage server in a hybrid file system architecture system, which interacts with a client and a plurality of distributed file systems, the metadata manage server maintaining a pre-configured storage rule below, and being configured to perform a method below: acquiring storage attributes of a file to be stored, wherein, the storage attributes at least include a size of the file; determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored; determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- a hybrid file system architecture system comprising the above-described metadata manage server and a plurality of different types of distributed file systems.
- There may be one or more of the above-described processors, which may be concentrated on one physical address or distributed on a plurality of physical addresses.
- Each of the one or more processors may be a device that can execute machine-readable and executable instructions, for example, a computer, a microprocessor, a microcontroller, an integrated circuit, a microchip, or any other computing device.
- the one or more processors may be coupled to a communication path that provides signal interconnection between different devices, components and/or modules.
- the communication path may cause any number of processors to be communicatively coupled to each other, and may allow modules coupled to the communication path to operate in a distributed computing environment. Specifically, each module may be operated as a node that can send and/or receive data.
- “Being communicatively coupled” means that mutually coupled components may exchange data with each other, for example, in the form of electrical signals, electromagnetic signals, and optical signals.
- the above-described memory may include one or more memory modules.
- the memory module may be configured to include a volatile memory, for example, a Static Random Access Memory (SRAM) and a Dynamic Random Access Memory (DRAM), as well as a non-volatile memory, for example, a flash memory, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) and an Electrically Erasable Programmable Read-Only Memory (EEPROM).
- the machine-readable and executable instructions may be logic or algorithms written in any programming language, for example, a machine language that can be directly executed by a processor, or an assembly language, an Object-Oriented Programming (OOP) language, JavaScript, microcode, etc., that can be compiled or assembled into machine-readable instructions and stored in the memory module.
- the machine-readable and executable instructions may also be written in a hardware description language, for example, logics implemented by a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), etc.
- The high-performance hybrid file system architecture, the file storage processing method, the file dynamic migration method, and the metadata manage server make comprehensive use of the performance advantages of a variety of distributed file systems to handle various file storage problems. Aimed at providing a universal high-performance file system, they can cope with the storage of files of various types in various complex environments while maintaining high performance.
Abstract
Description
- The present disclosure relates to a technical field of distributed file systems, and more particularly, to a hybrid file system architecture having a plurality of distributed file systems hybridized therein, file storage, dynamic migration, and application thereof.
- In the research field of distributed file systems, different research institutes, enterprises, and institutions design distributed file systems with different architectures to meet the specific needs of different fields and application scenarios. For example, the Taobao File System (TFS) is optimized for Taobao's massive picture storage while meeting users' storage needs; HDFS is mainly applied to distributed computing and has good processing performance for large data streams; GlusterFS adopts a design without a metadata server to optimize small-file storage and operations involving large amounts of metadata; Facebook has improved HDFS mainly according to the size range of stored files and content requirements; and Ceph aims to provide a highly available distributed file system and uses a plurality of metadata servers to improve metadata performance. Because these file systems have different design objectives, their universality is relatively poor. For example, HDFS has high read and write performance for large files, and experimental analysis shows that it performs better when files are larger than 8 MB, whereas GlusterFS has better I/O performance for files smaller than 8 MB; and so on.
- In the prior art, there is no related solution for how to use different file systems for storage to improve storage efficiency.
- One of the technical problems to be solved by the present disclosure is how, in a case where a variety of high-performance file systems coexist, to integrate these file systems and make full use of their respective performance advantages, so as to improve storage efficiency and handle various situations comprehensively to achieve optimal overall performance.
- In this regard, the present disclosure is proposed.
- According to one aspect of the present disclosure, there is provided a file storage processing method applied in a hybrid file system architecture including a plurality of different types of distributed file systems, for determining in which distributed file system a file to be stored is stored, the file storage processing method comprising: acquiring storage attributes of the file to be stored, wherein, the storage attributes at least include a size of the file; determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored; and storing the file to be stored in the determined distributed file system.
- Optionally, the storage rule is an intelligent storage model obtained through learning by using an artificial intelligence learning algorithm based on a training sample set; and features of each training sample of the training sample set include the storage attributes of the file and a label of the file system to which the file has been determined to be assigned.
- Optionally, the storage attributes of the file further include: access mode type, access permission level, and associated users of the file, wherein, the access mode type is selected from one of: read-only, write-only, read-write, and executable.
- Optionally, the hybrid file system architecture includes a metadata manage server, wherein, the storage rule is stored in a non-volatile storage medium, and meanwhile maintained in the memory of the metadata manage server; and the storage rule is dynamically updated, wherein, the determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored includes: reading the storage rule from the metadata manage server, and determining, according to the read storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored.
- Optionally, the storage rule is further maintained in a remote standby node.
- Optionally, the artificial intelligence learning algorithm is a decision tree, and the intelligent storage model is a decision tree model constructed based on training data.
- Optionally, optimization processing including pruning and cross-validation is performed in construction of the decision tree model.
- Optionally, the file storage processing method further comprises: receiving, by the metadata manage server, from a client a request to read a file from the hybrid file system architecture or update a file therein; acquiring, by the metadata manage server, path information of the file to be read or updated, to further obtain storage location information of the file; returning, by the metadata manage server, the storage location of the file to be read or updated to the client; and communicating, by the client, with a corresponding distributed file system according to the returned storage location, to perform actual read operation or update operation.
- Optionally, I/O performance of the file on each of the distributed file systems is determined experimentally as follows: acquiring a read throughput rate $F_{irt}$ and a write throughput rate $F_{iwt}$ of the file on each distributed file system through experiments, the read throughput rate $F_{irt}$ being a data size of the file read per second, and the write throughput rate $F_{iwt}$ being a data size of the file written per second; and calculating a sum of the read throughput rate $F_{irt}$ and the write throughput rate $F_{iwt}$ of the file in each distributed file system as the I/O performance of the file on each of the distributed file systems.
- Optionally, the file storage processing method further comprises: determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- Optionally, the determining a distributed file system that needs file migration includes: calculating a difference in usage rate between any two distributed file systems; and determining that a distributed file system with a higher usage rate needs file migration, when the difference in usage rate is greater than a predetermined threshold.
- Optionally, the determining a file to be migrated on the distributed file system, for the distributed file system that needs file migration includes: calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems; and determining the file to be migrated and the migration destination of the file based on sorting of migration gains of migrating respective files to other distributed file systems.
- Optionally, the calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems includes: referring to the distributed file system that needs file migration as a distributed file system i, referring to any one of the other distributed file systems as a distributed file system j, and referring to the file on the distributed file system i as a file x; obtaining read throughput and write throughput of the file x on the distributed file system i, and predicting read throughput and write throughput of the file x on the distributed file system j; obtaining a read frequency and a write frequency of the file x on the distributed file system i; and calculating a migration gain of migrating the file x from the distributed file system i to the distributed file system j, at least based on the size of the file x, the read frequency and the write frequency of the file x on the distributed file system i, the read throughput and the write throughput of the file x on the distributed file system i, as well as the read throughput and the write throughput of the file x on the distributed file system j.
- Optionally, the migration gain of migrating the file x from the distributed file system i to the distributed file system j is calculated based on a formula below:
$$\mathrm{diff}_x(DFS_i, DFS_j) = \left(\frac{s_x}{F_{xrt}(DFS_i)} - \frac{s_x}{F_{xrt}(DFS_j)}\right) F_{xrf} + \left(\frac{s_x}{F_{xwt}(DFS_i)} - \frac{s_x}{F_{xwt}(DFS_j)}\right) F_{xwf} \qquad (1)$$

- $DFS_i$ and $DFS_j$ represent the distributed file systems i and j; $F_{xrt}(DFS_i)$ and $F_{xrt}(DFS_j)$ are respectively the read throughput rates of the file x in the distributed file systems i and j; $F_{xwt}(DFS_i)$ and $F_{xwt}(DFS_j)$ are respectively the write throughput rates of the file x in the distributed file systems i and j; a throughput rate is the amount of file data read or written per second; the read throughput rate and the write throughput rate are functions of the file size; $F_{xrf}$ and $F_{xwf}$ are respectively the read frequency and the write frequency of the file x in the distributed file system i; and $s_x$ is the size of the file x to be migrated in the file system.
- Optionally, the predicting read throughput and write throughput of the file x on the distributed file system j includes: predicting by using a predetermined regression model, the regression model being selected from one of:
- First-order model: $y(k) = a_0 + a_1 e^{-pk}$
- Second-order model: $y(k) = a_0 + a_1 e^{-pk} + a_2 e^{-p_2 k}$
- Third-order model: $y(k) = a_0 + a_1 e^{-pk} + b e^{-\delta w k} \cos\left(w\sqrt{1-\delta^2}\,k\right) + c e^{-\delta w k} \sin\left(w\sqrt{1-\delta^2}\,k\right)$
- Fourth-order model: $y(k) = a_0 + b_1 e^{-\delta_1 w_1 k} \cos\left(w_1\sqrt{1-\delta_1^2}\,k\right) + c_1 e^{-\delta_1 w_1 k} \sin\left(w_1\sqrt{1-\delta_1^2}\,k\right) + b_2 e^{-\delta_2 w_2 k} \cos\left(w_2\sqrt{1-\delta_2^2}\,k\right) + c_2 e^{-\delta_2 w_2 k} \sin\left(w_2\sqrt{1-\delta_2^2}\,k\right)$
- The predetermined regression model is determined through a fitting process and a selecting process below: inputting file training data to different types of regression models; calculating unknown parameters by using a least squares method; fitting to obtain the different types of regression models after the fitting; and selecting a regression model with a best fitting effect from the different types of regression models after the fitting as the predetermined regression model.
- Optionally, the obtaining a read frequency and a write frequency of the file x on the distributed file system i includes: obtaining the read frequency and the write frequency of the file x on the distributed file system i by querying the metadata manage server.
- According to another aspect of the present disclosure, there is provided a file dynamic migration method applied in a hybrid file system architecture including a plurality of different types of distributed file systems, comprising: determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- Optionally, the determining a distributed file system that needs file migration includes: calculating a difference in usage rate between any two distributed file systems; and determining that a distributed file system with a higher usage rate needs file migration, when the difference in usage rate is greater than a predetermined threshold.
- Optionally, the determining a file to be migrated on the distributed file system, for the distributed file system that needs file migration includes: calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems; and determining the file to be migrated and the migration destination of the file based on sorting of migration gains of migrating respective files to other distributed file systems.
- Optionally, the calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems includes: referring to the distributed file system that needs file migration as a distributed file system i, referring to any one of the other distributed file systems as a distributed file system j, and referring to the file on the distributed file system i as a file x; obtaining read throughput and write throughput of the file x on the distributed file system i, and predicting read throughput and write throughput of the file x on the distributed file system j; obtaining a read frequency and a write frequency of the file x on the distributed file system i; and calculating a migration gain of migrating the file x from the distributed file system i to the distributed file system j, at least based on the size of the file x, the read frequency and the write frequency of the file x on the distributed file system i, the read throughput and the write throughput of the file x on the distributed file system i, as well as the read throughput and the write throughput of the file x on the distributed file system j.
- Optionally, the migration gain of migrating the file x from the distributed file system i to the distributed file system j is calculated based on a formula below:
$$\mathrm{diff}_x(DFS_i, DFS_j) = \left(\frac{s_x}{F_{xrt}(DFS_i)} - \frac{s_x}{F_{xrt}(DFS_j)}\right) F_{xrf} + \left(\frac{s_x}{F_{xwt}(DFS_i)} - \frac{s_x}{F_{xwt}(DFS_j)}\right) F_{xwf} \qquad (1)$$

- $DFS_i$ and $DFS_j$ represent the distributed file systems i and j; $F_{xrt}(DFS_i)$ and $F_{xrt}(DFS_j)$ are respectively the read throughput rates of the file x in the distributed file systems i and j; $F_{xwt}(DFS_i)$ and $F_{xwt}(DFS_j)$ are respectively the write throughput rates of the file x in the distributed file systems i and j; a throughput rate is the amount of file data read or written per second; the read throughput rate and the write throughput rate are functions of the file size; $F_{xrf}$ and $F_{xwf}$ are respectively the read frequency and the write frequency of the file x in the distributed file system i; and $s_x$ is the size of the file x to be migrated in the file system.
- Optionally, the predicting read throughput and write throughput of the file x on the distributed file system j includes:
- Predicting by using a predetermined regression model, the regression model being selected from one of:
- First-order model: $y(k) = a_0 + a_1 e^{-pk}$
- Second-order model: $y(k) = a_0 + a_1 e^{-pk} + a_2 e^{-p_2 k}$
- Third-order model: $y(k) = a_0 + a_1 e^{-pk} + b e^{-\delta w k} \cos\left(w\sqrt{1-\delta^2}\,k\right) + c e^{-\delta w k} \sin\left(w\sqrt{1-\delta^2}\,k\right)$
- Fourth-order model: $y(k) = a_0 + b_1 e^{-\delta_1 w_1 k} \cos\left(w_1\sqrt{1-\delta_1^2}\,k\right) + c_1 e^{-\delta_1 w_1 k} \sin\left(w_1\sqrt{1-\delta_1^2}\,k\right) + b_2 e^{-\delta_2 w_2 k} \cos\left(w_2\sqrt{1-\delta_2^2}\,k\right) + c_2 e^{-\delta_2 w_2 k} \sin\left(w_2\sqrt{1-\delta_2^2}\,k\right)$
- The predetermined regression model is determined through a fitting process below: inputting file training data to different regression models; calculating unknown parameters by using a least squares method; and obtaining a curve with a best fitting effect as the predetermined regression model.
- Optionally, the obtaining a read frequency and a write frequency of the file x on the distributed file system i includes: obtaining the read frequency and the write frequency of the file x on the distributed file system i by querying the metadata manage server.
- According to another aspect of the present disclosure, there is provided a file storage processing device, comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file storage processing method.
- According to another aspect of the present disclosure, there is provided a file migration processing system, comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- According to another aspect of the present disclosure, there is provided a computer-readable storage medium, having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file storage processing method.
- According to another aspect of the present disclosure, there is provided a computer-readable storage medium, having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- According to another aspect of the present disclosure, there is provided a metadata manage server in a hybrid file system architecture system, which interacts with a client and a plurality of distributed file systems, the metadata manage server maintaining a pre-configured storage rule below, and being configured to perform a method below: acquiring storage attributes of a file to be stored, wherein, the storage attributes at least include a size of the file; determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored; determining a distributed file system that needs file migration; determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration; and migrating the file that has been determined to be migrated.
- According to another aspect of the present disclosure, there is provided a hybrid file system architecture system, comprising a metadata manage server and a plurality of different types of distributed file systems.
- The file intelligent storage policy according to the embodiment of the present disclosure is adopted to make full use of storage features of a variety of file systems, integrate a variety of file systems, and intelligently select the file underlying storage policy according to the file feature attributes, to optimize file read and write performances.
- Preferably, the intelligent storage policy is a decision tree model: training data is acquired through prior experiments and used to train the decision tree model; the attributes of a file to be stored are then used as input to the model, and its output is the file storage location that gives the best read and write performance for the file.
- Further, a file dynamic migration policy is adopted. Preferably, file system load equalization is used as an evaluation index of the file system, and it is decided whether to migrate the file and to which file system the file is migrated, according to storage space usage rates of different underlying file systems, read and write I/O of different files in different file systems, as well as different read and write frequencies of different files, so as to satisfy usage equalization of different file systems and also minimize performance degradation.
- Experimental comparison shows that the present disclosure can greatly improve the read and write performance of files stored on the different underlying file systems.
- The high-performance hybrid file system architecture, the file storage processing method, the file dynamic migration method and the metadata manage server according to the embodiments of the present disclosure make comprehensive use of the performance advantages of a variety of distributed file systems to handle various file storage problems. Aimed at providing a universal high-performance file system, they can cope with the storage of files of various types under various complex environments while maintaining high performance.
-
FIG. 1 shows a structural schematic diagram of a hybrid file system architecture according to an embodiment of the present disclosure; -
FIG. 2 shows a flow chart of an applied file storage processing method in a hybrid file system architecture according to an embodiment of the present disclosure; -
FIGS. 3A to 3E show schematic diagrams of an exemplary process of constructing an intelligent storage policy decision tree; -
FIG. 4 shows a sequence chart of writing a file in a hybrid file system architecture according to an embodiment of the present disclosure; -
FIG. 5 shows a sequence chart of corresponding operations caused by a file read request or update request from a client after a file has been stored in a hybrid file system architecture; -
FIG. 6 shows an overall flow chart of a file dynamic migration method according to an embodiment of the present disclosure; and -
FIG. 7 shows a schematic diagram of comparison between a throughput fit curve and an actual curve of respective distributed file systems obtained through experiments according to an embodiment of the present disclosure. - The following description discloses the present disclosure so as to enable those skilled in the art to implement it. Preferred embodiments as described below are merely exemplary, and those skilled in the art may conceive of other obvious modifications. Basic principles of the present disclosure defined in the following description may be used in other embodiments, modifications, improvements, equivalents, and other technical solutions without departing from the spirit and the scope of the present disclosure.
- The terms and words used in the following description and claims are not limited to literal meanings, but are merely used by the inventor to enable a clear and consistent understanding of the present disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of various embodiments of the present disclosure is provided for illustration only, rather than limiting the present disclosure as defined by the appended claims and their equivalents.
- The terminology used herein is for describing various embodiments only and is not intended to limit the same. As used herein, a singular form is intended to include a plural form as well, unless otherwise clearly indicated by the context. It will be further understood that the terms “including” and/or “having”, as used in the specification, specify presence of features, numbers, steps, operations, components, items or combinations thereof as described, and do not exclude presence or addition of one or more features, numbers, steps, operations, components, items or combinations thereof.
- The technical terms or scientific terms here should be of general meaning as understood by those ordinarily skilled in the art, as long as the terms are not defined differently. It should be understood that the terms defined in commonly used dictionaries have meanings that are consistent with the meanings of terms in the prior art.
- Hereinafter, the present disclosure will be further described in detail in conjunction with the accompanying drawings and specific embodiments.
-
FIG. 1 shows a structural schematic diagram of a hybrid file system architecture 1000 according to an embodiment of the present disclosure, mainly comprising three parts: an underlying storage system 1100, a metadata manage server 1200, and a client 1300. As shown in the diagram, the underlying storage system 1100 includes various types of distributed file systems DFS-1, DFS-2 . . . DFS-n, such as Ceph, HDFS, GlusterFS, etc., which actually store the data and are hidden from, or transparent to, the users; that is, the users do not know in which distributed file system the data they care about is stored. The client 1300 is for users to read and write data, and provides a variety of commonly used universal file system interfaces. The metadata manage server 1200 is the core module of the hybrid file system architecture. According to one embodiment, the metadata manage server 1200 stores an intelligent storage decision policy 1210 and a dynamic migration policy 1230, and may at the same time store a part of metadata 1220. The metadata manage server 1200, in response to a file write request from the client, determines a file storage location according to the intelligent storage decision policy 1210 and feeds the location back to the client. The metadata manage server 1200 also monitors the usage of the respective distributed file systems DFS-1, . . . , DFS-n, and, when severe disequilibrium in usage rate occurs between file systems, performs file migration between the distributed file systems according to the dynamic migration policy 1230, so as to maintain relative equalization in usage rate between the hybrid distributed file systems. -
FIG. 2 shows a flow chart of a file storage processing method 200 applied in a hybrid file system architecture according to an embodiment of the present disclosure. - As shown in
FIG. 2 , step S210: acquiring storage attributes of a file to be stored, wherein, the storage attributes at least include a size of the file. - In one example, the storage attributes of the file further include: access mode type, access permission level, and associated users of the file, wherein, the access mode type is selected from one of: read-only, write-only, read-write, and executable.
- In one example, a metadata manage server obtains the storage attributes of the file to be stored from a client, stores and maintains the same as metadata in its own memory, as shown in
FIG. 1 . - Step S220: determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored.
- In one example, the storage rule is an intelligent storage model obtained through learning by using an artificial intelligence learning algorithm based on a training sample set; and features of each training sample of the training sample set include the storage attributes of the file and a label of the file system to which the file has been determined to be assigned.
- In one example, the label of the file system to which the file has been determined to be assigned is determined based on experimentally determined I/O performance of the file on each of the distributed file systems, and the I/O performance includes a read throughput rate and/or a write throughput rate.
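- As a hedged illustration of this labeling step, the sketch below picks, for one test file, the file system whose measured read plus write throughput is largest. The function name `choose_label` and the throughput numbers are hypothetical and are not taken from the experiments of the disclosure.

```python
from typing import Dict, Tuple


def choose_label(throughputs: Dict[str, Tuple[float, float]]) -> str:
    """Return the file system name with the largest read + write throughput.

    `throughputs` maps a file system name to (read_rate, write_rate) measured
    for one test file. Ties go to the first name in iteration order.
    """
    if not throughputs:
        raise ValueError("no measurements given")
    return max(throughputs, key=lambda name: sum(throughputs[name]))


# Hypothetical measurements for a single test file (same unit for all rates).
measured = {"DFS1": (9.0, 4.0), "DFS2": (7.5, 7.0), "DFS3": (6.0, 5.0)}
label = choose_label(measured)  # DFS2 has the largest sum, 14.5
```

- Repeating this for every test file yields one labeled training sample per file.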
- In one example, to guard against metadata server node failure and loss of in-memory data, the storage rule may, for example, be stored in a non-volatile storage medium such as a hard disk while the decision tree model is also maintained in the memory. In another example, for greater safety, the storage rule is simultaneously sent to a remote standby node.
- In one example, the storage rule is dynamically updated, for example, according to a certain period; through learning by using the artificial intelligence learning algorithm again, a newly learned storage rule is updated to the metadata manage server; and the storage rule stored in the hard disk and/or the remote node is updated synchronously.
- In one example, the determining, according to a pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored includes: reading the storage rule from the metadata manage server, and determining, according to the read storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored.
- In one example, the artificial intelligence learning algorithm is a decision tree, and the intelligent storage model is a decision tree model constructed based on training data. Subsequently, an example of a process of constructing the decision tree model will be described in detail with reference to the drawings.
- For example, in conjunction with the hybrid file system architecture of
FIG. 1 , the metadata manage server 1200 determines in which distributed file system the file is stored, by using the intelligent storage model 1210, based on the storage attributes of the file obtained from the client, and returns the result to the client 1300. - Step S230: storing the file to be stored in the determined distributed file system.
- Specifically, for example, the
client 1300 directly communicates with the distributed file system 1100, and the file is stored in the determined specific distributed file system. - By using the file storage processing method according to the embodiment of the present disclosure, a specific distributed file system is selected according to the attributes of the file based on the predetermined storage rule, so as to, for example, improve storage performance and efficiency, which solves the technical problem of how to use different file systems for storage to improve storage efficiency. In order to improve universality of the file system, a variety of distributed file systems are integrated, and system performance is comprehensively improved by acquiring in advance, for example through machine learning, the performance characteristics of the various types of distributed file systems for various files, and by comprehensively utilizing the advantages of the different distributed file systems in the file access process. Specifically, for example, the processing characteristics of files with different attributes when stored on the different distributed file systems are obtained in advance (for example, the I/O performance of files of different sizes on the different distributed file systems); rules are established according to the knowledge obtained in advance; and these rules are used when a file is subsequently stored.
- Hereinafter, the construction method of the decision tree model will be described in conjunction with one embodiment.
- Before the construction method of the decision tree model is described, it is firstly explained how to obtain the training sample dataset.
- In one example, files of different sizes are selected as experimental data and tested on a variety of distributed file systems, to acquire the read throughput rate $F_{irt}$ and the write throughput rate $F_{iwt}$ of each file on each distributed file system $i$; the file system with the maximum result is then selected as the training data label according to the formula below.
-
$$\mathrm{dfs} = \max_{i}\left(F_{irt} + F_{iwt}\right), \quad i = 1, 2, \ldots, m \quad (m \text{ file systems})$$
- In a specific embodiment, the storage attributes of the file are extracted, including file size, access mode, access permission, and owner; a training data label of each file determined through the above-described experiment is obtained; and data shown in Table 1 is acquired as the training data.
-
TABLE 1 Training data

| File Size | Access Mode | Access Permission | Owner | Label (R + W)/2 |
|---|---|---|---|---|
| 5K | Read-only | 0777 | Root | DFS1 |
| 50K | Read-only | 0777 | User1 | DFS2 |
| 500K | Read-only | 0777 | User2 | DFS3 |
| 5M | Write-only | 0777 | User1 | DFS1 |
| 5M | Write-only | 077 | User2 | DFS1 |
| 5M | Read-write | 0777 | Root | DFS2 |
| 10M | Exec | 0777 | User1 | DFS3 |

- An example of a simplified decision tree construction process is given below with reference to
FIGS. 3A to 3E . - In the example of
FIGS. 3A to 3E , a simplified training data form is used, to acquire a 3-tuple dataset including size, permission, and target DFS; each sample includes features such as size, permission, and target DFS; and the training dataset is as shown in FIG. 3A . - Then, on a principle of maximum information entropy, the size, which has the greatest impact on classification, is selected as a classification node to construct the decision tree in
FIG. 3C , and the training data is divided into m groups according to the size (the file sizes are divided into m categories) based on the decision tree, where m is an integer greater than or equal to 2; in the example, m=3, so in FIG. 3B , the data is divided into 3 groups, respectively 1M, 5M and 9M, which are further divided into 3 branches as shown in FIG. 3D with permission selected as a classification node, again on the principle of maximum information entropy. At this time, all data has been classified. Finally, some of the leaf nodes are merged to obtain FIG. 3E , and thus the decision tree is constructed. - In one example, optimization processing including pruning and cross-validation, etc. is performed in construction of the decision tree model.
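- The construction walk-through above can be sketched in a few lines. This is a minimal illustration, not the implementation of the disclosure: it uses information gain (entropy reduction), the usual ID3 criterion, as one reading of the "maximum information entropy" principle, and it omits pruning and cross-validation. The toy samples and all function names are hypothetical; the data of FIG. 3A is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([r["dfs"] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r["dfs"])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def build_tree(rows, features):
    """Return a leaf label (str) or a (feature, {value: subtree}) node."""
    labels = [r["dfs"] for r in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, f))
    rest = [f for f in features if f != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches)


def predict(tree, sample):
    """Walk the tree until a leaf label is reached."""
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches[sample[feature]]
    return tree


# Hypothetical toy samples: size category, permission, target file system.
data = [
    {"size": "1M", "permission": "0777", "dfs": "DFS1"},
    {"size": "1M", "permission": "0700", "dfs": "DFS1"},
    {"size": "5M", "permission": "0777", "dfs": "DFS2"},
    {"size": "5M", "permission": "0700", "dfs": "DFS3"},
    {"size": "9M", "permission": "0777", "dfs": "DFS3"},
    {"size": "9M", "permission": "0700", "dfs": "DFS3"},
]
tree = build_tree(data, ["size", "permission"])
```

- On this toy data the root splits on size, and only the 5M branch needs a second split on permission, which mirrors the two-level structure described for FIGS. 3C and 3D.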
- It should be noted that, in the disclosure, the decision tree is provided as a preferred example, not as a limitation, of the artificial intelligence learning algorithm for determining the distributed file system in which the file should be stored according to the file attributes; other artificial intelligence learning algorithms may also be selected, for example, a deep neural network, a support vector machine, nearest neighbor learning, etc.
- File operations on the file system include initial storage operation (write operation), and subsequent read and possible update operations.
-
FIG. 4 shows a sequence chart of writing a file in a hybrid file system architecture according to an embodiment of the present disclosure. - As shown in
FIG. 4 , in step S410, a client sends a file write access request to a metadata manage server. - In step S420, the metadata manage server acquires file attribute information.
- In step S430, the metadata manage server acquires a decision tree model maintained by the metadata manage server.
- In step S440, the metadata manage server obtains a storage location of the file to be written, based on the file storage attribute information and the decision tree model.
- In step S450, the metadata manage server returns the storage location of the file to the client.
- In step S460, the client communicates with a corresponding distributed file system according to the returned storage location, to perform an actual file write operation.
-
FIG. 5 shows a sequence chart of corresponding operations caused by a file read request or update request from a client after a file has been stored in a hybrid file system architecture. - In step S510, the client sends a file read request or update request to a metadata manage server.
- In step S520, the metadata manage server acquires a file path from the read request or the update request.
- In step S530, the metadata manage server queries a metadata database, to acquire a storage location of the file to be read or updated.
- In step S540, the metadata manage server feeds back the storage location of the file to the client.
- In step S550, the client communicates with a corresponding distributed file system according to the returned storage location, and performs actual file read or update operations.
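- The write sequence of FIG. 4 and the read or update sequence of FIG. 5 can be sketched together from the point of view of the metadata manage server. The class below is a toy in-memory stand-in with hypothetical names; in particular, recording the chosen location at write time and the size-based `size_rule` are assumptions made for illustration, with `size_rule` standing in for the decision tree model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MetadataManageServer:
    """Toy in-memory stand-in for the metadata manage server."""

    # Placement rule: maps file attributes to a file system name.
    storage_rule: Callable[[dict], str]
    # Metadata database: file path -> name of the file system holding it.
    locations: Dict[str, str] = field(default_factory=dict)

    def handle_write(self, path: str, attributes: dict) -> str:
        """Steps S420 to S450: decide a location, record it, return it."""
        location = self.storage_rule(attributes)
        self.locations[path] = location
        return location

    def handle_read_or_update(self, path: str) -> str:
        """Steps S520 to S540: look up where the file was stored."""
        if path not in self.locations:
            raise FileNotFoundError(path)
        return self.locations[path]


def size_rule(attributes: dict) -> str:
    """Hypothetical rule: small files go to DFS1, all others to DFS2."""
    return "DFS1" if attributes["size"] < 1_000_000 else "DFS2"


server = MetadataManageServer(storage_rule=size_rule)
target = server.handle_write("/data/a.log", {"size": 4096})
found = server.handle_read_or_update("/data/a.log")
```

- In both sequences the server only returns a location; the client then talks to the chosen distributed file system directly (steps S460 and S550).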
- In the storage process, with increase of file storage, storage efficiency of storage space of some distributed file systems will decrease; in order to solve the problem, in an optional implementation mode, file migration may also be performed, that is, a file stored in one distributed file system is migrated to another distributed file system, so that storage capacity of the system may be further improved through migration, to promote load equalization between respective distributed file systems.
- Hereinafter, an embodiment of a
method 600 for migrating the file between distributed file systems will be described in conjunction with FIG. 6 . - Step S610: determining a distributed file system that needs file migration.
- In one example, it is determined every preset period whether there is a distributed file system that needs file migration.
- Alternatively, usage situation of respective distributed file systems may also be continuously monitored, to judge whether file migration is needed.
- Usage rates of the respective distributed file systems may be investigated, to determine a situation of load equalization, or say, usage equalization between the respective distributed file systems; and in a case where severe disequilibrium in usage rate occurs, file migration, specifically, file emigration, is performed on a distributed file system with an excessively high usage rate.
- Specifically, in one example, the determining a distributed file system that needs file migration includes: calculating a difference in usage rate between any two distributed file systems; and determining that a distributed file system with a higher usage rate needs file migration, when the difference in usage rate is greater than a predetermined threshold.
- For example, if a usage rate of a distributed file system A is 90% while a usage rate of a distributed file system B is only 10%, it is obvious that severe load disequilibrium occurs, then a file migration operation may be performed on the distributed file system A.
- In the disclosure, the usage rate of a distributed file system is the ratio of the capacity actually used in the file system to its original capacity.
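- A minimal sketch of step S610 follows, assuming usage rates are expressed as fractions between 0 and 1. A file system needs migration if its usage rate exceeds that of some other file system by more than the threshold, which is the same as exceeding the lowest usage rate by more than the threshold. The function names and the threshold value are hypothetical.

```python
from typing import Dict, List


def usage_rate(used_capacity: float, original_capacity: float) -> float:
    """Ratio of the capacity actually used to the original capacity."""
    return used_capacity / original_capacity


def systems_needing_migration(usage: Dict[str, float], threshold: float) -> List[str]:
    """Names of file systems whose usage exceeds another's by more than
    `threshold`, ordered from the most to the least loaded."""
    if not usage:
        return []
    lowest = min(usage.values())
    overloaded = [name for name, rate in usage.items() if rate - lowest > threshold]
    return sorted(overloaded, key=lambda name: usage[name], reverse=True)


# The 90% versus 10% example from the text, plus a third system at 50%.
usage = {"A": usage_rate(900, 1000), "B": usage_rate(100, 1000), "C": usage_rate(500, 1000)}
candidates = systems_needing_migration(usage, threshold=0.30)
```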
- Step S620: determining a file to be migrated on the distributed file system and a migration destination, for the distributed file system that needs file migration.
- In one example, the determining a file to be migrated on the distributed file system, for the distributed file system that needs file migration includes: calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems; and determining the file to be migrated and the migration destination of the file based on sorting of migration gains of migrating respective files to other distributed file systems.
- In one example, the calculating a migration gain of migrating each file in the distributed file system that needs file migration to any one of other distributed file systems may be performed as follows:
- For convenience of description, referring to the distributed file system that needs file migration as a distributed file system i, referring to any one of the other distributed file systems as a distributed file system j, and referring to the file on the distributed file system i as a file x;
- Obtaining read throughput and write throughput of the file x on the distributed file system i, and predicting read throughput and write throughput of the file x on the distributed file system j;
- Obtaining a read frequency and a write frequency of the file x on the distributed file system i; and
- Calculating a migration gain of migrating the file x from the distributed file system i to the distributed file system j, at least based on the size of the file x, the read frequency and the write frequency of the file x on the distributed file system i, the read throughput and the write throughput of the file x on the distributed file system i, as well as the read throughput and the write throughput of the file x on the distributed file system j.
- In a preferred example, the migration gain of migrating the file x from the distributed file system i to the distributed file system j is calculated based on a formula below:
-
$$\mathrm{diff}_x(\mathrm{DFS}_i, \mathrm{DFS}_j) = \left(\frac{s_x}{F_{xrt}(\mathrm{DFS}_i)} - \frac{s_x}{F_{xrt}(\mathrm{DFS}_j)}\right) F_{xrf} + \left(\frac{s_x}{F_{xwt}(\mathrm{DFS}_i)} - \frac{s_x}{F_{xwt}(\mathrm{DFS}_j)}\right) F_{xwf} \qquad (1)$$
- $\mathrm{DFS}_i$ and $\mathrm{DFS}_j$ represent the distributed file systems i, j; $F_{xrt}(\mathrm{DFS}_i)$ and $F_{xrt}(\mathrm{DFS}_j)$ are respectively the read throughput rates of the file x in the distributed file systems i, j; $F_{xwt}(\mathrm{DFS}_i)$ and $F_{xwt}(\mathrm{DFS}_j)$ are respectively the write throughput rates of the file x in the distributed file systems i, j; a throughput rate is the size of file data read or written per second; the read throughput rate and the write throughput rate are functions of the file size; $F_{xrf}$ and $F_{xwf}$ are respectively the read frequency and the write frequency of the file x in the distributed file system i; and $s_x$ is the size of the file x to be migrated in the file system.
- In the above-described Formula (1), a first part of the summation on the right side of the equal sign represents an overall performance improvement made by migrating the file x from the distributed file system i to the distributed file system j, or say, a comprehensive migration gain in file size and read performance, in consideration of file size (a factor of file system usage rate level), read performance throughput rate, and read frequency; and a second part of the summation represents an overall performance improvement made by migrating the file x from the distributed file system i to the distributed file system j, or say, a comprehensive migration gain in file size and write performance, in consideration of file size, write performance throughput rate, and write frequency.
- Formula (1) indicates that the larger the file size, the higher the read and write frequencies, and the greater the throughput rate of the file on the distributed file system j relative to the distributed file system i, the higher the migration gain of migrating the file from the distributed file system i to the distributed file system j.
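- Formula (1) translates directly into code. Since a throughput rate is a size per second, each quotient of file size over throughput is the time of one read or write, so the gain can be read as the access time saved, weighted by how often the file is accessed. The sketch below uses hypothetical parameter names and made-up numbers.

```python
def migration_gain(size, read_freq, write_freq,
                   read_tp_src, write_tp_src, read_tp_dst, write_tp_dst):
    """Formula (1): gain of moving a file from the source to the destination.

    size / throughput is the time of one read or write; the time difference
    between source and destination is weighted by the access frequency.
    """
    read_saving = (size / read_tp_src - size / read_tp_dst) * read_freq
    write_saving = (size / write_tp_src - size / write_tp_dst) * write_freq
    return read_saving + write_saving


# Hypothetical file: size 100, read frequency 10, write frequency 2;
# the destination is twice as fast as the source for reads and writes.
gain = migration_gain(size=100.0, read_freq=10.0, write_freq=2.0,
                      read_tp_src=5.0, write_tp_src=4.0,
                      read_tp_dst=10.0, write_tp_dst=8.0)
# (20 - 10) * 10 + (25 - 12.5) * 2 = 125.0
```

- A negative value means the destination is expected to serve the file more slowly than the source, so such a file is a poor migration candidate.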
- In one example, in the above-described Formula (1), the read frequency and the write frequency of the file x in the distributed file system i may be obtained by querying the metadata manage server.
- It should be noted that, Formula (1) is a preferred example of calculating a migration gain of a file, but it is not a limitation; and other calculation formulas may also be designed according to needs.
- Here, the read throughput and the write throughput of the file x on the distributed file system i may be obtained by, for example, actual observation, or may also be obtained by prediction; while the read throughput and the write throughput of the file x on the distributed file system j may only be obtained by prediction.
- In one example, predicting the read throughput and the write throughput of the file x on a distributed file system may be performed, for example, by using a predetermined regression model, and the regression model is selected from one of:
-
| Model | Regression equation |
|---|---|
| First-order model | $y(k) = a_0 + a_1 e^{-pk}$ |
| Second-order model | $y(k) = a_0 + a_1 e^{-pk} + a_2 e^{-p_2 k}$ |
| Third-order model | $y(k) = a_0 + a_1 e^{-pk} + b\,e^{-\delta w k}\cos\left(w\sqrt{1-\delta^2}\,k\right) + c\,e^{-\delta w k}\sin\left(w\sqrt{1-\delta^2}\,k\right)$ |
| Fourth-order model | $y(k) = a_0 + b_1 e^{-\delta_1 w_1 k}\cos\left(w_1\sqrt{1-\delta_1^2}\,k\right) + c_1 e^{-\delta_1 w_1 k}\sin\left(w_1\sqrt{1-\delta_1^2}\,k\right) + b_2 e^{-\delta_2 w_2 k}\cos\left(w_2\sqrt{1-\delta_2^2}\,k\right) + c_2 e^{-\delta_2 w_2 k}\sin\left(w_2\sqrt{1-\delta_2^2}\,k\right)$ |

- As an example, the predetermined regression model may be determined through a fitting process and a selecting process below: inputting file training data to different types of regression model formulas; calculating unknown parameters by using a least square method; fitting to obtain the different types of regression models after the fitting; and selecting a regression model with a best fitting effect from the different types of regression models after the fitting as the predetermined regression model.
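- As a hedged illustration of the fitting step, the sketch below fits only the first-order model by least squares. It assumes a simple strategy that the disclosure does not specify: for each candidate decay rate p on a grid, the model is linear in a0 and a1, so those two are solved in closed form, and the candidate with the smallest squared error is kept. The data is synthetic, generated from known parameters, and is not an experimental result.

```python
import math


def fit_first_order(ks, ys, p_grid):
    """Fit y(k) = a0 + a1 * exp(-p * k) by least squares.

    Returns (a0, a1, p, sse) for the grid value of p with the smallest
    sum of squared errors (sse).
    """
    n = len(ks)
    best = None
    for p in p_grid:
        xs = [math.exp(-p * k) for k in ks]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0.0:
            continue  # all regressors equal: a1 cannot be determined
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        a1 = sxy / sxx
        a0 = mean_y - a1 * mean_x
        sse = sum((a0 + a1 * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[3]:
            best = (a0, a1, p, sse)
    if best is None:
        raise ValueError("no usable candidate for p")
    return best


# Synthetic data from known parameters a0 = 10, a1 = -6, p = 0.5.
ks = [0.5 * i for i in range(1, 21)]
ys = [10.0 - 6.0 * math.exp(-0.5 * k) for k in ks]
p_grid = [0.05 * i for i in range(1, 41)]  # candidates from 0.05 to 2.00
a0, a1, p, sse = fit_first_order(ks, ys, p_grid)
```

- Selecting among model orders would repeat such a fit for each model in the table and keep the one with the smallest error.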
-
FIG. 7 shows a schematic diagram of comparison between a throughput fit curve and an actual curve of respective distributed file systems obtained through experiments according to an embodiment of the present disclosure. In FIG. 7 , the abscissa represents different file sizes, and the ordinate represents throughput rates. - The target distributed file systems used as experimental objects are Ceph, HDFS and GlusterFS. According to actual running results, the file sizes are substituted into the respective regression model formulas shown in Table 2, and an error is calculated by using a least square method; the curve fitting effect is optimal when the overall error is minimal. The read and write curves of the several types of distributed file systems are fitted separately, and it can be seen from
FIG. 7 that first-order fitting is sufficient for HDFS write, Ceph write and Ceph read to achieve an optimal effect, while the other types require higher-order fitting. - Table 3 shows the throughput rate fit curves of the different distributed file systems based on experiments and fitting calculations. As described above, the target file systems are Ceph, HDFS and GlusterFS, and the experiments show that HDFS write, Ceph write and Ceph read achieve optimal effects with only first-order fitting, while the other types require higher-order fitting.
-
TABLE 3 Fitting parameters of target file systems

| Curves | Fitting results |
|---|---|
| HDFS write curve | $y(k) = 10.39065 - 6.38257\,e^{-0.54163k}$ |
| Ceph write curve | $y(k) = 8.79252 - 4.65085\,e^{-0.06894k}$ |
| GlusterFS write curve | $y(k) = 8.43731 + 0.10894\,e^{-0.04518k}\cos(-38.07854k) - 1.89347\,e^{-0.04518k}\sin(-38.07854k) + 1.49443\,e^{-0.61613k}\cos(33.75146k) - 0.05625\,e^{-0.61613k}\sin(33.75146k)$ |
| HDFS read curve | $y(k) = 11.0027 - 49.0537\,e^{-97.8321k} - 5.3826\,e^{-2.9596k}\cos(25.1327k) - 42.3298\,e^{-2.9596k}\sin(25.1327k)$ |
| Ceph read curve | $y(k) = 11.128770 - 1.063236\,e^{-0.718258k}$ |
| GlusterFS read curve | $y(k) = -0.0433 + 0.1108\,e^{0.00013k} - 6.2434\,e^{-4.3548k}\cos(0.000019k) + 17.2060\,e^{-4.3548k}\sin(0.000019k)$ |

- Table 4 is an example of the physical environment configuration for an experiment on the high-performance hybrid file system architecture. As shown below, in order to meet the architecture requirements, the physical environment of the experiment is divided into one node for a client, 6 nodes for underlying storage servers, and one metadata manage server node, wherein the underlying physical storage nodes may be expanded and are hidden from the client; all nodes run Ubuntu 14.04 and have 1 TB capacity.
-
TABLE 4 Physical environment for experiment

| Node number | File system | Host name | Usage | Notes |
|---|---|---|---|---|
| Node 1 | MMS | Master | Metadata management | 1TB capacity |
| Node 2 | HDFS | HDFS1 | Name node | 1TB capacity |
| Node 3 | HDFS | HDFS2 | Datanode | 1TB capacity |
| Node 4 | Ceph | Ceph1 | mds, mon, osd | 1TB capacity |
| Node 5 | Ceph | Ceph2 | osd | 1TB capacity |
| Node 6 | GlusterFS | GlusterFS1 | GlusterFS server1 | 1TB capacity |
| Node 7 | GlusterFS | GlusterFS2 | GlusterFS server2 | 1TB capacity |
| Node 8 | Client | Client | Client | 1TB capacity |

- By using the curve of relationship between the throughput rate of the respective distributed file systems and the file size obtained by fitting in this way, throughput rates of a file on different distributed file systems may be predicted when the file size is known.
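- For the three curves that Table 3 reports as first-order fits, prediction amounts to evaluating the fitted equation at the file size. The sketch below copies those parameters from Table 3; the function and dictionary names are hypothetical, and the units of k and of the throughput are those of the experiments, which are not restated here.

```python
import math

# First-order curves y(k) = a0 + a1 * exp(-p * k); (a0, a1, p) from Table 3.
FIRST_ORDER_CURVES = {
    "HDFS write": (10.39065, -6.38257, 0.54163),
    "Ceph write": (8.79252, -4.65085, 0.06894),
    "Ceph read": (11.128770, -1.063236, 0.718258),
}


def predict_throughput(curve: str, k: float) -> float:
    """Evaluate the fitted first-order curve `curve` at file size `k`."""
    a0, a1, p = FIRST_ORDER_CURVES[curve]
    return a0 + a1 * math.exp(-p * k)


# Compare the predicted write throughput of one file size on two systems.
hdfs_write = predict_throughput("HDFS write", 5.0)
ceph_write = predict_throughput("Ceph write", 5.0)
```

- Because a1 is negative and p is positive in all three curves, the predicted throughput rises with k and approaches a0.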
- After migration gains are sorted, a file to be migrated may be determined; the migration gain is an expected gain of migrating the file from the file system where it is located to a certain distributed file system, and thus, a destination distributed file system to which the file is to be migrated is also determined.
- Step S630: migrating the file that has been determined to be migrated.
- For the respective files sorted according to the migration gains, file migration can be performed in order from a file with a largest migration gain, until a usage rate difference between file systems meets requirements, and the migration is complete. The migration process is a C-D process, that is, copying and then deleting, wherein, mandatory locks are added in a file operation process.
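- The copy-then-delete loop can be sketched with a toy in-memory model. Everything here is a hypothetical stand-in: `ToyFileSystem` replaces a real distributed file system, and a `threading.Lock` per file system replaces the mandatory file locks mentioned above. The list `ranked_files` is assumed to be already sorted by descending migration gain.

```python
import threading
from typing import Dict, List


class ToyFileSystem:
    """In-memory stand-in for one distributed file system."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.files: Dict[str, bytes] = {}
        self.lock = threading.Lock()

    @property
    def usage(self) -> float:
        """Used bytes divided by capacity."""
        return sum(len(d) for d in self.files.values()) / self.capacity


def migrate(source: ToyFileSystem, destination: ToyFileSystem,
            ranked_files: List[str], threshold: float) -> List[str]:
    """Copy-then-delete files in ranked order until usage is balanced."""
    moved = []
    for name in ranked_files:
        if source.usage - destination.usage <= threshold:
            break  # the usage rate difference now meets the requirement
        # Hold both locks so no other thread sees a half-moved file.
        with source.lock, destination.lock:
            destination.files[name] = source.files[name]  # copy
            del source.files[name]                        # then delete
        moved.append(name)
    return moved


src = ToyFileSystem(capacity=100)
src.files = {"a": bytes(40), "b": bytes(30), "c": bytes(20)}  # usage 0.9
dst = ToyFileSystem(capacity=100)
dst.files = {"z": bytes(10)}                                   # usage 0.1
moved = migrate(src, dst, ["a", "b", "c"], threshold=0.3)
```

- In this example a single move brings both file systems to a usage rate of 0.5, so the loop stops after the first file. A real system with concurrent migrations in both directions would also need a consistent lock acquisition order.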
- A pseudo code example that implements the dynamic migration process is given below.
-
Algorithm 1 The Dynamic File Migration Function
Input: p0, DFSs
Output: null
1: for i = 0 to DFSs.size( ) − 1 do
2:   for j = 0 to DFSs.size( ) − 1 do
3:     if (DFSs[i].usage − DFSs[j].usage) > p0 then
4:       originalLoc = i
5:       destinationLoc = j
6:       stop (exit both loops)
7:     end if
8:   end for
9: end for
10: if no pair (originalLoc, destinationLoc) was found then
11:   return null
12: end if
13: files[ ] = DFSs[originalLoc].files
14: throughput[ ] = CalculateThroughputDegrade(files[ ], DFSs[originalLoc], DFSs[destinationLoc])
15: migrateList[ ] = sort(throughput[ ])
16: for i = 0 to migrateList.size( ) − 1 do
17:   data = readFile(migrateList[i], DFSs[originalLoc])
18:   writeFile(data, DFSs[destinationLoc])
19:   deleteFile(migrateList[i], DFSs[originalLoc])
20:   if (DFSs[originalLoc].usage − DFSs[destinationLoc].usage) < p0 then
21:     return null
22:   end if
23: end for
- In the above-described pseudo code, the first nested "for" loop (lines 1 to 9) determines the difference in usage rate between any two file systems; when the difference in usage rate between two file systems is greater than p0, that is, when load disequilibrium occurs in the file system architecture, the migration procedure is enabled; line 14 calculates the migration degree of all files of the file system that needs migration with respect to the destination file system; and line 15 sorts the files according to the calculated migration degree. Lines 16 to 23 perform the migration: each file is first copied to the target file system and then deleted from the original file system, until the difference in usage rate between the file systems meets the condition.
- The experiments validate that, for the hybrid file system, dynamic file migration can achieve usage equalization across the different file systems together with better overall performance, that is, better read and write throughput rates.
- According to another embodiment of the present disclosure, there is provided a file storage processing system, comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file storage processing method.
- According to another embodiment of the present disclosure, there is provided a file migration processing system, comprising a memory and a processor, the memory having computer-executable instructions stored thereon, and when executed by a controller, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- According to another embodiment of the present disclosure, there is provided a computer-readable storage medium, having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file storage processing method.
- According to another embodiment of the present disclosure, there is provided a computer-readable storage medium, having computer-executable instructions stored thereon, and when executed by a computing device, the computer-executable instructions being operable to execute the above-described file dynamic migration method.
- According to another embodiment of the present disclosure, there is provided a metadata manage server in a hybrid file system architecture system, which interacts with a client and a plurality of distributed file systems, the metadata manage server maintaining a pre-configured storage rule and being configured to perform the following method: acquiring storage attributes of a file to be stored, wherein the storage attributes at least include a size of the file; determining, according to the pre-configured storage rule and the attributes of the file to be stored, in which distributed file system the file to be stored is stored; determining a distributed file system that needs file migration; determining, for the distributed file system that needs file migration, a file to be migrated on the distributed file system and a migration destination; and migrating the file that has been determined to be migrated.
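- A minimal sketch of the placement step performed by the metadata manage server, assuming that the pre-configured storage rule is a sorted list of file-size thresholds. The rule contents, the file system names `dfs_small`, `dfs_medium` and `dfs_large`, and the function `choose_dfs` are hypothetical; the disclosure only requires that the rule take at least the size of the file into account.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical storage rule: (inclusive upper size bound in bytes, target
# file system), sorted by bound. The last bound catches every larger file.
STORAGE_RULE: List[Tuple[float, str]] = [
    (1 << 20, "dfs_small"),       # up to 1 MiB
    (1 << 30, "dfs_medium"),      # up to 1 GiB
    (float("inf"), "dfs_large"),  # everything else
]


def choose_dfs(size: int, rule: List[Tuple[float, str]] = STORAGE_RULE) -> str:
    """Return the name of the file system whose size range contains `size`."""
    bounds = [bound for bound, _ in rule]
    return rule[bisect_left(bounds, size)][1]
```

Further storage attributes could be added as extra arguments to the same lookup without changing how the result is used by the rest of the method.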
- According to another embodiment of the present disclosure, there is provided a hybrid file system architecture system, comprising the above-described metadata manage server and a plurality of different types of distributed file systems.
- There may be one or more of the above-described processors, which may be concentrated at one physical address or distributed over a plurality of physical addresses. Each of the one or more processors may be a device that can execute machine-readable and executable instructions, for example, a computer, a microprocessor, a microcontroller, an integrated circuit, a microchip, or any other computing device. The one or more processors may be coupled to a communication path that provides signal interconnection between different devices, components and/or modules. The communication path may cause any number of processors to be communicatively coupled to each other, and may allow modules coupled to the communication path to operate in a distributed computing environment. Specifically, each module may be operated as a node that can send and/or receive data. In addition, “being communicatively coupled” means that mutually coupled components may exchange data with each other, for example, in the form of electrical, electromagnetic, or optical signals.
- In addition, the above-described memory may include one or more memory modules. The memory module may be configured to include a volatile memory, for example, a Static Random Access Memory (SRAM) or a Dynamic Random Access Memory (DRAM), as well as a non-volatile memory, for example, a flash memory, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or an Electrically Erasable Programmable Read-Only Memory (EEPROM). Machine-readable and executable instructions in any form are stored in the memory module for access by a processor. The machine-readable and executable instructions may be logic or algorithms written in any programming language, for example, a machine language that can be directly executed by a processor, or an assembly language, an Object-Oriented Programming (OOP) language, the JavaScript language, microcode, etc., that can be compiled or assembled into machine-readable instructions and stored in the memory module. Alternatively, the machine-readable and executable instructions may also be written in a hardware description language, for example, as logic implemented by a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), etc.
- The high-performance hybrid file system architecture, the file storage processing method, the file dynamic migration method and the metadata manage server according to the embodiments of the present disclosure make comprehensive use of the performance advantages of a variety of distributed file systems to handle a variety of file storage problems. They aim to provide a universal high-performance file system that can store files of various types under various complex environments while maintaining high performance.
- It should be understood by those skilled in the art that the embodiments of the present disclosure as described above and shown in the drawings are only examples and do not limit the present disclosure. The objective of the present disclosure has been fully and effectively achieved. The functional and structural principles of the present disclosure have been shown and described in the embodiments; and any transformation or modification may be made to the implementing modes of the present disclosure without departing from the principles.
Claims (24)
$$\mathrm{diff}_x(DFS_i, DFS_j) = \left(\frac{s_x}{F_{xrt}(DFS_i)} - \frac{s_x}{F_{xrt}(DFS_j)}\right) F_{xrf} + \left(\frac{s_x}{F_{xwt}(DFS_i)} - \frac{s_x}{F_{xwt}(DFS_j)}\right) F_{xwf} \tag{1}$$
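Formula (1) can be evaluated directly, as in the following sketch. The reading of the symbols is an assumption, since this section does not define them: s_x is taken to be the size of file x, F_xrt(DFS) and F_xwt(DFS) its read and write throughput on a given distributed file system, and F_xrf and F_xwf its read and write frequencies. The function and parameter names are hypothetical.

```python
def diff_x(s_x: float,
           rt_i: float, rt_j: float,
           wt_i: float, wt_j: float,
           rf: float, wf: float) -> float:
    """Evaluate formula (1) for one file x and two file systems i and j.

    Assumed meaning of the arguments (not defined in this section):
      s_x         file size
      rt_i, rt_j  read throughput of x on DFS_i and DFS_j
      wt_i, wt_j  write throughput of x on DFS_i and DFS_j
      rf, wf      read and write frequency of x
    """
    read_term = (s_x / rt_i - s_x / rt_j) * rf
    write_term = (s_x / wt_i - s_x / wt_j) * wf
    return read_term + write_term
```

Under this reading, each quotient is a transfer time, so the result is a frequency-weighted difference in read and write time between the two file systems.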
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/103907 WO2019061132A1 (en) | 2017-09-28 | 2017-09-28 | Hybrid file system architecture, file storage, dynamic migration, and application thereof |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/103907 Continuation WO2019061132A1 (en) | 2017-09-28 | 2017-09-28 | Hybrid file system architecture, file storage, dynamic migration, and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200311035A1 (en) | 2020-10-01 |
US10810169B1 (en) | 2020-10-20 |
Family
ID=65902527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/831,964 Active US10810169B1 (en) | 2017-09-28 | 2020-03-27 | Hybrid file system architecture, file storage, dynamic migration, and application thereof |
Country Status (3)
Country | Link |
---|---|
US (1) | US10810169B1 (en) |
CN (1) | CN111095233B (en) |
WO (1) | WO2019061132A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762504A (en) * | 2017-11-29 | 2021-12-07 | 华为技术有限公司 | Model training system, method and storage medium |
CN111177105B (en) * | 2019-12-29 | 2022-03-22 | 浪潮电子信息产业股份有限公司 | Mass file writing method, device, system and medium of distributed file system |
CN111581178A (en) * | 2020-05-12 | 2020-08-25 | 国网安徽省电力有限公司信息通信分公司 | Ceph system performance tuning strategy and system based on deep reinforcement learning |
CN112084156A (en) * | 2020-09-24 | 2020-12-15 | 中国农业银行股份有限公司上海市分行 | Hybrid storage system and self-adaptive backup method of file |
CN112181951B (en) * | 2020-10-20 | 2022-03-25 | 新华三大数据技术有限公司 | Heterogeneous database data migration method, device and equipment |
CN112596675A (en) * | 2020-12-22 | 2021-04-02 | 平安银行股份有限公司 | Multi-storage method and device of data, electronic equipment and computer storage medium |
CN113282538A (en) * | 2021-07-06 | 2021-08-20 | 中国工商银行股份有限公司 | File system management method, device, equipment, storage medium and program product |
CN113608876B (en) * | 2021-08-12 | 2024-03-29 | 中国科学技术大学 | Distributed file system metadata load balancing method based on load type perception |
CN113741823A (en) * | 2021-11-08 | 2021-12-03 | 杭州雅观科技有限公司 | Cloud mixed distributed file storage method |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6990606B2 (en) * | 2000-07-28 | 2006-01-24 | International Business Machines Corporation | Cascading failover of a data management application for shared disk file systems in loosely coupled node clusters |
JP4265245B2 (en) * | 2003-03-17 | 2009-05-20 | 株式会社日立製作所 | Computer system |
US7571168B2 (en) * | 2005-07-25 | 2009-08-04 | Parascale, Inc. | Asynchronous file replication and migration in a storage network |
US8135763B1 (en) * | 2005-09-30 | 2012-03-13 | Emc Corporation | Apparatus and method for maintaining a file system index |
US20070088717A1 (en) * | 2005-10-13 | 2007-04-19 | International Business Machines Corporation | Back-tracking decision tree classifier for large reference data set |
US7752206B2 (en) * | 2006-01-02 | 2010-07-06 | International Business Machines Corporation | Method and data processing system for managing a mass storage system |
JP4939152B2 (en) * | 2006-09-13 | 2012-05-23 | 株式会社日立製作所 | Data management system and data management method |
JP5238235B2 (en) * | 2007-12-07 | 2013-07-17 | 株式会社日立製作所 | Management apparatus and management method |
CN101944124B (en) * | 2010-09-21 | 2012-07-04 | 卓望数码技术(深圳)有限公司 | Distributed file system management method, device and corresponding file system |
CN102456049A (en) * | 2010-10-28 | 2012-05-16 | 无锡江南计算技术研究所 | Data migration method and device, and object-oriented distributed file system |
CN103593347B (en) * | 2012-08-14 | 2017-06-13 | 中兴通讯股份有限公司 | The method and distributed data base system of a kind of equally loaded |
US20160291877A1 (en) * | 2013-12-24 | 2016-10-06 | Hitachi, Ltd. | Storage system and deduplication control method |
CN103778222A (en) * | 2014-01-22 | 2014-05-07 | 浪潮(北京)电子信息产业有限公司 | File storage method and system for distributed file system |
US10534753B2 (en) * | 2014-02-11 | 2020-01-14 | Red Hat, Inc. | Caseless file lookup in a distributed file system |
US9489394B2 (en) * | 2014-04-24 | 2016-11-08 | Google Inc. | Systems and methods for prioritizing file uploads |
CN104994171A (en) * | 2015-07-15 | 2015-10-21 | 上海斐讯数据通信技术有限公司 | Distributed storage method and system |
US10733153B2 (en) * | 2016-02-29 | 2020-08-04 | Red Hat, Inc. | Snapshot management in distributed file systems |
CN105912612B (en) * | 2016-04-06 | 2019-04-05 | 中广天择传媒股份有限公司 | A kind of distributed file system and the data balancing location mode towards the system |
2017
- 2017-09-28 CN CN201780094545.5A patent/CN111095233B/en active Active
- 2017-09-28 WO PCT/CN2017/103907 patent/WO2019061132A1/en active Application Filing
2020
- 2020-03-27 US US16/831,964 patent/US10810169B1/en active Active
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11409727B2 (en) * | 2019-09-18 | 2022-08-09 | International Business Machines Corporation | Concurrent execution of database operations |
US11606432B1 (en) * | 2022-02-15 | 2023-03-14 | Accenture Global Solutions Limited | Cloud distributed hybrid data storage and normalization |
US11876863B2 (en) * | 2022-02-15 | 2024-01-16 | Accenture Global Solutions Limited | Cloud distributed hybrid data storage and normalization |
CN115904263A (en) * | 2023-03-10 | 2023-04-04 | 浪潮电子信息产业股份有限公司 | Data migration method, system, equipment and computer readable storage medium |
CN117370310A (en) * | 2023-10-19 | 2024-01-09 | 中电云计算技术有限公司 | Distributed file system cross-cluster data increment migration method |
Also Published As
Publication number | Publication date |
---|---|
CN111095233A (en) | 2020-05-01 |
US10810169B1 (en) | 2020-10-20 |
CN111095233B (en) | 2023-09-26 |
WO2019061132A1 (en) | 2019-04-04 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: RESEARCH INSTITUTE OF TSINGHUA UNIVERSITY IN SHENZHEN, CHINA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CHUNG, YEH-CHING; ZHANG, LIDONG; WU, YONGWEI; REEL/FRAME: 052241/0973. Effective date: 20200310 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |