CN115437997A - Intelligent identification optimization system for data life cycle - Google Patents

Intelligent identification optimization system for data life cycle

Info

Publication number: CN115437997A
Application number: CN202210879571.1A
Authority: CN (China)
Prior art keywords: storage, data, analysis, strategy, disk
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 傅思雨, 甘云锋, 江敏, 高雁冰, 范图强
Current Assignee / Original Assignee: Hangzhou Dtwave Technology Co., Ltd.
Application filed by Hangzhou Dtwave Technology Co., Ltd.
Priority to CN202210879571.1A
Publication of CN115437997A

Classifications

    • G06F 16/122 — File system administration, e.g. details of archiving or snapshots, using management policies
    • G06F 16/1724 — Details of de-fragmentation performed by the file system
    • G06F 16/1727 — Details of free space management performed by the file system
    • G06F 16/182 — Distributed file systems

Abstract

The invention discloses an intelligent identification optimization system for the data life cycle, comprising a storage management module and a strategy management module. Storage management comprises an analysis module and a management module: the analysis module evaluates the system's storage health score by analyzing the number of small files and the cold-data capacity of the file system and the health of the storage nodes, while the management module assigns corresponding storage strategies according to the health score, realizes optimized storage through a migration tool, and gives a comprehensive view of storage and governance status through statistical charts. The strategy management module supports management of the hierarchical storage strategy, the analysis strategy, and the compression strategy: a user sets hierarchical storage and compression strategies on a directory to optimize file storage, and sets analysis strategies for small files and cold data to support data analysis. The system can report the health of every directory and even individual files and optimize their storage.

Description

Intelligent identification optimization system for data life cycle
Technical Field
The invention relates to the field of computer storage, in particular to an intelligent identification optimization system for a data life cycle.
Background
In enterprise big-data applications, the storage footprint of data in systems such as HDFS keeps growing, which lowers operating efficiency; at the same time, enterprises cannot get a comprehensive view of how all their data is used, making data optimization difficult.
As an enterprise big-data cluster is used for longer, more and more data is generated. This not only increases storage occupation and read-write latency, but also hinders cluster expansion and reduces operating efficiency. It is therefore important to understand the overall state of the data files, accurately locate the directories and files that need optimization, and govern the data accordingly.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides an intelligent identification and optimization system for the data life cycle that can report the health of every directory and even individual files and optimize their storage.
The technical scheme of the invention is as follows:
an intelligent identification optimization system for a data life cycle comprises a storage management module and a policy management module;
the storage management comprises an analysis module and a management module, wherein the analysis module evaluates the system's storage health score by analyzing the number of small files and the cold-data capacity of the file system and the health of the storage nodes; the management module assigns corresponding storage strategies according to the health score, realizes optimized storage through a migration tool, and gives a comprehensive view of storage and governance status through statistical charts;
the strategy management module supports management of the hierarchical storage strategy, the analysis strategy, and the compression strategy; a user sets hierarchical storage and compression strategies on a directory to optimize file storage, and sets analysis strategies for small files and cold data to support data analysis;
the bottom layer of the overall technical framework comprises MySQL, Hive, and HDFS; the Hive Client connects to Hive, while WebHDFS and dfsadmin access HDFS, so as to obtain Hive and HDFS data, and MyBatis interacts with MySQL to store the data;
the method comprises the following specific steps:
101 Metadata acquisition step: acquiring HDFS metadata by adopting a fsimage analysis mode;
102 Metadata indexing step: analyzing the metadata file obtained in the step 101) to construct a multi-branch tree structure;
103) Data analysis step: counting the number and size of all files and of each data type under a directory, and performing total-quantity statistics, ranking analysis, and proportion analysis to obtain a storage health score;
104) Data strategy configuration step: comprising the hierarchical storage strategy, the analysis strategy, and the compression strategy; the hierarchical storage strategy, namely the heterogeneous storage strategy, places data on different storage media according to access heat, so that HDFS storage can flexibly and efficiently handle various application scenarios; the analysis strategy lets the user set the definition of a small file and a threshold on the number of small files, the definition of cold data and a threshold on total cold-data volume, a disk-capacity threshold, and the schedule on which the system runs its analysis; and the compression strategy configures erasure codes, so that all currently selectable erasure codes can be viewed, data migration is ensured, and migration logs are viewed and recorded.
Furthermore, the intermediate layer of the whole technical framework adopts Schedule to realize periodic scheduling, and a multi-branch tree is constructed to facilitate data analysis; the upper layer of the whole technical framework provides an external API call interface and a visual UI operation interface.
Further, the metadata includes: Path (directory path), Replication (number of replicas), ModificationTime (last modification time), AccessTime (last access time), PreferredBlockSize (preferred block size), BlocksCount (number of blocks), FileSize (file size), NSQUOTA (name quota), DSQUOTA (space quota), Permission (permissions), UserName (user), and GroupName (user group);
specifically, the fsimage is obtained, parsed into metadata in a specified format, and the result is output as an oiv dump file.
Further, the data analysis comprises small-file analysis, cold-data analysis, hot-data analysis, table analysis, corrupted-block analysis, and disk capacity analysis:
The small-file analysis counts the number and size of small files according to the strategy settings.
The cold-data analysis counts the number and size of cold data according to the strategy settings.
The hot-data analysis counts the number and size of hot data according to the strategy settings.
The table analysis counts the number and size of all small files belonging to tables in the database according to the strategy settings.
The corrupted-block analysis counts the number of corrupted file blocks.
The disk capacity analysis counts the total disk capacity and its usage.
Further, the scoring rules for the storage health score comprise a disk score, a small-file score, a cold-data score, and a file-block score;
the disk score totals 30 points; if the number of nodes is n, each node is worth 30/n points, and when the disk usage of w1 nodes exceeds the threshold, w1 × (30/n) points are deducted; assuming each node has m disks, each disk is worth 30/(n × m) points, and when the total disk storage does not exceed the threshold but w2 individual disks do, w2 × (30/(n × m)) points are deducted;
the small-file score totals 30 points; with the small-file count threshold set to t, 1 point is deducted when the small-file count x exceeds the threshold by 1-10%, another 1 point when it exceeds by 11-20%, and so on until the score is exhausted;
the cold-data score totals 30 points; if y GB of cold data are unprocessed, i.e. neither a hierarchical storage strategy nor an erasure-code strategy has been set for them, 1 point is deducted per 100 GB until the score is exhausted;
the file-block score totals 10 points; if z file blocks are corrupted, 1 point is deducted per 10 corrupted blocks (1-10 corrupted blocks deduct 1 point, 11-20 deduct another 1 point) until the score is exhausted;
therefore, the storage health score S is calculated as:
S = (30 - w1 × (30/n) - w2 × (30/(n × m))) + (30 - ceil((x - t)/(0.1 × t))) + (30 - ceil(y/100)) + (10 - ceil(z/10))
wherein each deduction cannot exceed its corresponding subtotal.
Further, HDFS supports a variety of common storage types, including:
ARCHIVE: a storage medium with high storage density but low power consumption, used for storing cold data;
DISK: disk media, the default storage medium for HDFS;
SSD: solid-state-disk storage media;
RAM_DISK: data is written into memory while a replica is asynchronously written to the storage medium.
Further, the hierarchical storage policies include PROVIDED, COLD, WARM, HOT, ONE_SSD, ALL_SSD, and LAZY_PERSIST;
PROVIDED is used for storage external to HDFS, and the storage medium is DISK;
COLD keeps all replicas on archival storage, and the storage medium is ARCHIVE;
WARM keeps one replica on DISK and the remaining replicas on archival storage, and the storage media are DISK and ARCHIVE;
HOT keeps all replicas on DISK and is the default storage policy, and the storage medium is DISK;
ONE_SSD keeps one replica on SSD and the remaining replicas on DISK, and the storage media are SSD and DISK;
ALL_SSD keeps all replicas on SSD, and the storage medium is SSD;
LAZY_PERSIST first writes one replica to RAM_DISK and then lazily persists it to DISK, and the storage media are RAM_DISK and DISK.
The invention has the advantages that:
the invention can not only obtain the whole health condition of the system, or the specific condition of a certain directory, but also know the health condition of each directory and even files. The invention can accurately know the distribution position of the specific small files and count the largest file directories of the small files. The invention can manage data and realize optimized storage through a migration tool. According to the data analysis strategy, cold and hot data can be intelligently analyzed and displayed in a statistical manner. The invention supports statistical analysis and treatment of Hive base tables.
Drawings
FIG. 1 is a diagram of the product architecture of the present invention;
FIG. 2 is a flow chart of the operation of the present invention;
FIG. 3 is a technical framework diagram of the present invention;
FIG. 4 is a technical flow chart of the present invention;
FIG. 5 is a metadata acquisition flow diagram of the present invention;
FIG. 6 is a metadata index layout of the present invention;
fig. 7 is a diagram of a stored health score structure of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention, and that elements not explicitly described in the present disclosure may be implemented using conventional techniques.
Possible terms are explained below:
HDFS is the Hadoop distributed file system. Hive is a Hadoop-based data-warehouse tool used for data extraction, transformation, and loading. A small file is a file significantly smaller than the HDFS block size (128 MB by default in Hadoop 2.x; 64 MB in earlier versions). Cold data is data that is rarely or never accessed but must still be retained long-term. Hot data is frequently accessed data. The fsimage is a file stored in HDFS that contains the information of all directories and files of the entire HDFS file system; it is loaded when HDFS starts. oiv (Offline Image Viewer) is the HDFS tool that dumps the fsimage into a readable format. Erasure coding (EC) is a data-protection technology: instead of keeping multiple replicas, it uses less storage to guarantee the same level of fault tolerance.
As shown in figs. 1 to 7, an intelligent recognition optimization system for the data life cycle includes a storage management module and a policy management module, which mainly support the storage, governance, analysis, and optimization of files (HDFS) and tables (Hive).
The storage management comprises an analysis module and a governance module. The analysis module evaluates the system's storage health score by analyzing the number of small files and the cold-data capacity of the file system and the health of the storage nodes. The governance module assigns corresponding storage strategies according to the health score, realizes optimized storage through a migration tool, and gives a comprehensive view of storage and governance status through statistical charts.
The strategy management module supports management of the hierarchical storage strategy, the analysis strategy, and the compression strategy. A user sets hierarchical storage and compression strategies on a directory to optimize file storage, and sets analysis strategies for small files and cold data to support data analysis.
Overall, a local cache is built by obtaining the fsimage; the client issues an analysis request, strategy configuration is issued according to the analysis result, and finally the issued strategy takes effect through data migration.
The bottom layer of the overall technical framework comprises MySQL, Hive, and HDFS; the Hive Client connects to Hive, while WebHDFS and dfsadmin access HDFS, so as to obtain Hive and HDFS data, and MyBatis interacts with MySQL to store the data. The middle layer uses Schedule to realize periodic scheduling and builds a multi-branch tree to facilitate data analysis. The upper layer provides an external API call interface and a visual UI operation interface.
The method comprises the following specific steps:
101) Metadata acquisition step: the HDFS metadata is obtained by parsing the fsimage. The metadata includes: Path (directory path), Replication (number of replicas), ModificationTime (last modification time), AccessTime (last access time), PreferredBlockSize (preferred block size, in bytes), BlocksCount (number of blocks), FileSize (file size, in bytes), NSQUOTA (name quota, limiting the number of files and directories allowed under a directory), DSQUOTA (space quota, limiting the number of bytes allowed under a directory), Permission (permissions), UserName (user), and GroupName (user group).
Specifically, the fsimage is obtained, parsed into metadata in a specified format, and the result is output as an oiv dump file.
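In practice such a dump can be produced with HDFS's Offline Image Viewer, e.g. `hdfs oiv -p Delimited -i fsimage -o dump.txt`, which emits one tab-separated row per file or directory. A minimal parsing sketch follows; the field order is an assumption for illustration and should be checked against the header row of a real dump:

```python
# Sketch: parse one row of a tab-separated fsimage dump produced by
# `hdfs oiv -p Delimited`.  The field order below is assumed for
# illustration; real dumps emit a header row that should be checked.
FIELDS = ["path", "replication", "modificationTime", "accessTime",
          "preferredBlockSize", "blocksCount", "fileSize",
          "nsQuota", "dsQuota", "permission", "userName", "groupName"]

def parse_row(line: str) -> dict:
    values = line.rstrip("\n").split("\t")
    record = dict(zip(FIELDS, values))
    # Numeric fields used later by the analysis step.
    for key in ("replication", "preferredBlockSize", "blocksCount", "fileSize"):
        record[key] = int(record[key])
    return record

row = ("/warehouse/db/t1/part-0\t3\t2022-07-25 10:00\t2022-07-25 11:00"
       "\t134217728\t1\t4096\t-1\t-1\trw-r--r--\thdfs\tsupergroup")
rec = parse_row(row)
```

Parsing the dump offline in this way avoids loading the NameNode with per-file RPC queries.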
102) Metadata indexing step: the metadata file obtained in step 101) is parsed to construct a multi-branch tree structure.
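The multi-branch tree can be sketched as a path trie in which every directory node accumulates the file count and total size of its subtree, so the directory-level statistics of step 103) become simple lookups. Class and function names here are illustrative, not from the patent:

```python
# Sketch of the multi-branch tree built from parsed metadata records:
# each directory node aggregates statistics over its whole subtree.
class Node:
    def __init__(self):
        self.children = {}     # directory name -> Node
        self.file_count = 0    # files anywhere in this subtree
        self.total_size = 0    # bytes anywhere in this subtree

def insert_file(root, path, size):
    node = root
    node.file_count += 1
    node.total_size += size
    for part in path.strip("/").split("/")[:-1]:   # directory components only
        node = node.children.setdefault(part, Node())
        node.file_count += 1
        node.total_size += size

root = Node()
insert_file(root, "/warehouse/db/t1/part-0", 4096)
insert_file(root, "/warehouse/db/t1/part-1", 1024)
insert_file(root, "/warehouse/db/t2/part-0", 2048)
t1 = root.children["warehouse"].children["db"].children["t1"]
```

Ranking "the directories with the most small files" then reduces to sorting nodes by their aggregated counters.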
103) Data analysis step: the number and size of all files and of each data type under a directory are counted, and total-quantity statistics, ranking analysis, and proportion analysis are performed to obtain the storage health score. The specific data analysis comprises small-file analysis, cold-data analysis, hot-data analysis, table analysis, corrupted-block analysis, and disk capacity analysis:
the small file analysis is used for counting the number and the scale of the small files according to the strategy setting.
The cold data analysis is used for counting the number and scale of the cold data according to the strategy setting.
The thermal data analysis is used for counting the number and scale of the thermal data according to the strategy setting.
And the table analysis is used for counting the number and the scale of all table small files in the database according to the strategy setting.
The corrupted block analysis is used to count the number of corrupted file blocks.
The disk memory analysis is used for counting the total amount and the use condition of the disk.
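The small-file and cold-data analyses can be sketched over the parsed metadata records. The thresholds below are illustrative stand-ins for the values a user would set in the analysis strategy, not values from the patent:

```python
# Sketch: classify records as small files / cold data using thresholds
# that would normally come from the analysis strategy (assumed here).
from datetime import datetime, timedelta

SMALL_FILE_BYTES = 16 * 1024 * 1024   # assumed "small file" definition
COLD_AFTER = timedelta(days=90)       # assumed "cold data" definition

def analyze(records, now):
    small = [r for r in records if r["fileSize"] < SMALL_FILE_BYTES]
    cold = [r for r in records if now - r["accessTime"] > COLD_AFTER]
    return {"small_count": len(small),
            "small_bytes": sum(r["fileSize"] for r in small),
            "cold_count": len(cold),
            "cold_bytes": sum(r["fileSize"] for r in cold)}

records = [
    {"fileSize": 4096, "accessTime": datetime(2022, 1, 1)},       # small and cold
    {"fileSize": 64 * 1024 * 1024, "accessTime": datetime(2022, 7, 20)},
]
stats = analyze(records, now=datetime(2022, 7, 25))
```

The resulting counts and sizes feed directly into the small-file and cold-data components of the health score below.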
The scoring rules for the storage health score include a disk score, a small-file score, a cold-data score, and a file-block score.
The disk score totals 30 points. If the number of nodes is n, each node is worth 30/n points; when the disk usage of w1 nodes exceeds the threshold, w1 × (30/n) points are deducted. Assuming each node has m disks, each disk is worth 30/(n × m) points; when the total disk storage does not exceed the threshold but w2 individual disks do, w2 × (30/(n × m)) points are deducted.
The small-file score totals 30 points. With the small-file count threshold set to t, 1 point is deducted when the small-file count x exceeds the threshold by 1-10%, another 1 point when it exceeds by 11-20%, and so on until the score is exhausted.
The cold-data score totals 30 points. If y GB of cold data are unprocessed, i.e. neither a hierarchical storage strategy nor an erasure-code strategy has been set for them, 1 point is deducted per 100 GB until the score is exhausted.
The file-block score totals 10 points. If z file blocks are corrupted, 1 point is deducted per 10 corrupted blocks (1-10 corrupted blocks deduct 1 point, 11-20 deduct another 1 point) until the score is exhausted.
The storage health score S is therefore calculated as:
S = (30 - w1 × (30/n) - w2 × (30/(n × m))) + (30 - ceil((x - t)/(0.1 × t))) + (30 - ceil(y/100)) + (10 - ceil(z/10))
where each deduction cannot exceed its corresponding subtotal.
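The scoring rules above can be written out directly; this is an illustrative sketch (the function and parameter names are not from the patent, and each deduction is capped at its subtotal as the rules require):

```python
import math

def health_score(n, m, w1, w2, x, t, y_gb, z):
    """Storage health score per the scoring rules above.

    n nodes with m disks each; w1 nodes / w2 single disks over their
    usage thresholds; x small files against threshold t (t > 0);
    y_gb GB of unprocessed cold data; z corrupted file blocks.
    Each deduction is capped at its subtotal (30/30/30/10).
    """
    disk = 30 - min(30, w1 * (30 / n) + w2 * (30 / (n * m)))
    over = max(x - t, 0)                      # small files beyond the threshold
    small = 30 - min(30, math.ceil(over / (0.1 * t)))   # 1 point per 10% overshoot
    cold = 30 - min(30, math.ceil(y_gb / 100))          # 1 point per 100 GB
    blocks = 10 - min(10, math.ceil(z / 10))            # 1 point per 10 bad blocks
    return disk + small + cold + blocks
```

A healthy cluster scores 100; for example, one node over its disk threshold out of ten, 5% too many small files, 150 GB of unprocessed cold data, and 5 corrupted blocks would score 27 + 29 + 28 + 9 = 93.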
104 Data policy configuration step: including tiered storage policies, analysis policies, and compression policies.
The hierarchical storage strategy, namely the heterogeneous storage strategy, places data on different storage media according to access heat, so that HDFS storage can flexibly and efficiently handle various application scenarios. Data tiering rests on HDFS's support for heterogeneous storage and on configuring heterogeneous storage policies.
HDFS supports a variety of common storage types, including:
ARCHIVE: a storage medium with high storage density but low power consumption, used for storing cold data.
DISK: disk media, the default storage medium for HDFS.
SSD: solid-state-disk storage media.
RAM_DISK: data is written into memory while a replica is asynchronously written to the storage medium.
Further, the hierarchical storage policies include PROVIDED, COLD, WARM, HOT, ONE_SSD, ALL_SSD, and LAZY_PERSIST.
PROVIDED is used for storage external to HDFS; the storage medium is DISK.
COLD keeps all replicas on archival storage; the storage medium is ARCHIVE.
WARM keeps one replica on DISK and the remaining replicas on archival storage; the storage media are DISK and ARCHIVE.
HOT keeps all replicas on DISK and is the default storage policy; the storage medium is DISK.
ONE_SSD keeps one replica on SSD and the remaining replicas on DISK; the storage media are SSD and DISK.
ALL_SSD keeps all replicas on SSD; the storage medium is SSD.
LAZY_PERSIST first writes one replica to RAM_DISK and then lazily persists it to DISK; the storage media are RAM_DISK and DISK.
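The replica placement implied by each policy, plus an illustrative chooser that maps access recency to a tier, can be sketched as follows. The mapping table follows the policy list above; the recency thresholds are examples, not values from the patent:

```python
# Replica placement implied by each HDFS storage policy (per the list
# above; for PROVIDED the pairing with DISK follows that description).
STORAGE_POLICY_MEDIA = {
    "PROVIDED":     ("PROVIDED", "DISK"),
    "COLD":         ("ARCHIVE",),
    "WARM":         ("DISK", "ARCHIVE"),
    "HOT":          ("DISK",),            # HDFS default
    "ONE_SSD":      ("SSD", "DISK"),
    "ALL_SSD":      ("SSD",),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),
}

def policy_for(days_since_access, warm_after=30, cold_after=90):
    """Pick a tier from access recency; thresholds are illustrative
    stand-ins for the user-configured analysis strategy."""
    if days_since_access >= cold_after:
        return "COLD"
    if days_since_access >= warm_after:
        return "WARM"
    return "HOT"
```

A chosen policy is then applied with the standard HDFS command, e.g. `hdfs storagepolicies -setStoragePolicy -path /warehouse/db/t1 -policy COLD`, after which a run of `hdfs mover` migrates existing blocks to the media the policy prescribes.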
The analysis strategy lets the user set the definition of a small file and a threshold on the number of small files, the definition of cold data and a threshold on total cold-data volume, a disk-capacity threshold, and the schedule on which the system runs its analysis.
The compression strategy configures erasure codes: all currently selectable erasure codes can be viewed, data migration is ensured, and migration logs are viewed and recorded. The available erasure codes include: RS-10-4-1024k, RS-3-2-1024k, RS-6-3-1024k, RS-LEGACY-6-3-1024k, and XOR-2-1-1024k.
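Each erasure-code name encodes the data-block count, parity-block count, and cell size — e.g. RS-6-3-1024k is Reed-Solomon with 6 data blocks, 3 parity blocks, and 1024 KB cells. A small helper (illustrative, not from the patent) makes the storage saving explicit:

```python
def ec_overhead(policy_name: str) -> float:
    """Storage overhead of an HDFS erasure-coding policy name such as
    'RS-6-3-1024k': blocks stored per block of payload data."""
    parts = policy_name.split("-")
    data, parity = int(parts[-3]), int(parts[-2])
    return (data + parity) / data

# RS-6-3 stores 1.5 bytes per byte of data, versus 3.0 for HDFS's
# default triple replication -- half the space at comparable durability.
```

This is why the system applies erasure coding to cold data: rarely-read files pay the reconstruction cost seldom but enjoy the space saving permanently.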
In summary, the present invention intelligently identifies the life cycle of small files and cold data. It evaluates the storage health score from data statistics and analysis, and intelligently governs, compresses, and migrates the data. The visual operation interface helps users manage and govern data clearly and intuitively, and API call interfaces are provided for cluster statistics, data analysis, data governance, data migration, file management, and more.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the spirit of the present invention, and such modifications and refinements should also be regarded as falling within the scope of the present invention.

Claims (7)

1. An intelligent recognition optimization system for data life cycle, characterized by: the system comprises a storage management module and a policy management module;
the storage management comprises an analysis module and a management module, wherein the analysis module evaluates the storage health score of the system by analyzing the number of small files and the cold data capacity of the file system and the health degree of the storage nodes; the management module assigns corresponding storage strategies according to the health scores, realizes optimized storage through a migration tool, and comprehensively masters storage and management conditions through a statistical chart;
the strategy management module supports management of a layered storage strategy, an analysis strategy and a compression strategy, and a user sets the layered storage strategy and the compression strategy for a directory to optimize file storage; setting an analysis strategy for the small files and the cold data to help data analysis;
the bottom layer of the whole technical framework comprises MySQL, hive and HDFS, the Hive Client is connected with the Hive, webHDFS and dfsadmin to access the HDFS so as to obtain data of the Hive and the HDFS, and MyBatis is used for interacting with MySQL to store the data;
the method comprises the following specific steps:
101 Metadata acquisition step: acquiring HDFS metadata by adopting a fsimage analysis mode;
102 Metadata indexing step: analyzing the metadata file obtained in the step 101) to construct a multi-branch tree structure;
103) Data analysis step: counting the number and size of all files and of each data type under a directory, and performing total-quantity statistics, ranking analysis, and proportion analysis to obtain a storage health score;
104) Data strategy configuration step: comprising the hierarchical storage strategy, the analysis strategy, and the compression strategy; the hierarchical storage strategy, namely the heterogeneous storage strategy, places data on different storage media according to access heat, so that HDFS storage can flexibly and efficiently handle various application scenarios; the analysis strategy lets the user set the definition of a small file and a threshold on the number of small files, the definition of cold data and a threshold on total cold-data volume, a disk-capacity threshold, and the schedule on which the system runs its analysis; and the compression strategy configures erasure codes, so that all currently selectable erasure codes can be viewed, data migration is ensured, and migration logs are viewed and recorded.
2. The intelligent recognition optimization system for data lifecycle of claim 1, characterized by: the middle layer of the whole technical framework adopts Schedule to realize periodic scheduling, and a multi-branch tree is constructed to facilitate data analysis; the upper layer of the whole technical framework provides an external API call interface and a visual UI operation interface.
3. The intelligent recognition optimization system for data lifecycle of claim 1, characterized by: the metadata includes: Path (directory path), Replication (number of replicas), ModificationTime (last modification time), AccessTime (last access time), PreferredBlockSize (preferred block size), BlocksCount (number of blocks), FileSize (file size), NSQUOTA (name quota), DSQUOTA (space quota), Permission (permissions), UserName (user), and GroupName (user group);
specifically, the fsimage is obtained, parsed into metadata in a specified format, and the result is output as an oiv dump file.
4. The intelligent recognition optimization system for data lifecycle of claim 1, characterized by: the data analysis comprises small-file analysis, cold-data analysis, hot-data analysis, table analysis, corrupted-block analysis, and disk capacity analysis:
the small-file analysis counts the number and size of small files according to the strategy settings;
the cold-data analysis counts the number and size of cold data according to the strategy settings;
the hot-data analysis counts the number and size of hot data according to the strategy settings;
the table analysis counts the number and size of all small files belonging to tables in the database according to the strategy settings;
the corrupted-block analysis counts the number of corrupted file blocks;
the disk capacity analysis counts the total disk capacity and its usage.
5. The intelligent recognition optimization system for data lifecycle of claim 1, characterized by: the scoring rules for the storage health score comprise a disk score, a small-file score, a cold-data score, and a file-block score;
the disk score totals 30 points; assuming the number of nodes is n, each node is worth 30/n points, and when the disk usage of w1 nodes exceeds the threshold, w1 × (30/n) points are deducted; assuming each node has m disks, each disk is worth 30/(n × m) points, and when the total disk storage does not exceed the threshold but w2 individual disks do, w2 × (30/(n × m)) points are deducted;
the small-file score totals 30 points; with the small-file count threshold set to t, 1 point is deducted when the small-file count x exceeds the threshold by 1-10%, another 1 point when it exceeds by 11-20%, and so on until the score is exhausted;
the cold-data score totals 30 points; if y GB of cold data are unprocessed, i.e. neither a hierarchical storage strategy nor an erasure-code strategy has been set for them, 1 point is deducted per 100 GB until the score is exhausted;
the file-block score totals 10 points; if z file blocks are corrupted, 1 point is deducted per 10 corrupted blocks (1-10 corrupted blocks deduct 1 point, 11-20 deduct another 1 point) until the score is exhausted;
therefore, the storage health score S is calculated as:
S = (30 - w1 × (30/n) - w2 × (30/(n × m))) + (30 - ceil((x - t)/(0.1 × t))) + (30 - ceil(y/100)) + (10 - ceil(z/10))
wherein each deduction cannot exceed its corresponding subtotal.
6. The intelligent recognition optimization system for data lifecycle of claim 1, characterized by: HDFS supports a variety of common storage types, including:
ARCHIVE: a storage medium with high storage density but low power consumption, used for storing cold data;
DISK: disk media, the default storage medium for HDFS;
SSD: solid-state-disk storage media;
RAM_DISK: data is written into memory while a replica is asynchronously written to the storage medium.
7. The intelligent recognition optimization system for data lifecycle of claim 6, characterized in that the tiered-storage policies comprise PROVIDED, COLD, WARM, HOT, ONE_SSD, ALL_SSD, and LAZY_PERSIST;
PROVIDED is used for data stored outside HDFS, and the storage medium is DISK;
COLD stores all replicas on ARCHIVE storage, and the storage medium is ARCHIVE;
WARM stores one replica on DISK and the remaining replicas on ARCHIVE storage, and the storage media are DISK and ARCHIVE;
HOT stores all replicas on DISK and is the default storage policy, and the storage medium is DISK;
ONE_SSD stores one replica on SSD and the remaining replicas on DISK, and the storage media are SSD and DISK;
ALL_SSD stores all replicas on SSD, and the storage medium is SSD;
LAZY_PERSIST writes one replica to RAM_DISK and lazily persists the remaining replicas to DISK, and the storage medium is DISK.
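For reference, the policy-to-media mapping of claim 7 can be written out as a small table. The policy names match Hadoop's built-in storage policies; the table layout and the `media_for` helper below are only an illustrative sketch, not part of the patent.

```python
# Tiered-storage policies from claim 7, mapped to the medium holding one
# replica and the medium holding the remaining replicas.
HDFS_POLICIES = {
    "PROVIDED":     ("DISK", "DISK"),       # data itself kept outside HDFS
    "COLD":         ("ARCHIVE", "ARCHIVE"),
    "WARM":         ("DISK", "ARCHIVE"),
    "HOT":          ("DISK", "DISK"),       # default policy
    "ONE_SSD":      ("SSD", "DISK"),
    "ALL_SSD":      ("SSD", "SSD"),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),   # lazily persisted to DISK
}

def media_for(policy: str) -> set:
    """Return the set of storage media a policy uses."""
    one, rest = HDFS_POLICIES[policy]
    return {one, rest}
```

On a live cluster the policy itself would be applied with the `hdfs storagepolicies` tooling; the mapping here only restates what the claim text says about which media each policy touches.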
CN202210879571.1A 2022-07-25 2022-07-25 Intelligent identification optimization system for data life cycle Pending CN115437997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210879571.1A CN115437997A (en) 2022-07-25 2022-07-25 Intelligent identification optimization system for data life cycle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210879571.1A CN115437997A (en) 2022-07-25 2022-07-25 Intelligent identification optimization system for data life cycle

Publications (1)

Publication Number Publication Date
CN115437997A true CN115437997A (en) 2022-12-06

Family

ID=84241433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210879571.1A Pending CN115437997A (en) 2022-07-25 2022-07-25 Intelligent identification optimization system for data life cycle

Country Status (1)

Country Link
CN (1) CN115437997A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640370A (en) * 2022-12-08 2023-01-24 深圳市智多兴投控科技有限公司 Data analysis method and related equipment


Similar Documents

Publication Publication Date Title
US10664453B1 (en) Time-based data partitioning
US8352429B1 (en) Systems and methods for managing portions of files in multi-tier storage systems
CN106662981B (en) Storage device, program, and information processing method
US9052832B2 (en) System and method for providing long-term storage for data
US8732217B2 (en) Using a per file activity ratio to optimally relocate data between volumes
US9916258B2 (en) Resource efficient scale-out file systems
US9110919B2 (en) Method for quickly identifying data residing on a volume in a multivolume file system
US8578096B2 (en) Policy for storing data objects in a multi-tier storage system
CN102576321B (en) Performance storage system in fast photographic system for capacity optimizing memory system performance improvement
CN103019887B (en) Data back up method and device
US20110145528A1 (en) Storage apparatus and its control method
US20060212495A1 (en) Method and system for storing data into a database
US8201001B2 (en) Method for optimizing performance and power usage in an archival storage system by utilizing massive array of independent disks (MAID) techniques and controlled replication under scalable hashing (CRUSH)
CN103914516A (en) Method and system for layer-management of storage system
CN107291889A (en) A kind of date storage method and system
CN104462389A (en) Method for implementing distributed file systems on basis of hierarchical storage
CN113568582B (en) Data management method, device and storage equipment
CN106326384A (en) File storage method suitable for high-speed mass storage based on FPGA (Field Programmable Gate Array)
CN115437997A (en) Intelligent identification optimization system for data life cycle
CN111741107A (en) Layering method and device based on file storage system and electronic equipment
US20220404987A1 (en) Storage system, storage control device, and storage control method
CN105630689B (en) Accelerate the method for data reconstruction in a kind of distributed memory system
Alatorre et al. Intelligent information lifecycle management in virtualized storage environments
JP4079244B2 (en) Reorganization processing method for write-once type storage media volume
CN110727406B (en) Data storage scheduling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination