CN109634520B - Storage system based on HDFS optical disc library - Google Patents


Info

Publication number
CN109634520B
Authority
CN
China
Prior art keywords
file
hdfs
disk
library
module
Legal status
Active
Application number
CN201811443267.2A
Other languages
Chinese (zh)
Other versions
CN109634520A (en)
Inventor
王子炫
张育平
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201811443267.2A
Publication of CN109634520A
Application granted
Publication of CN109634520B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/068 Hybrid storage device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a storage system based on an HDFS optical disc library, comprising a memory, a magnetic disk and an HDFS optical disc library. The magnetic disk stores files and contains a disk management module, an HDFS optical disc library module, a file classification module and a file migration module. The disk management module manages the files on the disk and handles communication between the storage system and the user; the HDFS optical disc library module handles communication between the disk and the HDFS optical disc library; the file classification module divides the files on the disk into cold data and hot data; and the file migration module migrates files between the HDFS optical disc library and the disk. The HDFS optical disc library stores the cold data. The storage system combines the advantages of the magnetic disk and the HDFS optical disc library: cold data that is rarely used is migrated to the HDFS optical disc library, which reduces user response time.

Description

Storage system based on HDFS optical disc library
Technical Field
The invention belongs to the technical field of storage systems, and particularly relates to a storage system based on an HDFS optical disc library.
Background
With the rapid development and wide adoption of the Internet, the global volume of data is growing explosively. According to a survey report by IDC (International Data Corporation), the data produced worldwide in 2013 alone totaled 4.4 ZB, and this amount doubles roughly every two years; the total is expected to reach 44 ZB in 2020. This growth not only raises data centers' spending on storage devices but also poses major challenges for data maintenance cost and data security. Moreover, about 80% of user access requests concentrate on 20% of the data, so keeping the remaining 80% of the data on disk arrays drives up storage cost.
Among existing big data storage systems built on optical storage media, the Hadoop Distributed File System (HDFS) built on an optical disc library (the HDFS optical disc library) is the most widely used. Its storage capacity and transfer speed are much higher than those of a traditional optical disc library. However, because of the storage structure of a distributed system and the physical structure of the optical disc library, accessing a file incurs both the time to look up the storage location of the file's data blocks and the time for the library's robotic arm to fetch and load a disc. Both add to user response time and seriously degrade the user experience.
Disclosure of Invention
The invention aims to provide a storage system based on an HDFS optical disc library that combines the advantages of a magnetic disk and the HDFS optical disc library, migrates rarely used cold data in the system to the HDFS optical disc library, and thereby reduces user response time.
In order to achieve the above purpose, the solution of the invention is:
A storage system based on an HDFS optical disc library comprises a memory, a magnetic disk and an HDFS optical disc library. The magnetic disk stores files and contains a disk management module, an HDFS optical disc library module, a file classification module and a file migration module. The disk management module manages the files on the disk and handles communication between the storage system and the user; the HDFS optical disc library module handles communication between the disk and the HDFS optical disc library; the file classification module divides the files on the disk into cold data and hot data; and the file migration module migrates files between the HDFS optical disc library and the disk. The HDFS optical disc library stores the cold data.
The storage system further comprises a directory generation module that establishes file storage directories in the disk management module and in the HDFS optical disc library module. The directory stored in the disk management module records information on all files on the disk, and the storage directory in the HDFS optical disc library module records all files on the disk that are waiting to be recorded to optical disc.
The storage system further comprises a file merging unit that merges small files carrying the same label on the disk into a large file.
The file merging unit works as follows: it caches the cold data files produced during the most recent period, combines small files with the same label into a large file suited to storage in the HDFS optical disc library, stamps the large file with a single timestamp, and stores it in the HDFS optical disc library.
The file classification module handles the transition of a file between cold data and hot data, using the following classification algorithm:
[Equation image not reproduced in the text: it defines the updated heat fileHeat1 in terms of the variables below.]
where fileHeat1 is the updated heat of the file, fileHeat0 is the initial heat of the file, t_scan is the file's last scan time, t_visit is the file's last physical access time, t_now is the file's current scan time, and visitNum is the number of times the file has been accessed on the disk. A file on the disk whose fileHeat1 is less than or equal to 0 is classified as cold data; a file whose fileHeat1 is greater than 0 is classified as hot data.
With this scheme, the invention has the following advantages:
(1) The invention improves data storage speed. Data entering the system is first stored in the disk cache and only later recorded to optical disc after processing such as data classification and small-file merging; transferring data directly to the disk is faster than transferring it to the HDFS optical disc library.
(2) The invention reduces the number of disc fetches by the optical disc library. Files with the same label are merged into a large file suitable for access in the HDFS optical disc library, and large files with the same label are recorded onto the same disc. Files on the same disc are strongly related, so by the principle of spatial locality several consecutive accesses by the system are likely to fall on the same disc, which reduces the number of disc fetches by the robotic arm. In addition, burning large files in batches avoids frequent disc fetches.
(3) The invention reduces user response time. First, a user storing a file only needs to write it to the disk and does not have to wait for the subsequent recording step. Second, caching and file prefetching bring files that the system is likely to access next onto the disk in advance, which reduces how often the system must access the HDFS optical disc library.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a flow chart of the operation of the present invention;
FIG. 3 is a schematic structural diagram of a file label in the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In addition, all directions or positional relationships mentioned in the embodiments of the present invention are based on the drawings and are used only to simplify the description; they do not indicate or imply that the referenced device or element must have a specific orientation, and are not to be construed as limiting the present invention.
The present invention is described in detail below through embodiments with reference to the attached drawings.
As shown in FIG. 1, the present invention provides a storage system based on an HDFS optical disc library, comprising a memory, a magnetic disk and an HDFS optical disc library. The magnetic disk stores files and contains a disk management module, an HDFS optical disc library module, a file classification module and a file migration module. The disk management module manages files on the disk and handles communication between the storage system and the user; the HDFS optical disc library module handles communication between the disk and the HDFS optical disc library; the file classification module divides the files on the disk into cold data and hot data; the HDFS optical disc library stores the cold data; and the file migration module migrates files between the HDFS optical disc library and the disk.
This reduces user response time in two ways. First, a user storing a file only needs to write it to the disk and does not have to wait for the subsequent recording step. Second, caching and file prefetching bring files that the system is likely to access next onto the disk in advance, which reduces how often the system must access the HDFS optical disc library.
Because optical discs have clear advantages in unit storage cost, data security and service life, this embodiment uses the HDFS optical disc library as tertiary storage behind the magnetic disk. The file classification module periodically scans the disk and hands cold data on the disk to the file migration module, which migrates files between the HDFS optical disc library and the disk.
The HDFS optical disc library module handles communication between the magnetic disk and the HDFS optical disc library, including task forwarding, small-file merging, file burning and file recovery. When the system accesses a file, the disk management module first checks whether the file is on the disk; if it is not found there, the file is looked up in the HDFS optical disc library.
The storage directory in the HDFS optical disc library module records all files on the disk that are waiting to be recorded, mainly files stored in the system for the first time and files that need to be recorded again.
Furthermore, the storage system comprises a directory generation module, which establishes file storage directories in the disk management module and in the HDFS optical disc library module. The storage directory in the disk management module records information on all files on the disk, which makes these files easy to manage and lets the system find a file's information quickly through its label. Information on all files waiting to be recorded is kept in the virtual storage module (the virtual HDFS optical disc library module), which helps merge small files and establish relationships between files. The module's files are organized under a root directory: each primary label holds information on all of its secondary labels, and each secondary label holds a linked list of information on all files under that label.
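The label directory described above can be pictured as a nested mapping from primary labels to secondary labels to file lists. The Python sketch below is a minimal illustration under stated assumptions: the names `FileInfo`, `LabelDirectory`, `add`, `secondary_labels` and `lookup` are hypothetical, the file-record fields are invented for illustration, and a Python list stands in for the per-label linked list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileInfo:
    # Hypothetical file record; the patent does not specify the exact fields.
    name: str
    size: int  # bytes
    path: str


@dataclass
class LabelDirectory:
    # root -> primary label -> secondary label -> list of FileInfo.
    # A Python list stands in for the linked list described in the text.
    root: Dict[str, Dict[str, List[FileInfo]]] = field(default_factory=dict)

    def add(self, primary: str, secondary: str, info: FileInfo) -> None:
        # Create the primary and secondary label entries on first use.
        self.root.setdefault(primary, {}).setdefault(secondary, []).append(info)

    def secondary_labels(self, primary: str) -> List[str]:
        # A primary label holds information on all of its secondary labels.
        return sorted(self.root.get(primary, {}))

    def lookup(self, primary: str, secondary: str, name: str) -> Optional[FileInfo]:
        # The labels narrow the search to a single short list of files.
        for info in self.root.get(primary, {}).get(secondary, []):
            if info.name == name:
                return info
        return None


if __name__ == "__main__":
    directory = LabelDirectory()
    directory.add("aviation", "engines", FileInfo("turbine.txt", 1200, "/data/turbine.txt"))
    directory.add("aviation", "airframes", FileInfo("wing.txt", 800, "/data/wing.txt"))
    print(directory.secondary_labels("aviation"))  # ['airframes', 'engines']
    print(directory.lookup("aviation", "engines", "turbine.txt"))
```

Keying the lookup by label first keeps each search confined to the files of one secondary label rather than the whole disk.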
It should be noted that when a file is stored in the system, the system first applies natural language processing to the file content and file name, counts the several most frequent words, and then assigns each file a primary label and a secondary label according to a word database. A primary label covers a broader range than a secondary label and contains several secondary labels. Files with the same label are related, and the narrower the label, the stronger the relationship. The virtual HDFS optical disc library module and the disk management module are both organized around file labels, as shown in FIG. 3.
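The labeling step can be approximated with plain word-frequency counting. This is a simplified stand-in for the natural language processing the text mentions, not the patent's method: the word database (`WORD_DATABASE`), the stop-word list and the function names are hypothetical, and each keyword maps directly to one primary/secondary label pair.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical word database mapping a keyword to (primary label, secondary label).
WORD_DATABASE = {
    "turbine": ("aviation", "engines"),
    "wing": ("aviation", "airframes"),
    "hdfs": ("computing", "storage"),
}

# Illustrative stop words plus file-extension tokens; a real list would be larger.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "txt"}


def top_words(text: str, k: int) -> List[str]:
    # Tokenize, drop stop words, and return the k most frequent words.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(k)]


def label_file(name: str, content: str, k: int = 5) -> Optional[Tuple[str, str]]:
    # Use the most frequent word (from name and content) found in the word database.
    for word in top_words(name + " " + content, k):
        if word in WORD_DATABASE:
            return WORD_DATABASE[word]
    return None  # no keyword matched; the file stays unlabeled


if __name__ == "__main__":
    print(label_file("report.txt", "The turbine blade and the turbine disk"))
    # ('aviation', 'engines')
```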
Further, the storage system comprises a file merging unit, which merges small files with the same label on the disk into a large file. The file merging unit caches the cold data files produced during the most recent period, merges small files with the same label into a large file suited to storage in the HDFS optical disc library, stamps the large file with a single timestamp, and stores it in the HDFS optical disc library. Merging small files addresses the low efficiency with which both HDFS and optical disc libraries handle small files. Moreover, files with the same label and the same timestamp are strongly related: when the system accesses a file in the HDFS optical disc library, the whole large file containing it is prefetched into the disk array and the file heat is reset to its initial value. Because the small files within one large file are strongly related, this reduces how often the system accesses the HDFS optical disc library and speeds up system access.
This arrangement improves data storage speed: data entering the system is first stored in the disk cache and only later recorded to optical disc after processing such as data classification and small-file merging, and transferring data directly to the disk is faster than transferring it to the HDFS optical disc library. It also reduces the number of disc fetches by the optical disc library: files with the same label are merged into a large file suitable for access in the HDFS optical disc library, and large files with the same label are recorded onto the same disc. Files on the same disc are strongly related, so by the principle of spatial locality several consecutive accesses by the system are likely to fall on the same disc, which reduces the number of disc fetches by the robotic arm. Finally, burning large files in batches avoids frequent disc fetches.
For a new file to be recorded, the storage system first searches the root file directory of the HDFS optical disc library module for the corresponding label directory and creates it if it does not exist. It then inserts the file information into the label directory and checks whether the label's total size meets the recording condition. Once the condition is met, all files under the label are merged into a large file, an index table from the large file to its small files is built, a timestamp is assigned from the creation time of the large file, and the large file and its index table are stored in the HDFS optical disc library. After the large file has been recorded, the corresponding files on the disk are deleted and the related file information is updated.
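The insertion and recording-condition check can be sketched as follows. This is a hypothetical illustration: the class and method names are invented, the "128M" threshold from step S7 is assumed to mean 128 MiB, and merging, index building, the HDFS upload and deletion of the disk copies are left out. Following step S7, a label is handed off for merging only when its total size is strictly larger than the threshold.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed reading of the "128M" recording threshold from step S7 as 128 MiB.
RECORD_THRESHOLD = 128 * 1024 * 1024


@dataclass
class PendingFile:
    name: str
    size: int  # bytes


@dataclass
class RecordBatch:
    # A group of same-label files ready to be merged into one large file.
    label: Tuple[str, str]
    timestamp: float  # taken from the large file's creation time
    files: List[PendingFile]


@dataclass
class PendingDirectory:
    threshold: int = RECORD_THRESHOLD
    labels: Dict[Tuple[str, str], List[PendingFile]] = field(default_factory=dict)

    def insert(self, primary: str, secondary: str, f: PendingFile) -> Optional[RecordBatch]:
        key = (primary, secondary)
        # Find the label directory, creating it if it does not exist yet.
        bucket = self.labels.setdefault(key, [])
        bucket.append(f)
        # Recording condition: total size under this label exceeds the threshold.
        if sum(p.size for p in bucket) <= self.threshold:
            return None
        # Condition met: hand the whole label off for merging and clear it.
        del self.labels[key]
        return RecordBatch(label=key, timestamp=time.time(), files=bucket)
```

Returning the batch rather than merging in place keeps the directory bookkeeping separate from the byte-level merge and the upload.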
Data that is used frequently is generally called hot data, and data that is used rarely is called cold data. Conventional classification methods determine a file's state from the interval between the current scan time and the last physical access time, or from the interval between two consecutive physical accesses. The file classification module handles the transition of a file between the cold and hot states.
However, the cost of migrating files between the disk array and the HDFS optical disc library must also be considered. Suppose a file is stored in the HDFS optical disc library in the cold state. Under a simple hot/cold model, a single access converts it to hot, and it is migrated back to the disk array. This adds to the system's migration burden and wastes space in the HDFS optical disc library, so the time at which a file switches between hot and cold needs to be set more carefully. The invention therefore computes a file heat value from the file scan time and the file physical access time, determines the file's hot or cold state from that value, and uses changFlag to choose the migration strategy so that files are not recorded repeatedly.
Further, the file classification module adopts a classification algorithm as follows:
[Equation image not reproduced in the text: it defines the updated heat fileHeat1 in terms of the variables below.]
where fileHeat1 is the updated heat of the file, fileHeat0 is the initial heat of the file, t_scan is the file's last scan time, t_visit is the file's last physical access time, t_now is the file's current scan time, and visitNum is the number of times the file has been accessed on the disk. A file on the disk whose fileHeat1 is less than or equal to 0 is classified as cold data; a file whose fileHeat1 is greater than 0 is classified as hot data. The advantage of this scheme is that a single accidental access, or a short period without access, changes a file's heat value only slightly, not enough to flip its hot or cold state, which prevents system jitter.
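Because the fileHeat1 formula survives only as an image, the sketch below does not attempt to reconstruct it; it takes fileHeat1 as an input and applies only the threshold rule stated in the text. The function names `is_cold` and `scan` are hypothetical.

```python
from typing import Dict, List, Tuple


def is_cold(file_heat1: float) -> bool:
    # Rule from the text: fileHeat1 <= 0 means cold data, > 0 means hot data.
    return file_heat1 <= 0


def scan(heats: Dict[str, float]) -> Tuple[List[str], List[str]]:
    # One periodic scan: split files into cold ones (handed to the file
    # migration module) and hot ones (left on the disk).
    cold = sorted(name for name, heat in heats.items() if is_cold(heat))
    hot = sorted(name for name, heat in heats.items() if not is_cold(heat))
    return cold, hot


if __name__ == "__main__":
    print(scan({"a.txt": 3.5, "b.txt": 0.0, "c.txt": -1.2}))
    # (['b.txt', 'c.txt'], ['a.txt'])
```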
changFlag is a modification bit that indicates whether a file has been modified on the disk: 0 means the file is unmodified, and 1 means it has been modified. The bit is set to 1 when a file is first uploaded, to 0 when a file is restored from the HDFS optical disc library to the disk, and to 1 when a file is modified during use.
When the file heat drops to 0 or below (that is, the file is classified as cold data), the file is migrated according to its file information. There are three cases:
When changFlag is 1, hddFlag is 1 and bdFlag is 0, the file exists only in the disk array. It has become cold after a long period without access, and the virtual HDFS optical disc library management module migrates it to the library.
When changFlag is 0, hddFlag is 1 and bdFlag is 1, the file was restored from the HDFS optical disc library to the disk array and was not modified while there. Only the corresponding file information (including the flag bits) needs to be updated: the file and its path are deleted from the disk array, and no re-recording is needed.
When changFlag is 1, hddFlag is 1 and bdFlag is 1, the file was restored from the HDFS optical disc library to the disk array and its content was modified while there. Because recorded optical discs cannot be modified, the virtual HDFS optical disc library module must record the file again and overwrite the original file information.
Here hddFlag is the disk flag bit, which indicates whether the file is on the disk: 0 means the file is not on the disk, and 1 means it is. Similarly, bdFlag is the optical disc library flag bit, which indicates whether the file is in the HDFS optical disc library: 0 means the file is not in the library, and 1 means it is.
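The three migration cases and the changFlag transitions above amount to a small decision table. In this hypothetical Python sketch, `chang_flag`, `hdd_flag` and `bd_flag` mirror changFlag, hddFlag and bdFlag; the class, enum and function names are invented, and returning `Action.NONE` for combinations the text does not cover (for example changFlag 0 with bdFlag 0) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RECORD = "record to optical disc"
    DROP_DISK_COPY = "update file info and delete disk copy"
    RERECORD = "record again and overwrite file info"
    NONE = "no action"


@dataclass
class FileFlags:
    chang_flag: int = 1  # 1 on first upload
    hdd_flag: int = 1    # 1: file is on the disk
    bd_flag: int = 0     # 1: file is in the HDFS optical disc library

    def restored_from_library(self) -> None:
        # Restoring a file from the library to disk clears the modification bit.
        self.chang_flag, self.hdd_flag, self.bd_flag = 0, 1, 1

    def modified(self) -> None:
        # Any modification during use sets the modification bit.
        self.chang_flag = 1


def migration_action(flags: FileFlags) -> Action:
    # Decision for a file that has just been classified as cold.
    if flags.hdd_flag != 1:
        return Action.NONE  # nothing on disk to migrate
    if flags.chang_flag == 1 and flags.bd_flag == 0:
        return Action.RECORD
    if flags.chang_flag == 0 and flags.bd_flag == 1:
        return Action.DROP_DISK_COPY
    if flags.chang_flag == 1 and flags.bd_flag == 1:
        return Action.RERECORD
    return Action.NONE  # combination not covered by the three cases in the text
```

For a freshly uploaded file the sketch returns `Action.RECORD`; after `restored_from_library()` it returns `Action.DROP_DISK_COPY`; after a later `modified()` call it returns `Action.RERECORD`, matching the three cases in order.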
Referring to FIG. 2, a control method for a storage system based on an HDFS optical disc library, based on the storage system described in the foregoing embodiment, comprises the following steps:
S1: Read the files newly uploaded by the user and label each file;
S2: Build a file information directory from the file labels and store the files on the disk;
S3: Periodically scan the files on the disk and update their heat;
S4: Judge each file's heat: when the heat is less than or equal to 0, migrate the file according to its modification bit, disk flag bit, optical disc library flag bit and related information; when the heat is greater than 0, keep the file where it is on the disk.
S5: Judge the size of each file to be migrated. A file larger than 128M is a large file and is uploaded to the HDFS optical disc library for recording; a file smaller than 128M is a small file and is migrated by small-file merging.
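Step S5 splits cold files by size. A minimal sketch, assuming "128M" means 128 MiB; the function name `route` is hypothetical, and since the text does not say what happens to a file of exactly 128M, such a file is treated as small here.

```python
from typing import List, Tuple

# Assumed reading of "128M" as 128 MiB.
LARGE_FILE_THRESHOLD = 128 * 1024 * 1024


def route(files: List[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    # Split cold files by size: large files are uploaded directly for recording,
    # small files go through small-file merging. A file of exactly 128M is not
    # covered by the text; it is treated as small here.
    direct: List[str] = []
    merge: List[str] = []
    for name, size in files:
        (direct if size > LARGE_FILE_THRESHOLD else merge).append(name)
    return direct, merge
```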
The small file name is uploaded to the NameNode of the HDFS optical disc library, where the NameSpace stores the small file information; the usage and recording status of the current DataNodes are queried, and the optimal DataNode is selected to establish communication with the HDFS optical disc library module and carry out the actual recording.
S6: Insert the small file information into the management directory of the HDFS optical disc library module.
Check whether the management directory of the HDFS optical disc library module already contains the corresponding primary and secondary label directories. If so, insert the information directly into the corresponding secondary label directory; otherwise, create the corresponding label directories in the management directory and insert the small file information into the newly created secondary label directory.
S7: Check the total size of all files in the secondary label directory into which the small file information was just inserted. If the total is larger than 128M, merge the small files; if it is smaller than 128M, do nothing for now.
Merging the small files creates index information, as shown in Table 1 below:
TABLE 1
FileMD5         FileSize   Offset
Small file 1    120K       0
Small file 2    86K        206
……              ……         ……
Small file n    ……         ……
FileMD5 is the file's MD5 value, computed from the file path, the file label and the file creation time, which guarantees that each file is uniquely identified in the system.
FileSize is the file size.
Offset is the offset of the small file within the large file.
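Building Table 1 while merging can be sketched as below. This is a hypothetical illustration, not the patent's implementation: it assumes small files are packed back to back, so each offset is the sum of the preceding file sizes, and that FileMD5 is the MD5 of the path, label and creation time joined with a `|` separator (the text names the inputs but not how they are combined). Only Python's standard `hashlib.md5` is used.

```python
import hashlib
from typing import List, Tuple

IndexEntry = Tuple[str, int, int]  # (FileMD5, FileSize, Offset)


def file_md5(path: str, label: str, created: str) -> str:
    # The text derives FileMD5 from path, label and creation time;
    # joining them with "|" is an assumption made for this sketch.
    return hashlib.md5(f"{path}|{label}|{created}".encode("utf-8")).hexdigest()


def merge_small_files(files: List[Tuple[str, bytes]]) -> Tuple[bytes, List[IndexEntry]]:
    # Concatenate small files and record each one's size and offset,
    # producing rows in the layout of Table 1.
    index: List[IndexEntry] = []
    chunks: List[bytes] = []
    offset = 0
    for md5, data in files:
        index.append((md5, len(data), offset))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index
```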
S8: Large file recording
Upload the large file index directory and the large file name to the NameNode of the HDFS optical disc library, which stores them;
query the usage and recording status of the current DataNodes, select the optimal DataNode to establish communication with the HDFS optical disc library module, and carry out the actual recording.
The file uploaded from the disk is cached on the DataNode before it is burned. The DataNode first caches the file in the cache area of the optical disc library, then records the data blocks of large files that share a label but have different timestamps in different optical drives, so that when these files are accessed, the use of several optical drives speeds up data transfer between the HDFS optical disc library and the disk. The data blocks of one large file are recorded in the same optical drive, so a large file spans at most two discs; this reduces disc changes in the optical disc library and improves its read and write speed.
S9: File reading
When a small file, or another small file in the same large file, is accessed, the NameNode is queried for the name, index directory and storage location of the large file containing it; a DataNode establishes communication with the HDFS optical disc library module, and the large file data and its index directory are restored to disk space.
S10: Extract the small files: split the large file back into small files according to its index directory, and reset the heat value of each small file to the initial value.
The large file is divided into small files according to the FileMD5, FileSize and Offset entries in its index directory, and the small file information is inserted into the directory of the disk management module; each small file carries the same label as the large file.
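Steps S9 and S10 reverse the merge. A minimal sketch, assuming the (FileMD5, FileSize, Offset) layout of Table 1 and a dictionary standing in for the disk management module's directory; `split_large_file`, `restore_to_disk` and the value of `INITIAL_HEAT` are hypothetical, since the text does not give a value for fileHeat0.

```python
from typing import Dict, List, Tuple

IndexEntry = Tuple[str, int, int]  # (FileMD5, FileSize, Offset)

# Hypothetical initial heat; the text does not give a value for fileHeat0.
INITIAL_HEAT = 10.0


def split_large_file(large: bytes, index: List[IndexEntry]) -> Dict[str, bytes]:
    # Cut each small file out of the large file using its offset and size.
    files: Dict[str, bytes] = {}
    for md5, size, offset in index:
        chunk = large[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(f"index entry {md5} runs past the end of the large file")
        files[md5] = chunk
    return files


def restore_to_disk(large: bytes, index: List[IndexEntry], label: Tuple[str, str],
                    disk_directory: Dict[str, dict]) -> None:
    # Register every small file in the disk directory with the large file's
    # label and a heat value reset to the initial value.
    for md5, data in split_large_file(large, index).items():
        disk_directory[md5] = {"label": label, "heat": INITIAL_HEAT, "data": data}
```

Restoring every small file of the large file at once is what implements the prefetching described earlier: related files arrive on disk together with the one that was requested.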
It should be noted that in step S1 each small file is labeled using natural language processing, so that each file is associated with a related type or with keywords relevant to it; for example, unneeded words are removed from the file and the keywords that occur in it are counted. A small-file information directory is then built from the small-file labels, including directories organized by creation time, access time, file heat and so on.
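A rough sketch of labeling and directory building, assuming a simple keyword-frequency labeler as a stand-in for the unspecified natural language processing technique: words from a small illustrative stop-word list are removed, the most frequent remaining word becomes the label, and each label's directory is ordered by creation time, access time and heat. The regular-expression tokenizer only suits English text; Chinese documents would need a word segmenter. label_of, SmallFile and build_directory are hypothetical names.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Illustrative stop-word list only; a real system would use a proper NLP toolkit.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def label_of(text: str) -> str:
    """Return the most frequent non-stop-word of the text as its label."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if not counts:
        return "unlabeled"
    return counts.most_common(1)[0][0]


@dataclass
class SmallFile:
    name: str
    text: str
    created: float
    accessed: float
    heat: float


def build_directory(files: List[SmallFile]) -> Dict[str, Dict[str, List[str]]]:
    """Group files by label, then order each group by creation time,
    most recent access, and heat (hottest first)."""
    groups: Dict[str, List[SmallFile]] = defaultdict(list)
    for f in files:
        groups[label_of(f.text)].append(f)
    directory: Dict[str, Dict[str, List[str]]] = {}
    for label, group in groups.items():
        directory[label] = {
            "by_creation_time": [f.name for f in sorted(group, key=lambda f: f.created)],
            "by_access_time": [f.name for f in sorted(group, key=lambda f: f.accessed, reverse=True)],
            "by_heat": [f.name for f in sorted(group, key=lambda f: f.heat, reverse=True)],
        }
    return directory
```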
The above description illustrates only preferred embodiments of the present invention and is not to be construed as limiting the invention; any modification, equivalent substitution, improvement and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (4)

1. A storage system based on an HDFS optical disc library, characterized in that: the system comprises a memory, a magnetic disk and an HDFS optical disc library; the magnetic disk stores files and comprises a disk management module, an HDFS optical disc library module, a file classification module and a file migration module, wherein the disk management module manages the files on the magnetic disk and is responsible for communication between the storage system and the user; the HDFS optical disc library module handles communication between the magnetic disk and the HDFS optical disc library; the file classification module divides the files on the magnetic disk into cold data and hot data; and the file migration module migrates files between the HDFS optical disc library and the magnetic disk; the HDFS optical disc library stores the cold data;
the storage system further comprises a directory generation module for establishing file storage directories in the disk management module and in the HDFS optical disc library module, wherein the directory stored by the disk management module records information on all files on the magnetic disk, and the storage directory in the HDFS optical disc library module records all files on the disk that are to be recorded to optical disc.
2. The HDFS optical disc library-based storage system of claim 1, wherein: the storage system further comprises a file merging unit for merging small files with the same label in the disk into a large file.
3. The HDFS optical disc library-based storage system of claim 2, wherein the file merging unit merges small files into a large file as follows: cold data files generated during the most recent period are cached; small files with the same label are combined into a large file suitable for storage in the HDFS optical disc library; and the large file is marked with a common timestamp and stored in the HDFS optical disc library.
4. The HDFS optical disc library-based storage system of claim 1, wherein the file classification module converts a file between cold data and hot data using the following classification algorithm:
[Formula image not reproduced: fileHeat1 expressed in terms of fileHeat0, tscan, tvisit, tnow and visitNum]
wherein fileHeat1 is the updated heat of the file, fileHeat0 is the initial heat of the file, tscan is the time the file was last scanned, tvisit is the time of the last physical access to the file, tnow is the current scan time, and visitNum is the number of times the file has been accessed on the disk; a file on the disk whose fileHeat1 is less than or equal to 0 is classified as cold data, and a file whose fileHeat1 is greater than 0 is classified as hot data.
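The claimed cold-data path (classification, the to-be-recorded directory, and label-based merging) can be sketched end to end under stated assumptions. The heat-update formula of claim 4 appears only as an image in the source, so the sketch does not reproduce it: callers pass their own update_heat function, and only the claimed threshold (fileHeat1 <= 0 means cold) is implemented. target_size stands in for the unspecified size "suitable for storage in the HDFS optical disc library", and all class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class DiskFile:
    name: str
    label: str
    size: int           # bytes
    heat1: float = 0.0  # fileHeat1, filled in by classify()


@dataclass
class LargeFile:
    label: str
    timestamp: int
    members: List[str] = field(default_factory=list)
    size: int = 0


# Computes fileHeat1 for a file; the claim 4 formula would be plugged in here.
HeatFn = Callable[[DiskFile], float]


def classify(files: List[DiskFile], update_heat: HeatFn) -> Tuple[List[DiskFile], List[DiskFile]]:
    """Claim 4 threshold: fileHeat1 <= 0 -> cold data, fileHeat1 > 0 -> hot data."""
    cold: List[DiskFile] = []
    hot: List[DiskFile] = []
    for f in files:
        f.heat1 = update_heat(f)  # records the updated heat on the file object
        (cold if f.heat1 <= 0 else hot).append(f)
    return cold, hot


def merge_by_label(cold: List[DiskFile], target_size: int, timestamp: int) -> List[LargeFile]:
    """Claim 3 sketch: pack same-label cold files into large files of at most
    target_size bytes (an oversized small file ends up alone), one shared timestamp."""
    groups: Dict[str, List[DiskFile]] = defaultdict(list)
    for f in cold:
        groups[f.label].append(f)
    merged: List[LargeFile] = []
    for label, group in groups.items():
        current = LargeFile(label, timestamp)
        for f in group:
            if current.members and current.size + f.size > target_size:
                merged.append(current)
                current = LargeFile(label, timestamp)
            current.members.append(f.name)
            current.size += f.size
        if current.members:
            merged.append(current)
    return merged


def plan_recording(disk_directory: Dict[str, DiskFile], update_heat: HeatFn,
                   target_size: int, timestamp: int) -> Tuple[Set[str], List[LargeFile]]:
    """Claim 1 sketch: disk_directory lists every file on the disk; the returned
    set plays the role of the HDFS module's to-be-recorded directory."""
    cold, _hot = classify(list(disk_directory.values()), update_heat)
    to_record = {f.name for f in cold}
    return to_record, merge_by_label(cold, target_size, timestamp)
```

For example, with target_size=100, three 40-byte cold files sharing one label become two large files (80 and 40 bytes) that carry the same timestamp.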
CN201811443267.2A 2018-11-29 2018-11-29 Storage system based on HDFS optical disc library Active CN109634520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811443267.2A CN109634520B (en) 2018-11-29 2018-11-29 Storage system based on HDFS optical disc library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811443267.2A CN109634520B (en) 2018-11-29 2018-11-29 Storage system based on HDFS optical disc library

Publications (2)

Publication Number Publication Date
CN109634520A CN109634520A (en) 2019-04-16
CN109634520B CN109634520B (en) 2021-12-07

Family

ID=66069795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811443267.2A Active CN109634520B (en) 2018-11-29 2018-11-29 Storage system based on HDFS optical disc library

Country Status (1)

Country Link
CN (1) CN109634520B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125016A (en) * 2019-12-24 2020-05-08 普世(南京)智能科技有限公司 Magneto-optical hybrid file storage method and system based on label organization
CN113157206A (en) * 2021-03-19 2021-07-23 广东奥飞数据科技股份有限公司 Novel magneto-optical fusion storage system and method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605483A (en) * 2013-11-21 2014-02-26 浪潮电子信息产业股份有限公司 Feature processing method for block-level data in hierarchical storage system
CN104008207A (en) * 2014-06-18 2014-08-27 广东绿源巢信息科技有限公司 Optical disc based external data storage system for database and data storage method
CN104850358A (en) * 2015-05-26 2015-08-19 华中科技大学 Magnetic-optical-electric hybrid storage system and data acquisition and storage method thereof
CN105144074A (en) * 2013-04-12 2015-12-09 微软技术许可有限责任公司 Block storage using a hybrid memory device
CN107193500A (en) * 2017-05-26 2017-09-22 郑州云海信息技术有限公司 Tiered storage method and system for a distributed file system
CN107526546A (en) * 2017-08-25 2017-12-29 深圳大学 Spark distributed computing data processing method and system
CN107704211A (en) * 2017-10-31 2018-02-16 武汉光忆科技有限公司 Magneto-optical-electric hybrid optical disc library and management method and management system thereof
CN108491166A (en) * 2018-03-27 2018-09-04 江苏菲利斯通信息科技有限公司 Read data cache management method for optical disc servers

Also Published As

Publication number Publication date
CN109634520A (en) 2019-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant