CN117453647A - File storage method, device, equipment and storage medium - Google Patents

File storage method, device, equipment and storage medium

Publication number
CN117453647A
Authority
CN
China
Prior art keywords
file
object storage
storage system
target file
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311346100.5A
Other languages
Chinese (zh)
Inventor
朱晨畅
冯家麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a file storage method, device, equipment and storage medium. It relates to the field of data processing, and in particular to the technical fields of distributed storage, object storage, cloud storage and big data. The specific implementation is as follows: file directory information is stored in a directory node of a distributed file system; the file directory information includes a correspondence between an identification of a file and storage location information of the file in an object storage system; the data nodes of the distributed file system are used to store data blocks obtained by dividing files of a first type, and the object storage system is used to store files of a second type. The disclosure makes it possible to expand the storage space of the distributed file system, facilitates operating on files in the object storage system as whole files, and improves file processing efficiency.

Description

File storage method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing, and in particular, to the technical fields of distributed storage, object storage, cloud storage, and big data.
Background
The Hadoop Distributed File System (HDFS) is a distributed file system. HDFS supports multiple storage media and storage types, such as solid state disk (SSD), mechanical hard disk (HDD) and the archive storage type (ARCHIVE), but it does not support object storage, and its storage cost is high. The storage cost of object storage is much lower than that of a distributed file system.
Disclosure of Invention
The disclosure provides a file storage method, device, equipment and storage medium.
According to an aspect of the present disclosure, there is provided a file storage method including:
storing file directory information in directory nodes of the distributed file system;
the file directory information comprises a corresponding relation between an identification of a file and storage position information of the file in an object storage system; the data nodes of the distributed file system are used for storing data blocks obtained by dividing the first type of files, and the object storage system is used for storing the second type of files.
According to another aspect of the present disclosure, there is provided a file storage device including:
a first storage module for storing file directory information in directory nodes of the distributed file system;
the file directory information comprises a corresponding relation between an identification of a file and storage position information of the file in an object storage system; the data nodes of the distributed file system are used for storing data blocks obtained by dividing the first type of files, and the object storage system is used for storing the second type of files.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
According to the present disclosure, the file directory information stored in the directory node of the distributed file system includes the correspondence between the identification of a file and the storage location information of the file in the object storage system. The object storage system can thus be mounted on the distributed file system, which expands the storage space of the distributed file system, saves storage cost, facilitates operating on files in the object storage system as whole files, and improves file processing efficiency. Further, files of different types are stored in the distributed file system and the object storage system respectively, which makes it convenient to manage them by category according to the characteristics of each type.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of storing files according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 3 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 5 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 6 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 7 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 8 is a flow chart of a method of storing files according to another embodiment of the present disclosure;
FIG. 9 is a block diagram of a distributed file system with an object storage system mounted;
FIG. 10 is a schematic diagram of a file storage device according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a file storage device according to another embodiment of the present disclosure;
FIG. 12 is a block diagram of an electronic device used to implement an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a flow chart of a method of storing files according to an embodiment of the disclosure, the method comprising:
S101, storing file directory information in a directory node of a distributed file system;
In the embodiment of the disclosure, the file directory information includes a correspondence between an identification of a file and storage location information of the file in an object storage system; the data nodes of the distributed file system are used to store data blocks obtained by dividing files of the first type, and the object storage system is used to store files of the second type.
In embodiments of the present disclosure, the distributed file system (Distributed File System, DFS) may include a Hadoop distributed file system, HBase, Spark, etc., which is not limited by embodiments of the present disclosure. The distributed file system may include a name node (NameNode), data nodes (DataNode), data blocks (Block), and a client (Client). The name node may also be referred to as a directory node and is used to store file directory information. The file directory information may include the identification of a file, the size of the file, the operation time of the file, and the like, and may also include the partitioning information of the file and information about the data nodes in which the data blocks divided from the file are located in the distributed file system. The partitioning information of a file may include the relationship between the file and its data blocks after the file has been divided into a plurality of data blocks by encoding or the like. For example, file A includes data blocks block1, block2, and block3; file B includes block2, block3, block4, and block5. Typically, a single data block of an encoded file is not readable on its own; it becomes readable only after being combined with the other data blocks of the file to form the complete file. To improve data safety, copies (replicas) may be generated for each data block after the file is partitioned, and a data block and its copies may be stored in different data nodes. For example, one data block is stored in data node datanode1, its copy 1 is stored in data node datanode2, and its copy 2 is stored in data node datanode3. The data nodes on which a data block and its copies reside may be distributed across multiple storage devices. Information such as the identifications of the data nodes in which the data blocks of a file are distributed in the distributed file system may be recorded in the file directory information.
In addition to storing the content of the data block or its copy, the data node may also store meta information of the data block, for example, an identifier of a file to which the data block belongs, information associated with the data block, information of the copy, and the like. The identification of the file may be unique and may be generated based on the name, content, or other information of the file. The client accesses one or more data nodes through the directory node, and may perform operations such as reading and writing to a distributed file system, e.g., HDFS.
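The relationship between a file, its data blocks, and the data nodes holding each block and its copies can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the fixed block size, the number of copies, the round-robin placement, and the function names `split_into_blocks` and `place_replicas` are all hypothetical choices made for the example.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide file content into fixed-size data blocks (the last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(block_ids: List[str], data_nodes: List[str],
                   copies: int = 3) -> Dict[str, List[str]]:
    """Assign each block and its copies to distinct data nodes, round-robin.

    Round-robin placement is an assumption made for this sketch only.
    """
    if copies > len(data_nodes):
        raise ValueError("not enough data nodes for the requested number of copies")
    placement = {}
    for index, block_id in enumerate(block_ids):
        placement[block_id] = [
            data_nodes[(index + k) % len(data_nodes)] for k in range(copies)
        ]
    return placement


# Example: a 10-byte file split into 4-byte blocks, stored on four data nodes.
blocks = split_into_blocks(b"0123456789", block_size=4)
block_ids = [f"block{i + 1}" for i in range(len(blocks))]
placement = place_replicas(
    block_ids, ["datanode1", "datanode2", "datanode3", "datanode4"]
)
```

The `placement` mapping plays the role of the part of the file directory information that records which data nodes hold the data blocks of a file.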
In embodiments of the present disclosure, the object storage system (Object-based Storage System, OSS) may include a simple storage service (Simple Storage Service, S3) or the like, which is not limited by embodiments of the present disclosure. The object storage system may include object storage devices (Object-based Storage Device, OSD), a metadata server (Metadata Server, MDS), a client (Client), and the like. An object storage device has a central processing unit (Central Processing Unit, CPU), a memory, a network and a disk system; it can store data, optimize data distribution, and support data pre-reading. The metadata server controls the interaction between the client and the object storage devices, and manages quota control, the creation and deletion of directories and files, and access control permissions. The client provides access to and invocation of the file system interface, which facilitates external access.
In the embodiment of the disclosure, a complete file can be saved in the object storage system. The storage location information of the file in the object storage system may include one or more of a uniform resource locator (Uniform Resource Locator, URL), a network address, a physical address, a virtual address, etc. The storage location information of the file in the object storage system may be added to the file directory information in the directory node of the distributed file system, so as to mount the object storage system on the distributed file system. For example, the correspondence between the identification of a file and the storage location information of the file in the object storage system may be stored in the file directory information. For a complete file stored only in the object storage system, there is no need to record data node information in the directory node of the distributed file system. In this case, the client can read, write, and otherwise operate on the file in the object storage system directly through the directory node, without first using the directory node to find the data nodes related to the file.
In the embodiment of the disclosure, files can be classified according to how hot or cold they are, their frequency of use, their access volume, and the like, to obtain files of a first type and files of a second type. For example, the first type of files may include hot data, hot files, files with a large access volume, or files with a high frequency of use. The second type of files may include cold data, cold files, files with a small access volume, or files with a low frequency of use. Different types of files may be stored in different storage systems. For example, storing the first type of files in the distributed file system can improve read and write speed, and storing the second type of files in the object storage system facilitates expanding the storage space. For another example, the first type of files are files of the solid state disk (SSD) storage type, the mechanical hard disk (HDD) storage type, and the like, and the second type of files are files of the archive (ARCHIVE) storage type.
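One possible way to express this classification is a simple threshold on access frequency. The sketch below is only an illustration: the disclosure does not specify a threshold or metric, so `accesses_per_day`, `hot_threshold`, and the label strings are assumptions.

```python
# Hypothetical labels for the two file types and their storage systems.
STORAGE_FOR_TYPE = {
    "first_type": "distributed_file_system",   # hot files, split into data blocks
    "second_type": "object_storage_system",    # cold files, stored whole
}


def classify_file(accesses_per_day: float, hot_threshold: float = 10.0) -> str:
    """Label a file as first type (hot) or second type (cold).

    The threshold value is an assumption chosen for the example.
    """
    return "first_type" if accesses_per_day >= hot_threshold else "second_type"


def choose_storage(accesses_per_day: float, hot_threshold: float = 10.0) -> str:
    """Return the storage system that would hold a file with this access rate."""
    return STORAGE_FOR_TYPE[classify_file(accesses_per_day, hot_threshold)]
```

For instance, `choose_storage(0.2)` selects the object storage system, while `choose_storage(50)` selects the distributed file system.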
For example, there are files A, B and C, where file A is a cold file and files B and C are hot files. When files A, B and C are stored, file A is stored in the object storage system and a URL of file A is generated, for example, "http:// user. The URL and the identification of file A may be stored correspondingly in a directory node of the distributed file system. After files B and C are encoded and divided into blocks, the resulting data blocks and their copies are stored in a plurality of data nodes of the distributed file system, and the storage location information of each data block, such as its storage path, the identification of the data block, and the identification of the file to which the data block belongs, is stored correspondingly in the directory node of the distributed file system.
According to the embodiment of the disclosure, the file directory information stored in the directory node of the distributed file system includes the correspondence between the identification of a file and the storage location information of the file in the object storage system. The object storage system can thus be mounted on the distributed file system, which expands the storage space of the distributed file system, saves storage cost, facilitates operating on files in the object storage system as whole files, and improves file processing efficiency. Further, files of different types are stored in the distributed file system and the object storage system respectively, which makes it convenient to manage them by category according to the characteristics of each type.
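The file directory information described above can be pictured as one record per file that holds either an object storage location or a map of data block locations. The following is a minimal sketch under stated assumptions: the `DirectoryEntry` class, its field names, and the example URL are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DirectoryEntry:
    """One record of file directory information kept by the directory node."""
    file_id: str
    # Set for second-type files stored whole in the object storage system.
    object_location: Optional[str] = None
    # Set for first-type files: data block id -> data nodes holding it and its copies.
    block_locations: Dict[str, List[str]] = field(default_factory=dict)

    @property
    def in_object_storage(self) -> bool:
        return self.object_location is not None


# File A is cold and stored whole; file B is hot and stored as data blocks.
# The URL below is a placeholder invented for this example.
file_directory = {
    "A": DirectoryEntry("A", object_location="https://objects.example.invalid/bucket/A"),
    "B": DirectoryEntry("B", block_locations={
        "block1": ["datanode1", "datanode2", "datanode3"],
        "block2": ["datanode2", "datanode3", "datanode1"],
    }),
}
```

Keeping both kinds of location in one record corresponds to storing the two correspondences together; splitting them into two dictionaries would correspond to storing them separately.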
Fig. 2 is a flow chart illustrating a method of storing files according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, the method further comprises:
S201, receiving a file writing request from a client, wherein the file writing request comprises information of a first target file which is requested to be written by the client;
S202, storing the first target file in the object storage system, and acquiring storage position information of the first target file in the object storage system;
S203, writing the corresponding relation between the first target file and the storage position information of the first target file in the object storage system into the file directory information.
In the embodiment of the disclosure, a client responds to an operation of uploading a file by a user, generates a corresponding file writing request based on an identification of the file uploaded by the user, a type of the file and the like, and sends the file writing request to a directory node of a distributed file system. After receiving the file writing request of the client, the directory node of the distributed file system can analyze the request to obtain the information of the first target file which is requested to be written by the client. If the information of the first target file includes an identification of the first target file, the content of the first target file can be found according to the identification and written into the object storage system. If the information of the first target file includes the content of the first target file (the content that is initially written, the content that needs to be modified, etc.), the content of the first target file may be directly written to the object storage system. If the information of the first target file comprises the source address of the first target file, the content of the first target file is found according to the source address and written into the object storage system.
After the first target file is initially written, storage location information of the first target file in the object storage system can be generated, and the correspondence between the first target file and its storage location information in the object storage system, such as a URL, is written into the file directory information in the directory node of the distributed file system. If the first target file is modified later, the storage location information recorded in the directory node may be left unchanged when the modification does not change the storage location; alternatively, after the storage location information of the modified file is regenerated or acquired, the storage location information of the file in the file directory information is updated.
Based on the above, the file directory information of the directory node of the distributed file system may include a correspondence between the identification of a file, for example a second-type file, and the storage location information of that file in the object storage system, and may also include a correspondence between the identification of a file, for example a first-type file, and the data nodes in which the data blocks of that file are distributed in the distributed file system. These two kinds of correspondence may be stored separately, for example in two data tables, or together, for example in one data table.
According to the embodiment of the disclosure, a file that the client requests to write can be stored in the object storage system based on the file write request, and the storage location information of the file in the object storage system is recorded in the directory node of the distributed file system. Files in the object storage system can then be operated on directly through the directory node, which expands the file storage space, saves data storage cost, and reduces the complexity of file operations.
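Steps S201 to S203 can be sketched as a small write handler. This is an illustrative sketch only: `InMemoryObjectStore` is a stand-in written for this example rather than a real object storage client, and the request format, class names, and base URL are assumptions.

```python
class InMemoryObjectStore:
    """Stand-in for an object storage system; not a real client library."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.objects = {}

    def put(self, key: str, content: bytes) -> str:
        """Store a whole file and return its storage location (a URL here)."""
        self.objects[key] = content
        return f"{self.base_url}/{key}"


class DirectoryNode:
    """Directory node holding file directory information."""

    def __init__(self, object_store: InMemoryObjectStore):
        self.object_store = object_store
        self.file_directory = {}  # file identification -> storage location

    def handle_write_request(self, request: dict) -> str:
        # S201: parse the request to get the first target file's information.
        file_id = request["file_id"]
        content = request["content"]
        # S202: store the file in the object storage system, get its location.
        location = self.object_store.put(file_id, content)
        # S203: record the correspondence in the file directory information.
        self.file_directory[file_id] = location
        return location


store = InMemoryObjectStore("https://objects.example.invalid/bucket")
node = DirectoryNode(store)
location = node.handle_write_request({"file_id": "A", "content": b"cold data"})
```

In this sketch the request carries the file content directly; the variants in which the request carries only an identification or a source address would add a lookup before the `put` call.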
In one embodiment, as shown in fig. 2, the method further comprises:
S204, when the file writing request fails to execute, using a file repair tool to inspect the files in the object storage system and performing repair processing.
In the embodiment of the disclosure, a file repair tool of the distributed file system, such as the fsck tool, may be used to patrol (inspect) files in the object storage system. For example, the patrol unit of the fsck tool may be modified so that the tool can patrol the object storage system in units of files, while its existing ability to patrol the distributed file system in units of data blocks is retained. Alternatively, two fsck tools may be built, one for inspecting the object storage system in units of files and one for inspecting the distributed file system in units of data blocks.
In the embodiment of the disclosure, when a file writing request is executed, writing abnormality such as incomplete, unreadable, analysis error and the like may occur to a written file due to some factors such as software factors, hardware factors, network factors, human misoperation and the like. If the distributed file system detects a write exception, it may be determined that the file write request failed to execute. In the event that the file write request fails to execute, a file repair tool, such as fsck, may be used for patrol based on the file type in the file write request. If the file type is a second type of file, the file repair tool may be used to patrol the files in the object storage system. If the file type is a first type of file, a file repair tool may be used to patrol data blocks in the distributed file system. And executing subsequent repair processing according to the inspection result.
According to the embodiment of the disclosure, the file can be inspected and repaired under the condition that the file writing request fails to be executed, so that the effectiveness and the correctness of data writing are ensured.
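The choice of what to patrol after a failed write can be sketched as a simple dispatch on the file type. The function name `select_patrol` and the label strings are hypothetical; the sketch only mirrors the rule that second-type files are inspected per file in the object storage system and first-type files per data block in the distributed file system.

```python
from typing import Tuple


def select_patrol(file_type: str) -> Tuple[str, str]:
    """Return (system to inspect, inspection unit) for a failed write request."""
    if file_type == "second_type":
        return ("object_storage_system", "file")
    if file_type == "first_type":
        return ("distributed_file_system", "data_block")
    raise ValueError(f"unknown file type: {file_type!r}")
```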
In one embodiment, performing the repair processing includes one of:
re-executing the file write request if the first target file does not exist in the object storage system;
deleting the erroneously written file in the object storage system and re-executing the file write request if the first target file in the object storage system contains errors;
performing an incremental update on the file written in the object storage system if the first target file in the object storage system is incomplete.
In the embodiment of the disclosure, different repair operations may be performed depending on why the file write request failed. For example, after the operation of writing the first target file to the object storage system is performed based on the file write request, if the first target file does not exist in the object storage system, the first target file was not successfully stored. In that case, a new file write request may be generated based on the first target file and the write operation re-executed; alternatively, the write operation may be performed again based on the original file write request. For another example, if the first target file in the object storage system contains errors after the write operation, such as garbled content, a file format error, or a file size error, the erroneous first target file may be deleted directly. A new file write request may then be generated based on the first target file, and the write operation of the first target file re-executed. For another example, if the first target file is present in the object storage system after the write operation but information such as its size is inconsistent with the known information, the first target file in the object storage system may be incomplete, that is, part of its data may be missing. If information about the missing data is available, for example which part of the file was not written, the first target file in the object storage system may be incrementally updated by writing the missing data.
According to the embodiment of the disclosure, after the reason of the failure of executing the file writing request is determined by inspection, different file repairing modes can be adopted based on different failure reasons, so that the flexibility and adaptability of file repairing are improved, and the efficiency of file repairing is improved.
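The three repair cases can be sketched as follows. This is a simplified illustration, not the disclosed repair tool: it assumes the original content is available to the repair step, treats a stored strict prefix of the expected content as "incomplete", treats any other mismatch as an error, and models the object storage system as a plain dictionary. The function names are hypothetical.

```python
def inspect_file(store: dict, file_id: str, expected: bytes) -> str:
    """Classify the state of a file in the object store after a failed write."""
    stored = store.get(file_id)
    if stored is None:
        return "missing"
    if stored == expected:
        return "ok"
    if expected.startswith(stored):
        return "incomplete"   # only a leading part of the file was written
    return "corrupted"


def repair_file(store: dict, file_id: str, expected: bytes) -> str:
    """Apply the repair that matches the inspection result; return that result."""
    status = inspect_file(store, file_id, expected)
    if status == "missing":
        store[file_id] = expected                     # re-execute the write
    elif status == "corrupted":
        del store[file_id]                            # delete the bad file
        store[file_id] = expected                     # then re-execute the write
    elif status == "incomplete":
        written = len(store[file_id])
        store[file_id] += expected[written:]          # incremental update
    return status


object_store = {"B": b"hel", "C": b"xxxxx"}
results = {fid: repair_file(object_store, fid, b"hello") for fid in ("A", "B", "C")}
```

After the loop, all three files hold the full expected content, and `results` records which repair path each one took.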
Fig. 3 is a flow chart illustrating a method of storing files according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, the method further comprises:
S301, receiving a file reading request from a client, wherein the file reading request comprises information of a second target file requested to be read by the client;
S302, searching the file directory information for the storage position information of the second target file in the object storage system based on the information of the second target file;
S303, sending the storage position information of the second target file in the object storage system to the client.
In the embodiment of the disclosure, a client responds to operations such as file access or file searching by a user, generates a file reading request based on information such as identification of the file which the user needs to access or search, and sends the file reading request to directory nodes of a distributed file system. After receiving the file reading request of the client, the directory node of the distributed file system can analyze the request to obtain information such as the identification of the second target file, and search in the file directory information based on the information such as the identification of the second target file. And under the condition that the type of the second target file is found to be the second type file, the URL corresponding to the identification of the second target file can be sent to the client. And the client accesses the object storage system based on the URL and reads the file corresponding to the URL. And under the condition that the type of the second target file is found to be the first type file, sending the information of the data nodes distributed by the data blocks of the first type file to the client. And the client accesses the data nodes based on the information of the data nodes distributed by the data blocks, and reads the data blocks of the first type file.
For example, the client generates a file read request for file A, file B and file C and sends it to the directory node of the distributed file system. After the distributed file system receives and parses the file read request, it performs the operations of reading file A, file B and file C. Based on the file identifications in the file read request, the directory node looks up, in the file directory information, the URL corresponding to file A, such as 'http:// user.closed// failA', and the information of the data nodes in which the data blocks of file B and file C are distributed. The URL of file A and the data node information of file B and file C may be sent to the client. The client reads file A from the object storage system based on the URL of file A, and reads the data blocks of file B and file C from the relevant data nodes of the distributed file system based on their respective data node information.
According to the embodiment of the disclosure, the storage position information of the file requested to be read by the client in the object storage system can be searched based on the file directory information, so that the file can be read quickly, and the convenience and speed of file reading are improved.
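The lookup described above can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the class names `DirectoryNode`, `ObjectLocation` and `BlockLocations`, the function `plan_reads`, and the example URL are all hypothetical names introduced for illustration. The sketch only shows how one file directory could answer with either a URL (second type file) or data node information (first type file).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class ObjectLocation:
    """Second type file: stored whole in the object storage system."""
    url: str


@dataclass
class BlockLocations:
    """First type file: split into data blocks held by data nodes."""
    # Maps a block id to the data nodes holding that block or its copies.
    blocks: Dict[str, List[str]] = field(default_factory=dict)


Location = Union[ObjectLocation, BlockLocations]


class DirectoryNode:
    """Hypothetical directory node holding the file directory information."""

    def __init__(self) -> None:
        self.file_directory: Dict[str, Location] = {}

    def lookup(self, file_ids: List[str]) -> Dict[str, Location]:
        # Resolve each requested file identification to its location entry.
        return {fid: self.file_directory[fid] for fid in file_ids}


def plan_reads(locations: Dict[str, Location]) -> Dict[str, Tuple[str, object]]:
    """Client side: decide where each file has to be read from."""
    plan: Dict[str, Tuple[str, object]] = {}
    for fid, loc in locations.items():
        if isinstance(loc, ObjectLocation):
            # Read the whole file from the object storage system by URL.
            plan[fid] = ("object_storage", loc.url)
        else:
            # Read the individual data blocks from the data nodes.
            plan[fid] = ("data_nodes", sorted(loc.blocks))
    return plan
```

In this sketch the client never needs to know in advance which kind of file it is requesting; the type of the returned entry decides the read path.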
Fig. 4 is a flow chart illustrating a method of storing files according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, the method further comprises:
S401, setting a first migration tag of one or more data blocks in a data node of the distributed file system, wherein the first migration tag is used for indicating that a file to which the one or more data blocks belong is changed into a second type file;
S402, migrating the file to which the one or more data blocks belong to the object storage system based on the first migration tag of the one or more data blocks.
In some application scenarios, a distributed file system may be used for local storage, and an object storage system may be used for remote storage, cloud storage, and so on. The frequency with which clients access files in the distributed file system and the object storage system may change over time, and the type of a file may change accordingly. In embodiments of the present disclosure, the first migration tag of a data block is used to represent a change in the type of the file to which the data block belongs. For example, the first migration tag of a data block may include a cold tag, indicating that the file to which the data block belongs has changed to cold data or a cold file. A cold label may be set for each data block that conforms to the cold data characteristics, or a single cold label may be set for the file to which such data blocks belong. For example, in the file directory information of the directory node, a cold label may be added to each data block that conforms to the cold data characteristics. For another example, if a file is determined to conform to the cold data characteristics based on one or more of its data blocks, a cold label may be added to that file in the file directory information of the directory node. The cold label corresponding to a file can then be looked up by the identification of the file; alternatively, the identification of the file can first be looked up from the identification of one of its data blocks, and the cold label corresponding to the file can be looked up from that.
In the embodiment of the disclosure, the file to which a data block belongs can be migrated in its entirety from the data nodes of the distributed file system to the object storage system based on the first migration tag of the data block or the file. For example, the data node N of the distributed file system includes data blocks block10, block11 and block12 of the file D. If a cold label needs to be added to the data block block10, a cold label may also be added to its copies block10' and block10". Further, a cold label may be added to the other data blocks block11 and block12 of the file D to which block10 belongs, and to their copies. Then, the complete file D may be read based on the cold labels of the data blocks block10, block11 and block12, and migrated to the object storage system.
In the embodiment of the disclosure, the data blocks in the data nodes of the distributed file system can be migrated to the object storage system according to the first migration tag periodically, at a fixed time point, or at any time according to user requirements and the like.
According to the embodiment of the disclosure, tags distinguish how cold or hot the files corresponding to the data blocks in the distributed file system are, so that cold data can be migrated to the object storage system, which saves file storage cost.
Fig. 5 is a flow chart illustrating a method of storing files according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, S401, setting a first migration tag of one or more data blocks in a data node of the distributed file system, includes:
S501, counting the access amount of one or more data blocks in the data nodes of the distributed file system in a set time period;
S502, setting a first migration tag of one or more data blocks with the access amount smaller than a first set value as a cold tag.
In the embodiment of the disclosure, the access amount of the data blocks in the data nodes of the distributed file system can be counted according to one or more set time periods. For example, the access amount of each data block in the data nodes of the distributed file system between 10:00 and 17:00 on a certain day is counted. For another example, the access amount of each data block in the data nodes of the distributed file system is counted separately for the time periods 10:00-12:00 and 14:00-16:00 on a certain day.
In the embodiment of the present disclosure, the comparison value of the access amount, that is, the first set value may be set in advance. For example, the first setting value may be set for one data block, or the first setting value may be set for one file. Based on statistics of access amounts of individual data blocks in data nodes of the distributed file system, a comparison is made with a first set value. For example, the access amount of one data block may be compared with a first set value. The sum of the access amounts of all the data blocks of one file may also be compared with the first set value. The tag of the data block or file may be set as the first migration tag according to the comparison result. For example, in the case where the access amount of one data block is smaller than the first set value, the tag of the data block is set as the first migration tag. For another example, in the case where the sum of the access amounts of all the data blocks of one file is smaller than the first set value, the tag of the file is set as the first migration tag. The first setting value may be set according to a user demand, or may be set as a default value.
According to the embodiment of the disclosure, the cold and hot degree of the file corresponding to the data block can be confirmed based on the access amount of the data block, so that the cold file can be conveniently migrated to the object storage system subsequently, and storage resources of the distributed file system are saved.
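As a rough illustration of S501 and S502, the following Python sketch counts accesses inside the set time periods and marks as cold everything whose count falls below the first set value. The function names, the shape of the access log (pairs of block id and timestamp) and the half-open time windows are assumptions made for this example; the disclosure does not prescribe them.

```python
from collections import Counter
from datetime import datetime, time
from typing import Dict, Iterable, List, Set, Tuple

Window = Tuple[time, time]


def count_accesses(access_log: Iterable[Tuple[str, datetime]],
                   windows: List[Window]) -> Counter:
    """S501: count accesses per data block that fall inside any set window."""
    counts: Counter = Counter()
    for block_id, ts in access_log:
        # A window is treated as half-open: start <= t < end (an assumption).
        if any(start <= ts.time() < end for start, end in windows):
            counts[block_id] += 1
    return counts


def cold_blocks(all_blocks: Iterable[str], counts: Counter,
                first_set_value: int) -> Set[str]:
    """S502, per data block: blocks accessed fewer times than the threshold."""
    # Counter returns 0 for blocks that were never accessed.
    return {b for b in all_blocks if counts[b] < first_set_value}


def cold_files(file_blocks: Dict[str, List[str]], counts: Counter,
               first_set_value: int) -> Set[str]:
    """S502, per file: compare the summed access amount of all its blocks."""
    return {
        file_id
        for file_id, blocks in file_blocks.items()
        if sum(counts[b] for b in blocks) < first_set_value
    }
```

The per-block and per-file variants correspond to the two ways of applying the first set value described above; which one is appropriate depends on whether the tag is kept per data block or per file.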
In one embodiment, as shown in fig. 5, S402, migrating the file to which the one or more data blocks belong to the object storage system based on the first migration tag of the one or more data blocks, includes:
S503, based on one or more data blocks with cold labels in the data nodes of the distributed file system, acquiring information of a third target file to be migrated, to which the one or more data blocks belong;
S504, reading each data block of the third target file from the data nodes of the distributed file system based on the information of the third target file;
S505, merging all the data blocks of the third target file to obtain the third target file;
S506, storing the third target file in the object storage system.
In the embodiment of the present disclosure, if the first migration tags are stored in the directory node, then in response to a data migration instruction, the data blocks having the first migration tag and information such as the identification of the third target file to which they belong may be looked up in the file directory information of the directory node. If the first migration tags are stored in the data nodes, the data nodes are first traversed to find the data blocks having the first migration tag, and information such as the identification of the third target file to which those data blocks belong is then looked up in the file directory information of the directory node. The data migration instruction may come from the client, may be generated periodically, or may be generated at fixed points in time.
All data blocks of the third target file are read from the data nodes based on information such as the identification of the third target file. The data blocks are decoded and merged to obtain the third target file, which is then stored in the object storage system. Each data block of the third target file has one or more copies. The decoding and merging operations may be performed after all the data blocks and their copies have been read, or after only one set of data blocks or one set of copies has been read.
According to the embodiment of the disclosure, the data blocks with the cold labels in the distributed file system can be combined to obtain the complete file, so that the file with the cold labels is migrated to the object storage system, and storage resources of the distributed file system are saved.
In one embodiment, as shown in fig. 5, S402, migrating the file to which the one or more data blocks belong to the object storage system based on the first migration tag of the one or more data blocks, further includes:
S507, deleting each data block of the third target file and its copies from the distributed file system.
After the third target file has been migrated successfully, its data blocks and their copies may be deleted from the respective data nodes of the distributed file system. For example, after the file D is stored in the object storage system, the data blocks block10, block11 and block12 of the file D at the data node N1, the data block copies block10', block11' and block12' at the data node N2, and the data block copies block10", block11" and block12" at the data node N3 may be deleted. In this way, storage resources of the distributed file system may be conserved.
In one embodiment, as shown in fig. 5, the method further comprises:
S508, writing the corresponding relation between the third target file and the storage position information of the third target file in the object storage system into the file directory information.
In the embodiment of the disclosure, after the third target file is stored in the object storage system, storage location information, such as URL, of the third target file may be generated and recorded into file directory information of directory nodes of the distributed file system.
According to the embodiment of the disclosure, after the cold data is migrated, the cold data can be deleted from the distributed file system, so that the data redundancy is reduced, the storage space is released, and the storage space of the distributed file system can be saved.
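Steps S504 to S508 can be sketched end to end in Python as follows. This is an illustration under simplifying assumptions, not the disclosed implementation: data nodes are modelled as in-memory dictionaries, a copy of a block is modelled as the same block id stored on another node, decoding is omitted, and `ObjectStore`, `migrate_cold_file` and the base URL are hypothetical names.

```python
from typing import Dict, List


class ObjectStore:
    """Hypothetical in-memory stand-in for the object storage system."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> str:
        # Store the whole file and return its storage location (a URL).
        self.objects[name] = data
        return f"{self.base_url}/{name}"


def migrate_cold_file(file_id: str,
                      block_order: List[str],
                      data_nodes: Dict[str, Dict[str, bytes]],
                      store: ObjectStore,
                      file_directory: Dict[str, str]) -> str:
    """Migrate one cold file from the data nodes to the object store."""
    # S504: read one copy of each data block, in file order.
    parts = []
    for block_id in block_order:
        holder = next(n for n in data_nodes.values() if block_id in n)
        parts.append(holder[block_id])
    # S505: merge the data blocks into the complete file.
    content = b"".join(parts)
    # S506: store the complete file in the object storage system.
    url = store.put(file_id, content)
    # S508: record the storage location in the file directory information.
    file_directory[file_id] = url
    # S507: delete every data block and every copy from all data nodes.
    for node in data_nodes.values():
        for block_id in block_order:
            node.pop(block_id, None)
    return url
```

The sketch records the URL before deleting the blocks so that the file stays reachable throughout; this ordering is a design choice of the example, since the text above does not fix the relative order of S507 and S508.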
Fig. 6 is a flow chart illustrating a method of storing files according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, the method further comprises:
S601, setting a second migration tag of a fourth target file to be migrated in the object storage system, wherein the second migration tag is used for indicating that the fourth target file is changed into a first type file;
S602, migrating the fourth target file to the data node of the distributed file system based on the second migration tag of the fourth target file.
In the embodiment of the disclosure, the second migration tag of a file is used to represent the type of the file, for example, a first type file and/or a second type file. The second migration tag may include a hot tag, and the first type file may include hot data, a hot file, and the like. After the tag of a file is set to the second migration tag, the file is indicated to be hot data, a hot file, or the like.
In the embodiment of the disclosure, the file can be migrated from the object storage system to the data nodes of the distributed file system based on the second migration tag of the file. For example, the object storage system holds a file E, a file F and a file G, where the tag of the file E is the second migration tag, which indicates that the file E is hot data or a hot file. The file E may be migrated to the data nodes of the distributed file system based on its second migration tag.
In the embodiment of the disclosure, the second migration tags of one or more files in the object storage system may be set periodically, at a fixed time point, or at any time according to user requirements. Likewise, files in the object storage system may be migrated to the data nodes of the distributed file system according to the second migration tag periodically, at a fixed time point, or at any time according to user requirements. The period for setting the tag and the period for migration may be the same or different, as may the fixed time point for setting the tag and the fixed time point for migration; the present disclosure is not limited in this respect.
According to the embodiment of the disclosure, the types of the files in the object storage system can be distinguished according to the labels, the target files changed into the first type of files are migrated into the distributed file system, and the file reading speed is improved.
Fig. 7 is a flow chart illustrating a method of storing files according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, S601, setting a second migration tag of a fourth target file to be migrated in the object storage system, includes:
S701, counting the access amount of one or more files in the object storage system in a set time period;
S702, setting a second migration tag of the fourth target file with the access amount higher than the second set value as a hot tag.
In the embodiment of the disclosure, the access amount of the files in the object storage system can be counted according to one or more set time periods. For example, the access amount of each file in the object storage system between 10:00 and 17:00 on a certain day is counted. For another example, the access amount of each file in the object storage system is counted separately for the time periods 10:00-12:00 and 14:00-16:00 on a certain day.
In the embodiment of the present disclosure, the comparison value of the access amount, that is, the second set value, may be set in advance. For example, the second set value may be set for one file. The statistics of the access amount of each file in the object storage system are compared with the second set value. For example, the access amount of one file may be compared with the second set value, and the tag of the file may be set as the second migration tag according to the comparison result. For example, in the case where the access amount of one file is larger than the second set value, the tag of the file is set as the second migration tag. The second set value may be set according to user requirements, or may be set as a default value.
According to the embodiment of the disclosure, the cold and hot labels of the files are set based on the access quantity of the files, so that the cold and hot degrees of the files in the object storage system can be distinguished according to the labels, the hot data is migrated into the distributed file system, and the reading speed of the hot data is improved.
In one embodiment, as shown in fig. 7, S602, migrating the fourth target file to the data nodes of the distributed file system based on the second migration tag of the fourth target file, includes:
S703, dividing a fourth target file with a hot tag in the object storage system into a plurality of data blocks;
S704, storing the data blocks of the fourth target file obtained by the division, and copies thereof, into a plurality of data nodes of the distributed file system, wherein a data block and its copies having the same content are stored in different data nodes;
S705, deleting the fourth target file from the object storage system.
In the embodiment of the disclosure, if the second migration tags are stored in the directory node, information such as the identification of the fourth target file having the second migration tag may be looked up in the file directory information of the directory node in response to a data migration instruction. If the second migration tags are stored in the object storage system, the object storage system is first traversed to find the files having the second migration tag, and information such as the identification and address of the fourth target file having the second migration tag is then looked up in the file directory information of the directory node. The data migration instruction may come from the client, may be generated periodically, or may be generated at fixed points in time.
In the embodiment of the disclosure, the fourth target file in the object storage system is read based on information such as the identification of the fourth target file. The fourth target file is divided and encoded to obtain the data blocks of the fourth target file, and these data blocks are distributed across the data nodes of the distributed file system. Each data block of the fourth target file may have one or more copies. A data block and its copies may be stored at different data nodes.
According to the embodiment of the disclosure, the file with the hot tag can be obtained from the object storage system and divided and encoded into data blocks, so that the file with the hot tag is migrated to the distributed file system, which improves the efficiency of access to the hot tag file.
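A minimal Python sketch of S703 to S705 follows. It is only an illustration: the fixed block size, the round-robin placement rule, the block naming scheme and the function names are assumptions of this example, and the encoding step mentioned above is left out.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """S703: cut a file into fixed-size data blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(file_id: str, blocks: List[bytes], node_names: List[str],
                 replicas: int) -> Dict[str, Dict[str, bytes]]:
    """S704: put each block and its copies on distinct data nodes."""
    if replicas > len(node_names):
        raise ValueError("not enough data nodes to keep copies apart")
    placement: Dict[str, Dict[str, bytes]] = {name: {} for name in node_names}
    for idx, block in enumerate(blocks):
        block_id = f"{file_id}-block{idx}"
        for r in range(replicas):
            # Round-robin over the nodes; because replicas <= len(node_names),
            # the copies of one block always land on different nodes.
            node = node_names[(idx + r) % len(node_names)]
            placement[node][block_id] = block
    return placement


def migrate_hot_file(file_id: str, objects: Dict[str, bytes],
                     node_names: List[str], block_size: int = 4,
                     replicas: int = 2) -> Dict[str, Dict[str, bytes]]:
    """Move one hot file from the object store onto the data nodes."""
    blocks = split_into_blocks(objects[file_id], block_size)
    placement = place_blocks(file_id, blocks, node_names, replicas)
    # S705: delete the file from the object storage system once it is placed.
    del objects[file_id]
    return placement
```

With three nodes and two copies per block, for instance, each block ends up on exactly two different nodes, which matches the requirement that a data block and its copy with the same content are stored in different data nodes.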
Fig. 8 is a flow chart of a file storage method according to another embodiment of the present disclosure. The method may include one or more features of the file storage method described above. In one embodiment, the method further comprises:
S801, monitoring file states in the object storage system and/or the distributed file system to obtain monitoring indexes of the object storage system; the monitoring index comprises at least one of the proportion of cold and hot labels, the number of files of the cold label files, the data volume of the cold label files and the space usage of the cold label files.
In the embodiment of the disclosure, the data monitoring module or the data analysis module of the distributed file system can monitor or analyze not only the data blocks in the data nodes of the distributed file system, but also the complete files in the object storage system.
In the embodiment of the disclosure, the ratio of the cold and hot labels may be a ratio of the number of files or a ratio of the data amounts of the files. The number of files of the cold label file may include the number of files stored in the object storage system. The data volume of the cold label file may include the data volume of the file stored in the object storage system. The space usage of the cold label file may include a usage of storage space of the object storage system. For example, the number of cold label files is obtained by counting the complete files in the object storage system, the number of hot label files is obtained by counting the files to which the data blocks belong in the distributed file system, and the proportion of cold labels to hot labels is obtained by calculating the number of cold label files and the number of hot label files.
According to the embodiment of the disclosure, the distributed file system can monitor the object storage system, and the file storage state of the object storage system can be conveniently and quickly known through the monitoring index, so that the security of the file of the object storage system is improved.
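The monitoring indexes listed in S801 can be computed along the following lines. The Python sketch assumes that cold label files are exactly the files held in the object storage system, that hot label files are counted separately on the distributed file system side, and that the ratio is taken over file counts; `MonitoringIndex` and `compute_index` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MonitoringIndex:
    cold_file_count: int              # number of cold label files
    cold_data_bytes: int              # data volume of the cold label files
    cold_space_usage: float           # used fraction of object storage capacity
    cold_hot_ratio: Optional[float]   # cold files per hot file, None if no hot files


def compute_index(cold_file_sizes: Dict[str, int], hot_file_count: int,
                  capacity_bytes: int) -> MonitoringIndex:
    """Derive the monitoring indexes from per-file sizes and simple counts."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    cold_count = len(cold_file_sizes)
    cold_bytes = sum(cold_file_sizes.values())
    # The ratio is undefined when no hot label file exists.
    ratio = cold_count / hot_file_count if hot_file_count else None
    return MonitoringIndex(
        cold_file_count=cold_count,
        cold_data_bytes=cold_bytes,
        cold_space_usage=cold_bytes / capacity_bytes,
        cold_hot_ratio=ratio,
    )
```

The ratio could equally be computed over data volumes instead of file counts, as the text above allows; the sketch picks file counts only to keep the example short.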
In one application scenario, a block diagram of a distributed file system with an object storage system mounted is shown in fig. 9. For example, mounting an object storage system into an HDFS has the following features:
1. Remote object storage, used for files such as cold label files, is registered in the HDFS, and the storage type of the object storage system is regarded as an Archive type.
2. The storage format for files in a directory node (also known as a name node) of the HDFS is modified. If a file is stored in the object storage system, the directory node stores the associated URL of the file in the object storage system, rather than information for the data shards. This provides great convenience because this portion of the data no longer needs to be fragmented and therefore the logic is very simple whether replicated or migrated. Data shards may also be referred to as data blocks, file blocks, and the like.
3. For data in cold storage (the object storage system), after a client issues a read request to a directory node (namenode), the namenode directly returns the URL of the object store to the client, instead of the storage address in a data node (datanode).
For example, for a read-write request, the request content may be converted into an associated interface call instruction of the corresponding object storage system. Based on the interface call instruction, the content of the file can be written into the object storage system, and the file can be directly read from the object storage system.
For the case of a write request failure, reference may be made to the handling of missing blocks: the fsck tool is used in HDFS to handle missing blocks. The fsck tool needs to be modified so that it supports both the inspection of files on the object storage system and the inspection of data blocks on the distributed file system. The minimum unit of inspection on the object storage system is a file, and the minimum unit of inspection on the distributed file system is a data block.
4. In order to realize automatic conversion between cold and hot data, a migration (mover) tool can be called regularly to migrate the data. For example, in HDFS, a data block may have a hot tag by default, the hot tags of one or more data blocks are changed to cold tags according to how cold or hot the data blocks are, and the files to which the data blocks with cold tags belong are migrated to the object storage system at regular times. For another example, the files with hot tags in the object storage system are migrated to the data nodes of the HDFS at regular times.
5. A data monitoring module (or data analysis module) of the object store is connected, and a monitoring service or analysis service specific to the cold data (e.g., files in the object storage system) is provided based on the capabilities of the object storage system, which may ensure that the statistics of the cluster still include the files in cold storage (e.g., the object storage system).
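The two inspection granularities mentioned under feature 3 (a file on the object storage system, a data block on the distributed file system) can be illustrated with a small Python sketch. It does not use or imitate the real fsck tool; `inspect_storage` and the shape of the directory entries (a URL string for an object-stored file, a list of block ids otherwise) are assumptions of this example.

```python
from typing import Dict, List, Set, Tuple, Union

# A directory entry is either a URL (file kept in the object storage system)
# or a list of block ids (file kept as data blocks on the data nodes).
Entry = Union[str, List[str]]


def inspect_storage(file_directory: Dict[str, Entry],
                    stored_urls: Set[str],
                    node_blocks: Dict[str, Set[str]]
                    ) -> Tuple[List[str], List[str]]:
    """Return (missing files on the object store, missing data blocks)."""
    # Every block id that is present on at least one data node.
    available: Set[str] = set().union(*node_blocks.values())
    missing_files: List[str] = []
    missing_blocks: List[str] = []
    for file_id, entry in sorted(file_directory.items()):
        if isinstance(entry, str):
            # Object storage side: the unit of inspection is the whole file.
            if entry not in stored_urls:
                missing_files.append(file_id)
        else:
            # Distributed file system side: the unit is the data block.
            missing_blocks.extend(b for b in entry if b not in available)
    return missing_files, missing_blocks
```

A block counts as present here as soon as any one copy exists; a stricter check that also verifies the expected number of copies would be a natural extension.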
FIG. 10 is a schematic diagram of a file storage device according to an embodiment of the present disclosure, the device comprising:
a first storage module 1001, configured to store file directory information in a directory node of a distributed file system;
the file directory information comprises a corresponding relation between an identification of a file and storage position information of the file in an object storage system; the data nodes of the distributed file system are used for storing data blocks obtained by dividing the first type of files, and the object storage system is used for storing the second type of files.
FIG. 11 is a schematic structural view of a file storage device according to another embodiment of the present disclosure, the device further comprising:
a first receiving module 1101, configured to receive a file write request from a client, where the file write request includes a first target file requested to be written by the client;
an obtaining module 1102, configured to store the first target file in the object storage system, and obtain storage location information of the first target file in the object storage system;
the first writing module 1103 is configured to write a correspondence between the first target file and storage location information of the first target file in the object storage system into the file directory information.
In one embodiment, as shown in fig. 11, the apparatus further comprises:
a repair module 1104, configured to, when the file writing request fails to execute, perform repair processing on the file in the object storage system by using a file repair tool.
In one embodiment, performing the repair processing includes one of:
re-executing the file write request in the case where the first target file does not exist in the object storage system;
deleting the erroneously written file from the object storage system and re-executing the file write request in the case where the first target file in the object storage system is erroneous;
and performing an incremental update on the file written in the object storage system in the case where the first target file in the object storage system is incomplete.
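The three repair cases can be sketched in Python as below. The sketch makes an explicit assumption that the disclosure does not state: a stored file is treated as "incomplete" when its content is a strict prefix of the expected content, and as "erroneous" when it differs in any other way. The function name `repair_file` and the returned status strings are hypothetical.

```python
from typing import Dict


def repair_file(name: str, expected: bytes, objects: Dict[str, bytes]) -> str:
    """Repair one file in a dictionary standing in for the object store."""
    stored = objects.get(name)
    if stored is None:
        # Case 1: the file does not exist, so the write is executed again.
        objects[name] = expected
        return "rewritten"
    if stored == expected:
        # Nothing to repair.
        return "ok"
    if expected.startswith(stored):
        # Case 3: the file is incomplete (assumed: a prefix of the expected
        # content), so only the missing tail is appended.
        objects[name] = stored + expected[len(stored):]
        return "incremental_update"
    # Case 2: the file is erroneous, so it is deleted and written again.
    del objects[name]
    objects[name] = expected
    return "deleted_and_rewritten"
```

For example, a store holding only the first half of a file would be completed by appending the second half, while a store holding different bytes of the same length would have the file deleted and written again.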
In one embodiment, as shown in fig. 11, the apparatus further comprises:
a second receiving module 1105, configured to receive a file read request from a client, where the file read request includes information of a second target file that the client requests to read;
a searching module 1106, configured to search the file directory information for storage location information of the second target file in the object storage system based on the information of the second target file;
A sending module 1107, configured to send, to the client, storage location information of the second target file in the object storage system.
In one embodiment, as shown in fig. 11, the apparatus further comprises:
a first setting module 1108, configured to set a first migration tag of one or more data blocks in a data node of the distributed file system, where the first migration tag is used to indicate that a file to which the one or more data blocks belong is changed into a second type file;
a first migration module 1109, configured to migrate, based on a first migration tag of one or more data blocks, a file to which the one or more data blocks belong to the object storage system.
In one embodiment, as shown in fig. 11, the first setting module 1108 includes:
a first statistics sub-module 1110, configured to count an access amount of one or more data blocks in a data node of the distributed file system in a set period of time;
the first setting submodule 1111 is configured to set a first migration tag of the one or more data blocks having the access amount less than the first set value as a cold tag.
In one embodiment, as shown in fig. 11, the first migration module 1109 includes:
An obtaining sub-module 1112, configured to obtain, based on one or more data blocks having cold labels in data nodes of the distributed file system, information of a third target file to be migrated to which the one or more data blocks belong;
a reading sub-module 1113, configured to read each data block of the third target file from the data nodes of the distributed file system based on the information of the third target file;
a merging sub-module 1114, configured to merge each data block of the third target file to obtain the third target file;
a first storage sub-module 1115 is configured to store the third target file in the object storage system.
In one embodiment, as shown in fig. 11, the first migration module 1109 further includes:
a first delete submodule 1116 for deleting each data block of the third target file and its copy from the distributed file system;
a writing submodule 1117, configured to write a correspondence between the third target file and storage location information of the third target file in the object storage system into the file directory information.
In one embodiment, as shown in fig. 11, the apparatus further comprises:
a second setting module 1118, configured to set a second migration tag of a fourth target file to be migrated in the object storage system, where the second migration tag is used to indicate that the fourth target file is changed to a first type file;
The second migration module 1119 is configured to migrate the fourth target file to the data node of the distributed file system based on the second migration tag of the fourth target file.
In one embodiment, as shown in fig. 11, the second setting module 1118 includes:
a second statistics sub-module 1120, configured to count an access amount of one or more files in the object storage system in a set period of time;
the second setting sub-module 1121 is configured to set a second migration tag of the fourth target file having the access amount higher than the second set value as a hot tag.
In one embodiment, as shown in fig. 11, the second migration module 1119 further includes:
a splitting submodule 1122, configured to split the fourth target file with the hot tag in the object storage system into a plurality of data blocks;
the second storage sub-module 1123 is configured to store the data blocks of the fourth target file obtained by the splitting, and copies thereof, into a plurality of data nodes of the distributed file system, wherein a data block and its copies having the same content are stored in different data nodes;
a second deletion sub-module 1124 for deleting the fourth target file from the object storage system.
In one embodiment, as shown in fig. 11, the apparatus further comprises:
The monitoring module 1125 is configured to monitor a file state in the object storage system and/or the distributed file system, and obtain a monitoring index of the object storage system; the monitoring index comprises at least one of the proportion of cold and hot labels, the number of files of the cold label files, the data volume of the cold label files and the space usage of the cold label files.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
In the technical solution of the disclosure, the acquisition, storage, application and the like of the user personal information involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 12 shows a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the device 1200 includes a computing unit 1201, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in device 1200 are connected to I/O interface 1205, including: an input unit 1206 such as a keyboard, mouse, etc.; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208 such as a magnetic disk, an optical disk, or the like; and a communication unit 1209, such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1201 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the various methods and processes described above, such as the file storage method. For example, in some embodiments, the file storage method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the file storage method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the file storage method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, etc. that are within the principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (29)

1. A method of storing a file, comprising:
storing file directory information in directory nodes of the distributed file system;
the file directory information comprises a corresponding relation between an identification of a file and storage position information of the file in an object storage system; the data nodes of the distributed file system are used for storing data blocks obtained by dividing the first type of files, and the object storage system is used for storing the second type of files.
2. The method of claim 1, further comprising:
receiving a file writing request from a client, wherein the file writing request comprises information of a first target file which is requested to be written by the client;
storing the first target file into the object storage system, and acquiring storage position information of the first target file in the object storage system;
and writing the corresponding relation between the first target file and the storage position information of the first target file in the object storage system into the file directory information.
3. The method of claim 2, further comprising:
in a case where execution of the file writing request fails, inspecting files in the object storage system using a file repair tool, and performing repair processing.
4. The method of claim 3, wherein performing the repair processing comprises one of:
re-executing the file write request if the first target file is not present in the object storage system;
in a case where the first target file in the object storage system was written incorrectly, deleting the incorrectly written file from the object storage system and re-executing the file writing request;
in a case where the first target file in the object storage system is incomplete, performing an incremental update on the file written to the object storage system.
5. The method of any one of claims 1 to 4, further comprising:
receiving a file reading request from a client, wherein the file reading request comprises information of a second target file requested to be read by the client;
searching storage position information of the second target file in the object storage system in the file directory information based on the information of the second target file;
and sending the storage position information of the second target file in the object storage system to the client.
6. The method of any one of claims 1 to 5, further comprising:
setting a first migration tag of one or more data blocks in a data node of the distributed file system, wherein the first migration tag is used for indicating that a file to which the one or more data blocks belong is changed into a second type file;
and migrating the file to which the one or more data blocks belong to the object storage system based on a first migration tag of the one or more data blocks.
7. The method of claim 6, wherein setting the first migration tag of one or more data blocks in a data node of the distributed file system comprises:
counting the access amount of one or more data blocks in the data nodes of the distributed file system within a set time period;
and setting, as a cold tag, the first migration tag of the one or more data blocks whose access amount is smaller than a first set value.
8. The method of claim 6, wherein migrating, based on the first migration tag of the one or more data blocks, the file to which the one or more data blocks belong into the object storage system comprises:
acquiring, based on one or more data blocks with cold tags in the data nodes of the distributed file system, information of a third target file to be migrated to which the one or more data blocks belong;
reading each data block of the third target file from the data nodes of the distributed file system based on the information of the third target file;
merging all data blocks of the third target file to obtain the third target file;
and storing the third target file into the object storage system.
9. The method of claim 8, wherein migrating, based on the first migration tag of the one or more data blocks, the file to which the one or more data blocks belong into the object storage system further comprises:
deleting each data block of the third target file and its copies from the distributed file system;
and writing the corresponding relation between the third target file and the storage position information of the third target file in the object storage system into the file directory information.
10. The method of any one of claims 1 to 9, further comprising:
setting a second migration tag of a fourth target file to be migrated in the object storage system, wherein the second migration tag is used for indicating that the fourth target file is changed into a first type file;
and migrating the fourth target file to a data node of the distributed file system based on a second migration tag of the fourth target file.
11. The method of claim 10, wherein setting the second migration tag of the fourth target file to be migrated in the object storage system comprises:
counting the access amount of one or more files in the object storage system within a set time period;
and setting, as a hot tag, the second migration tag of the fourth target file whose access amount is higher than a second set value.
12. The method of claim 10 or 11, wherein migrating the fourth target file into a data node of the distributed file system based on the second migration tag of the fourth target file further comprises:
dividing the fourth target file with the hot tag in the object storage system into a plurality of data blocks;
storing the data blocks obtained by dividing the fourth target file, and copies thereof, into a plurality of data nodes of the distributed file system, wherein a data block and its copies having the same content are stored in different data nodes;
deleting the fourth target file from the object storage system.
13. The method of any one of claims 1 to 12, further comprising:
monitoring the file states of the object storage system and/or the distributed file system to obtain monitoring indexes of the object storage system; the monitoring index comprises at least one of the proportion of cold and hot labels, the number of files of the cold label files, the data volume of the cold label files and the space usage of the cold label files.
14. A file storage device comprising:
a first storage module for storing file directory information in directory nodes of the distributed file system;
the file directory information comprises a corresponding relation between an identification of a file and storage position information of the file in an object storage system; the data nodes of the distributed file system are used for storing data blocks obtained by dividing the first type of files, and the object storage system is used for storing the second type of files.
15. The apparatus of claim 14, further comprising:
the first receiving module is used for receiving a file writing request from a client, wherein the file writing request comprises information of a first target file which is requested to be written by the client;
the acquisition module is used for storing the first target file into the object storage system and acquiring storage position information of the first target file in the object storage system;
and the first writing module is used for writing the corresponding relation between the first target file and the storage position information of the first target file in the object storage system into the file directory information.
16. The apparatus of claim 15, further comprising:
and the repair module is used for, in a case where execution of the file writing request fails, inspecting files in the object storage system using a file repair tool and performing repair processing.
17. The apparatus of claim 16, wherein performing the repair processing comprises one of:
re-executing the file write request if the first target file is not present in the object storage system;
in a case where the first target file in the object storage system was written incorrectly, deleting the incorrectly written file from the object storage system and re-executing the file writing request;
in a case where the first target file in the object storage system is incomplete, performing an incremental update on the file written to the object storage system.
18. The apparatus of any of claims 14 to 17, further comprising:
the second receiving module is used for receiving a file reading request from a client, wherein the file reading request comprises information of a second target file requested to be read by the client;
the searching module is used for searching storage position information of the second target file in the object storage system in the file directory information based on the information of the second target file;
and the sending module is used for sending the storage position information of the second target file in the object storage system to the client.
19. The apparatus of any of claims 14 to 18, further comprising:
the first setting module is used for setting a first migration tag of one or more data blocks in a data node of the distributed file system, wherein the first migration tag is used for indicating that a file to which the one or more data blocks belong is changed into a second type file;
and the first migration module is used for migrating the file to which the one or more data blocks belong to the object storage system based on the first migration label of the one or more data blocks.
20. The apparatus of claim 19, wherein the first setting module comprises:
a first statistics sub-module, configured to count an access amount of one or more data blocks in a data node of the distributed file system in a set period of time;
and the first setting submodule is used for setting the first migration label of the one or more data blocks with the access quantity smaller than the first set value as a cold label.
21. The apparatus of claim 19, wherein the first migration module comprises:
an obtaining sub-module, configured to obtain, based on one or more data blocks with cold tags in the data nodes of the distributed file system, information of a third target file to be migrated to which the one or more data blocks belong;
a reading sub-module, configured to read each data block of the third target file from the data nodes of the distributed file system based on the information of the third target file;
the merging sub-module is used for merging all the data blocks of the third target file to obtain the third target file;
and the first storage sub-module is used for storing the third target file into the object storage system.
22. The apparatus of claim 21, wherein the first migration module further comprises:
a first deleting sub-module, configured to delete each data block of the third target file and its copy from the distributed file system;
and the writing sub-module is used for writing the corresponding relation between the third target file and the storage position information of the third target file in the object storage system into the file directory information.
23. The apparatus of any of claims 14 to 22, further comprising:
the second setting module is used for setting a second migration tag of a fourth target file to be migrated in the object storage system, wherein the second migration tag is used for indicating that the fourth target file is changed into a first type file;
and the second migration module is used for migrating the fourth target file to the data node of the distributed file system based on the second migration label of the fourth target file.
24. The apparatus of claim 23, wherein the second setting module comprises:
the second statistics sub-module is used for counting the access quantity of one or more files in the object storage system in a set time period;
and the second setting submodule is used for setting, as a hot tag, the second migration tag of the fourth target file whose access amount is higher than the second set value.
25. The apparatus of claim 24, wherein the second migration module further comprises:
a segmentation sub-module, configured to divide the fourth target file with the hot tag in the object storage system into a plurality of data blocks;
the second storage sub-module is used for storing the data blocks obtained by dividing the fourth target file, and copies thereof, into a plurality of data nodes of the distributed file system, wherein a data block and its copies having the same content are stored in different data nodes;
and the second deleting sub-module is used for deleting the fourth target file from the object storage system.
26. The apparatus of any of claims 14 to 25, further comprising:
the monitoring module is used for monitoring the file states in the object storage system and/or the distributed file system to obtain monitoring indexes of the object storage system; the monitoring index comprises at least one of the proportion of cold and hot labels, the number of files of the cold label files, the data volume of the cold label files and the space usage of the cold label files.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-13.
29. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-13.
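For readers who want a concrete picture of the claimed flow, the following toy model sketches the directory lookup, the write and read paths, and the cold/hot migration in both directions. It is a simplified illustration under stated assumptions and not the claimed implementation: everything is held in in-memory dictionaries, the class and attribute names (`TieredStore`, `cold_below`, `hot_above`, and so on) are hypothetical, access is counted per file rather than per data block, and data block copies and the repair tool are omitted.

```python
class TieredStore:
    """Toy in-memory model of a directory node, data nodes, and an object store."""

    def __init__(self, block_size=4, cold_below=1, hot_above=3):
        self.block_size = block_size
        self.cold_below = cold_below  # stands in for the "first set value"
        self.hot_above = hot_above    # stands in for the "second set value"
        self.directory = {}  # file id -> ("object", key) or ("blocks", block count)
        self.blocks = {}     # (file id, index) -> bytes; stands in for data nodes
        self.objects = {}    # key -> bytes; stands in for the object storage system
        self.access = {}     # file id -> access count in the current period

    def write(self, file_id, data):
        # Store the whole file as one object, then record its location.
        key = f"obj/{file_id}"
        self.objects[key] = data
        self.directory[file_id] = ("object", key)

    def locate(self, file_id):
        # Directory lookup; also counts the access for later tagging.
        self.access[file_id] = self.access.get(file_id, 0) + 1
        return self.directory[file_id]

    def read(self, file_id):
        kind, ref = self.locate(file_id)
        if kind == "object":
            return self.objects[ref]
        return b"".join(self.blocks[(file_id, i)] for i in range(ref))

    def migrate(self):
        # Tag files by access count for the period, then move them between tiers.
        for file_id, (kind, ref) in list(self.directory.items()):
            count = self.access.get(file_id, 0)
            if kind == "blocks" and count < self.cold_below:
                # Cold: merge the blocks, delete them, store the file as an object.
                data = b"".join(self.blocks.pop((file_id, i)) for i in range(ref))
                self.write(file_id, data)
            elif kind == "object" and count > self.hot_above:
                # Hot: split the object into blocks and delete the object.
                data = self.objects.pop(ref)
                n = 0
                for start in range(0, len(data), self.block_size):
                    self.blocks[(file_id, n)] = data[start:start + self.block_size]
                    n += 1
                self.directory[file_id] = ("blocks", n)
        self.access.clear()  # start a new counting period


store = TieredStore()
store.write("report", b"hello world!")
for _ in range(4):
    store.read("report")
store.migrate()  # 4 accesses exceed hot_above, so the file is split into blocks
kind_after_hot = store.directory["report"][0]
store.migrate()  # no accesses in the new period, so the blocks are merged back
kind_after_cold = store.directory["report"][0]
```

A real system would place each block and its copies on different data nodes and would return the storage location to the client rather than the bytes themselves; the sketch collapses both for brevity.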
CN202311346100.5A 2023-10-17 2023-10-17 File storage method, device, equipment and storage medium Pending CN117453647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311346100.5A CN117453647A (en) 2023-10-17 2023-10-17 File storage method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311346100.5A CN117453647A (en) 2023-10-17 2023-10-17 File storage method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117453647A true CN117453647A (en) 2024-01-26

Family

ID=89595824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311346100.5A Pending CN117453647A (en) 2023-10-17 2023-10-17 File storage method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117453647A (en)

Similar Documents

Publication Publication Date Title
KR102240557B1 (en) Method, device and system for storing data
CN108519862B (en) Storage method, device and system of block chain system and storage medium
US9195673B2 (en) Scalable graph modeling of metadata for deduplicated storage systems
US8898120B1 (en) Systems and methods for distributed data deduplication
US8103621B2 (en) HSM two-way orphan reconciliation for extremely large file systems
US10642837B2 (en) Relocating derived cache during data rebalance to maintain application performance
CN107704202B (en) Method and device for quickly reading and writing data
CN109800218B (en) Distributed storage system, storage node device and data deduplication method
US10956499B2 (en) Efficient property graph storage for streaming/multi-versioning graphs
CN102938784A (en) Method and system used for data storage and used in distributed storage system
CN106909597B (en) Database migration method and device
US11604808B2 (en) Methods, electronic devices and computer program product for replicating metadata
CN104881466A (en) Method and device for processing data fragments and deleting garbage files
CN105227672A (en) The method and system that data store and access
CN113760847A (en) Log data processing method, device, equipment and storage medium
US9684668B1 (en) Systems and methods for performing lookups on distributed deduplicated data systems
CN109542860B (en) Service data management method based on HDFS and terminal equipment
CN111930684A (en) Small file processing method, device and equipment based on HDFS (Hadoop distributed File System) and storage medium
US10083121B2 (en) Storage system and storage method
US10705752B2 (en) Efficient data migration in hierarchical storage management system
CN117453647A (en) File storage method, device, equipment and storage medium
CN115113798B (en) Data migration method, system and equipment applied to distributed storage
CN113760600B (en) Database backup method, database restoration method and related devices
CN114297196A (en) Metadata storage method and device, electronic equipment and storage medium
US11132401B1 (en) Distributed hash table based logging service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination