CN112416880A

CN112416880A - Method and device for optimizing storage performance of mass small files based on real-time merging

Info

Publication number: CN112416880A
Application number: CN202110090701.9A
Authority: CN
Inventors: 杨鹏; 杨波
Original assignee: Nanjing Qunding Technology Co ltd
Current assignee: Nanjing Qunding Technology Co ltd
Priority date: 2021-01-22
Filing date: 2021-01-22
Publication date: 2021-02-26

Abstract

The invention provides a method and a device for optimizing the storage performance of massive small files based on real-time merging, and relates to the field of computer storage. The method comprises the following steps: receiving a file data storage request sent by a client, and writing the files to be stored into SSD and HDD storage resource pools in a layered manner using object storage; when the file to be stored is a large file, storing it in the HDD storage resource pool; when the file to be stored is a small file, storing it in the SSD storage resource pool; storing metadata information of the file in a file index pool; and generating a globally unique identification key for the file according to the metadata information. The invention can improve the write speed of small files and realize reasonable storage of a large number of small files.

Description

Method and device for optimizing storage performance of mass small files based on real-time merging

Technical Field

The invention relates to the field of computer storage, in particular to a method and a device for optimizing storage performance of massive small files based on real-time merging.

Background

With the rapid development of industries such as the internet, the internet of things, cloud computing and big data, the number of files such as audio, video, pictures and logs is growing exponentially, and terminal equipment must continuously upload large numbers of files. The number of small files below 1M can reach millions, tens of millions or even hundreds of millions; such files are defined as massive small files. These workloads place high demands on the write performance of the terminal equipment and on the read performance of the massive small files, so how to store massive small files reasonably is significant to the sustainable development of the current big data era.

The traditional file storage system is mainly based on a tree-structured directory hierarchy and has limited scalability. Moreover, a large number of small files increases the depth of the directory tree and seriously degrades the efficiency of balancing it, so access performance is limited under large-scale concurrency. In addition, HDDs are generally used to store small files at present. Therefore, a method is needed to solve the problems that the existing file storage directory tree is not suitable for large-scale file storage and that small files are stored inefficiently on HDDs.

Disclosure of Invention

The invention aims to provide a method for optimizing the storage performance of mass small files based on real-time merging, which can realize reasonable storage of the mass small files and improve the storage efficiency of the small files.

The invention also aims to provide a device for optimizing the storage performance of massive small files based on real-time merging, which can reasonably store massive small files and improve the access efficiency when a large number of users access simultaneously.

The embodiment of the invention is realized by the following steps:

In a first aspect, an embodiment of the present application provides a method for optimizing storage performance of a large number of small files based on real-time merging, where the method includes S1: receiving a file data storage request sent by a client, and writing files to be stored into an SSD and HDD storage resource pool in a layered manner in an object storage manner;

when the file to be stored is a small file, storing the file into the SSD storage resource pool, and storing metadata information of the file in a file index pool; generating a global unique identifier key of the file according to the metadata information;

S2: setting a threshold value of the merging quantity of the small files and a threshold value of the total number of merging bytes, triggering a document merging process of a server in real time when the merging quantity of the small files stored in an SSD storage resource pool or the total number of the bytes exceeds the threshold value, extracting the content of the existing small files by a server background, merging the content into a large file, and storing the large file into the HDD storage resource pool;

S3: after the small files are merged and filed, data positioning information is added according to the metadata information mapped to the file index pool by the identification key;

S4: after the metadata information of the small files is updated, deleting the merged small files in the SSD data pool;

S5: the server receives the file data access request sent by the client, if the metadata information does not have the data positioning information, the identifier key is analyzed according to the metadata information, and the corresponding file content is accessed from the SSD storage resource pool through the identifier key; and if the metadata information has the data positioning information, finding the large file after the small files are merged from the HDD storage resource pool according to the data positioning information, and extracting the file content of the small files from the large file.

In a second aspect, an embodiment of the present application provides an apparatus for optimizing storage performance of a large number of small files based on real-time merging, including:

a data request receiving module: used by the server to receive file data write or access requests sent by the client;

the file data hierarchical storage module: generating a unique identity key and a file content data value corresponding to the key according to the received metadata information of the file to be stored; dividing the file into a large file and a small file by taking 1M byte number as a boundary, writing a value corresponding to the large file into an HDD storage resource pool, and writing a value corresponding to the small file into an SSD storage resource pool;

the small file data merging module: setting triggering conditions of a small file merging process, namely a small file number threshold and a small file byte total number threshold; when the number of small files or the number of bytes in the SSD storage resource pool exceeds a threshold value, a server document merging process is triggered in real time, the small files in the resource pool are merged into a large file and written into the HDD storage resource pool; in the triggering condition of the small file merging process, the upper limit threshold of the number of the small files cannot exceed the maximum concurrent access amount of the SSD storage resource pool, and the upper limit threshold of the number of bytes of the small files cannot exceed the size of the storage space of the SSD storage resource pool;

the small file data updating module: mapping the key value of the merged small file to corresponding metadata, and adding data positioning information, wherein the data positioning information comprises any one or more of the name and the path of the merged large file, and the position offset and the size of the small file in the large file;

the small file original data deleting module: after the small files are merged and filed and the metadata information is updated successfully, the original file data of the small files are deleted from the SSD storage resource pool, so that the reliability and the safety of the file data are ensured, and the storage space of the SSD storage resource pool can be released in time;

a file data reading module: file metadata information is provided according to the data access request information, keys are generated, and file contents are read from the resource pool according to the key identifiers, wherein: the large file is directly read from the HDD storage resource pool; directly reading the small files which are not merged from the SSD storage resource pool; the merged small files need to be mapped to corresponding metadata of the files in the index pool according to keys, data positioning information is extracted, the merged large files are found, and the small file contents at corresponding positions are extracted from the merged large files.
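The constraint stated for the small file data merging module (the count threshold must not exceed the SSD pool's maximum concurrent access amount, and the byte threshold must not exceed its storage space) can be checked when the thresholds are configured. The following is a minimal sketch under stated assumptions: the function name `validate_merge_thresholds` and its parameter names are hypothetical and do not appear in the application.

```python
def validate_merge_thresholds(
    count_threshold: int,
    bytes_threshold: int,
    ssd_max_concurrency: int,
    ssd_capacity_bytes: int,
) -> None:
    """Reject merge-trigger thresholds that the SSD pool could never reach safely.

    All parameter names are illustrative assumptions, not taken from the source.
    """
    if count_threshold > ssd_max_concurrency:
        raise ValueError(
            "small-file count threshold exceeds the SSD pool's maximum concurrent access amount"
        )
    if bytes_threshold > ssd_capacity_bytes:
        raise ValueError(
            "small-file byte threshold exceeds the SSD pool's storage space"
        )
```

Running this check once at configuration time means a misconfigured threshold fails early instead of leaving the merging process untriggered while the SSD pool fills up.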

Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:

with respect to the first aspect: the invention aims to provide a method for optimizing the storage performance of mass small files based on real-time merging, which selects an object storage mode to store files hierarchically according to received client file storage request information, so that large files with larger capacity are directly written into an HDD storage resource pool, small files with smaller capacity are written into an SSD storage resource pool, the files with different capacities are read by using different storage modes, the reading efficiency of the small files can be improved, and the large files can be conveniently and quickly read and uniformly managed through the HDD storage resource pool. When the small files are accumulated to a fixed number or byte number, the background extracts the small files in the SSD storage resource pool and merges the small files into a large file to be written into the HDD storage resource pool, so that the normal use of the client side is guaranteed. And finally, deleting the merged original small file data in the SSD storage resource pool, not affecting the access performance of the file, and simultaneously releasing the storage space of the SSD storage resource pool, so that the SSD storage resource pool is efficiently recycled, and the cost requirement is reduced. The invention can realize reasonable storage of a large amount of small files, solves the problem that the existing file storage directory tree is not suitable for large-scale file storage, and improves the storage efficiency of the small files, thereby solving the problem of low storage efficiency of small files using HDD disks.

With respect to the second aspect: the invention aims to provide a device for optimizing the storage performance of mass small files based on real-time merging, the working principle and the beneficial effects of the device are the same as those of the first aspect, and repeated description is not needed.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a schematic flow chart of a method for optimizing storage performance of a large number of small files based on real-time merging according to an embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating a principle of a method for optimizing storage performance of a large number of small files based on real-time merging according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.

Example 1

Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for optimizing storage performance of a large number of small files based on real-time merging according to an embodiment of the present application. The method for optimizing the storage performance of the mass small files based on real-time merging comprises the following steps of S1: the method comprises the steps that a server receives a file data storage request sent by a client, and files to be stored are written into an SSD and HDD storage resource pool in a layered mode through object storage;

in step S1, storing the file to be stored in the HDD storage resource pool if the file to be stored is a large file, storing the file to be stored in the SSD storage resource pool if the file to be stored is a small file, and storing metadata information of the file in the file index pool; and generating a global unique identifier key of the file according to the metadata information.

In detail, the server receives a file data storage request sent by the client through the network, and routes the file to be stored to one of the resource pools using object storage: when the file to be stored is a large-capacity file, it is written into the HDD storage resource pool; otherwise, it is written into the SSD storage resource pool. The metadata information of the files stored in the HDD storage resource pool and the SSD storage resource pool is kept in the file index pool, and an identification key is generated for each file according to its metadata information, so that each file can be uniquely indexed by its key.

In detail, a value of metadata information corresponding to the identifier key may be generated according to the file content of the file to be stored, the identifier key and the value are mapped one to one, and the corresponding file content data may be directly accessed through the identifier key.
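The write path of step S1 can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions: the class `TieredStore`, its attribute names, and the use of dictionaries as stand-ins for the SSD pool, HDD pool and file index pool are all hypothetical, and the 1 MiB boundary is one reading of the "1M" threshold mentioned elsewhere in the description.

```python
from dataclasses import dataclass, field

# Assumed tiering boundary in bytes (one reading of the description's "1M").
SIZE_THRESHOLD = 1 * 1024 * 1024


@dataclass
class TieredStore:
    """In-memory stand-in for the SSD pool, HDD pool and file index pool."""

    ssd_pool: dict = field(default_factory=dict)    # key -> content of small files
    hdd_pool: dict = field(default_factory=dict)    # key -> content of large files
    index_pool: dict = field(default_factory=dict)  # key -> metadata information

    def store(self, key: str, content: bytes, metadata: dict) -> str:
        """Write content to the tier chosen by size and record its metadata."""
        is_large = len(content) > SIZE_THRESHOLD
        pool = self.hdd_pool if is_large else self.ssd_pool
        pool[key] = content  # key and value map one to one
        self.index_pool[key] = dict(metadata, size=len(content))
        return "hdd" if is_large else "ssd"
```

For example, `TieredStore().store("k1", b"abc", {"type": "log"})` returns `"ssd"`, because three bytes is below the assumed boundary.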

S2: setting a threshold value of the merging quantity of the small files and a threshold value of the total number of the merging bytes, triggering a document merging process of a server in real time when the merging quantity of the small files stored in an SSD storage resource pool or the total number of the bytes exceeds the threshold value, extracting the content of the existing small files by a server background, merging the content into a large file, and storing the large file into the HDD storage resource pool.

In detail, in step S2, a threshold on the number of small files to merge and a threshold on the total number of bytes are set, so that when either the number of small files or the total number of bytes stored in the SSD storage resource pool reaches the corresponding threshold, a document merging process of the server is triggered: the server extracts the content of the small files in the SSD storage resource pool, merges the small files into a large file, and stores the merged large file in the HDD storage resource pool.
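One simple way to picture the merge itself is plain concatenation with a record of where each small file lands. The sketch below assumes that layout; the application does not specify the on-disk format of the merged file, and the function name `merge_small_files` is hypothetical.

```python
def merge_small_files(small_files: dict) -> tuple:
    """Concatenate small files into one blob.

    Returns the merged bytes and a layout mapping each key to (offset, size).
    Plain concatenation is an assumed format, not one stated in the source.
    """
    merged = bytearray()
    layout = {}
    for key, content in small_files.items():
        layout[key] = (len(merged), len(content))  # position before appending
        merged.extend(content)
    return bytes(merged), layout
```

The returned layout carries the position offset and size that the later steps need in order to locate a small file inside the merged large file.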

S3: after the small files are merged and filed, data positioning information is added to the metadata information to which the identification key maps in the file index pool.

In detail, after the small files are merged and stored, the identification keys of the small files are mapped to the corresponding metadata information in the file index pool to increase the data positioning information of the small files in the metadata information, so that the small files can be conveniently searched through the data positioning information.

S4: after the metadata information of the small file is updated, deleting the merged small file from the SSD data pool.

In detail, after the data positioning information is added to the metadata information, the content data of the small file is deleted, so that the storage space of the SSD data pool is released.
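The ordering of steps S3 and S4 matters: the SSD copy is removed only after the merged file has been stored and the metadata updated, so a small file stays readable throughout. A minimal sketch of that ordering follows, assuming dictionary stand-ins for the pools and a hypothetical `location` field in the metadata; none of these names come from the application.

```python
def archive(ssd_pool, hdd_pool, index_pool, merged_name, merged_blob, layout):
    """Persist the merged file, record locations, then free the SSD copies.

    layout maps each small-file key to (offset, size) inside merged_blob.
    The "location" field and its sub-keys are illustrative assumptions.
    """
    # Store the merged large file in the HDD pool first.
    hdd_pool[merged_name] = merged_blob

    # S3: add data positioning information to each small file's metadata.
    for key, (offset, size) in layout.items():
        index_pool[key]["location"] = {
            "merged_file": merged_name,
            "offset": offset,
            "size": size,
        }

    # S4: delete the originals only after every metadata update has been made.
    for key in layout:
        ssd_pool.pop(key, None)
```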

S5: the server receives the file data access request sent by the client, if the metadata information does not have the data positioning information, the identifier key is analyzed according to the metadata information, and the corresponding file content is accessed from the SSD storage resource pool through the identifier key; if the metadata information has the data positioning information, the file is a small file which is merged and filed, the large file after merging the small file needs to be found from the HDD storage resource pool according to the data positioning information, and then the file content of the small file is extracted from the corresponding position in the large file.

In detail, the server receives a file data access request from the client and looks up the metadata information of the requested file in the file index pool. If the metadata information contains no data positioning information, the file has not yet been merged: the identification key is resolved from the metadata information, and the corresponding file content is accessed directly from the SSD storage resource pool through the identification key. If the metadata information contains data positioning information, which was added through the identification key when the small file was merged, the server uses that information to find the large file formed by merging the small files in the HDD storage resource pool, and extracts the corresponding file content from the large file.
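The branch in step S5 can be sketched as a single lookup function. This is an illustration under assumptions: the pools are modelled as dictionaries, and the `location` field with `merged_file`, `offset` and `size` entries is a hypothetical encoding of the data positioning information. Large files written directly to the HDD pool are outside this sketch.

```python
def read_small_file(key, index_pool, ssd_pool, hdd_pool):
    """Return a small file's content, whether or not it has been merged."""
    metadata = index_pool[key]
    location = metadata.get("location")  # hypothetical field name

    if location is None:
        # No data positioning information: the file is still in the SSD pool.
        return ssd_pool[key]

    # Data positioning information present: slice the file out of the merged blob.
    merged = hdd_pool[location["merged_file"]]
    start = location["offset"]
    return merged[start:start + location["size"]]
```

Because the presence of positioning information decides the branch, the client never needs to know whether a file has been archived.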

In some embodiments of the present invention, in the step S1, the metadata information includes any one or more of a file identification number, a file generation time node, a file size and a file type.

In detail, the metadata information in step S1 includes any one or more of a file identification number of the small file, a time node of file generation, a file size, and a file type. And obtaining the file content corresponding to the identification key through the metadata information.

In some embodiments of the present invention, the step S1 includes: generating the identification key of the file by concatenating, as character strings, the identification number and the file generation time taken from the metadata information of the file to be stored.

In detail, the identification key in step S1 is formed by combining the identification number and the file generation time, and the metadata information is acquired by the identification key, thereby mapping to the value of the file content.
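Key generation by string concatenation might look like the following sketch. The separator, the timestamp format and the function name `make_key` are illustrative choices and are not specified in the application; the sketch also assumes that the identification number combined with the generation time is unique.

```python
from datetime import datetime


def make_key(file_id: str, created_at: datetime) -> str:
    """Concatenate the identification number and generation time into a key.

    The underscore separator and microsecond timestamp format are assumptions.
    """
    return f"{file_id}_{created_at.strftime('%Y%m%d%H%M%S%f')}"
```

For example, `make_key("cam01", datetime(2021, 1, 22, 8, 30, 0, 123456))` yields `"cam01_20210122083000123456"`.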

In some embodiments of the present invention, the step S1 includes: setting a threshold value of the number of layered bytes of a file, when the number of layered bytes of the file to be stored is larger than the threshold value, judging the file to be stored as a large file, otherwise, judging the file to be stored as a small file.

In detail, the server sets a threshold of the number of layered bytes of the file, so that when the number of layered bytes of the file to be stored is greater than the threshold, the file to be stored is determined to be a large file, and otherwise, the file to be stored is determined to be a small file. And writing the files to be stored into the SSD storage resource pool or the HDD storage resource pool in a layered manner according to the fact that the files to be stored are large files or small files.

In some embodiments of the present invention, the above-mentioned threshold of the number of layered bytes of the file is set to 1M. Therefore, further merging is carried out according to the small files stored in a layered mode, and file content can be conveniently searched according to data positioning information.

In some embodiments of the present invention, the step S1 includes: generating a value corresponding to the identification key according to the file content; mapping the identification key and the value one to one; and storing the file content value of the file in the HDD storage resource pool or the SSD storage resource pool.

In detail, the file to be stored is judged to be large-capacity or small-capacity according to the set threshold of the number of layered bytes, so that a value mapped correspondingly to the identification key is generated according to the file content, and the file content is hierarchically stored in the storage resource pool at the corresponding position by utilizing different types of identification keys.

In some embodiments of the present invention, the step S2 includes setting the threshold of the merging number of the small files to be 500, and the threshold of the total number of merging bytes to be 100M.

In detail, in step S2, the threshold of the number of merged small files is set to 500, and the threshold of the total number of bytes is set to 100M capacity, so that it is determined whether the small files need to be merged into a large file according to the threshold.
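With these example values, the trigger condition reduces to a two-part check. The sketch below reads "100M" as 100 MiB, which is an assumption, and uses `>=` because the description says both "exceeds" and "reaches"; the function and constant names are hypothetical.

```python
MERGE_COUNT_THRESHOLD = 500                # example value from this embodiment
MERGE_BYTES_THRESHOLD = 100 * 1024 * 1024  # "100M" read as 100 MiB (assumption)


def should_merge(file_count: int, total_bytes: int) -> bool:
    """Return True when either threshold is reached in the SSD pool."""
    return (
        file_count >= MERGE_COUNT_THRESHOLD
        or total_bytes >= MERGE_BYTES_THRESHOLD
    )
```

Checking this after each small-file write is one way to trigger the merging process in real time, since either condition alone is enough to start a merge.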

In some embodiments of the present invention, the step S2 includes persistently storing the large file to the HDD storage resource pool in an EC mode.

In detail, in step S2, the large file is stored in erasure coding (EC) mode, so as to improve the stability of storing the large file.

In some embodiments of the present invention, in the step S3, the data positioning information includes any one or more of a name, a path, a position offset and a size of the large file in which the small file is merged.

In detail, the data positioning information includes data information after merging of the small files, including any one or more of a stored large file name, a stored path, and a position offset and a size of the small file in the large file, so that the metadata information mapped by the identification key is conveniently searched.

Example 2

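Referring to fig. 2, fig. 2 is a schematic diagram illustrating a principle of a device for optimizing storage performance of a large number of small files based on real-time merging according to an embodiment of the present application. A device for optimizing storage performance of massive small files based on real-time merging comprises the modules set out in the second aspect above: a data request receiving module, a file data hierarchical storage module, a small file data merging module, a small file data updating module, a small file original data deleting module, and a file data reading module.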

The principle and advantageous effects of the above embodiment are the same as those of embodiment 1, and a repeated description thereof is not necessary.

It is to be understood that the flow or structure shown in fig. 1-2 is only illustrative, and the method or apparatus for optimizing storage performance of mass small files based on real-time merging may include more or fewer components than those shown in fig. 1 or fig. 2, or have a different configuration from that shown in fig. 1 or fig. 2. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

To sum up, the embodiment of the present application provides a method and an apparatus for optimizing storage performance of a large number of small files based on real-time merging:

according to the file storage method and device, the files are stored in a layered mode in an object storage mode according to the received client file storage request information, so that large files with large capacity are directly written into the HDD storage resource pool, small files with small capacity are written into the SSD storage resource pool, the files with different capacities are read by using different storage modes, the reading efficiency of the small files can be improved, and the large files can be conveniently and quickly read and uniformly managed through the HDD storage resource pool. When the small files are accumulated to a fixed number or byte number, the background extracts the small files in the SSD storage resource pool and merges the small files into a large file to be written into the HDD storage resource pool, so that the normal use of the client side is guaranteed. And finally, deleting the merged original small file data in the SSD storage resource pool, not affecting the access performance of the file, and simultaneously releasing the storage space of the SSD storage resource pool, so that the SSD storage resource pool is efficiently recycled, and the cost requirement is reduced. The invention can realize reasonable storage of a large amount of small files, solves the problem that the existing file storage directory tree is not suitable for large-scale file storage, and improves the storage efficiency of the small files, thereby solving the problem of low storage efficiency of small files using HDD disks.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for optimizing storage performance of mass small files based on real-time merging is characterized by comprising the following steps of S1: receiving a file data storage request sent by a client, and writing files to be stored into an SSD and HDD storage resource pool in a layered manner in an object storage manner;

when the file to be stored is a large file, storing the file to be stored in the HDD storage resource pool, when the file to be stored is a small file, storing the file to be stored in the SSD storage resource pool, and storing metadata information of the file in a file index pool; generating a global unique identifier key of the file according to the metadata information;

s5: the server receives the file data access request sent by the client, if the metadata information does not have the data positioning information, the identification key is analyzed according to the metadata information, and the corresponding file content is accessed from the SSD storage resource pool through the identification key; and if the metadata information has the data positioning information, finding the large file after the small files are merged from the HDD storage resource pool according to the data positioning information, and then extracting the file content of the small files from the large file.

2. The method for optimizing the storage performance of the mass small files based on real-time merging as claimed in claim 1, wherein step S1 comprises: the metadata information comprises any one or more items of a file identity identification number, a file generation time node, a file size and a file type.

3. The method for optimizing the storage performance of the mass small files based on real-time merging as claimed in claim 1, wherein step S1 comprises: and according to the metadata information of the file to be stored, generating the identification key of the file by splicing the character strings according to the identification number and the file generation time.

4. The method for optimizing the storage performance of the mass small files based on real-time merging as claimed in claim 1, wherein step S1 comprises: setting a threshold value of the number of layered bytes of a file, when the number of layered bytes of the file to be stored is larger than the threshold value, judging the file to be stored as a large file, otherwise, judging the file to be stored as a small file.

5. The method for optimizing the storage performance of the mass small files based on real-time merging as claimed in claim 4, wherein the threshold of the number of layered bytes of a file is set to be 1M.

6. The method for optimizing the storage performance of the mass small files based on real-time merging as claimed in claim 1, wherein step S1 comprises: generating a value corresponding to the identification key according to the file content; the values of the identification keys are mapped one by one, and the corresponding file content data can be directly accessed through the identification keys; and storing the file content value of the file in the HDD storage resource pool or the SSD storage resource pool.

7. The method as claimed in claim 1, wherein step S2 includes setting the threshold of merging number of the small files to be 500, and the threshold of total number of merged bytes to be 100M.

8. The method for optimizing the storage performance of the mass small files based on real-time merging as claimed in claim 1, wherein step S2 includes persistently storing the large file to the HDD storage resource pool in an EC mode.

9. The method as claimed in claim 1, wherein in step S3, the data location information includes any one or more of a name, a path, a position offset and a size of the small file in the large file after merging the small file.

10. A device for optimizing the storage performance of massive small files based on real-time merging is characterized by comprising the following components:

the small file original data deleting module: after the small files are merged and filed and the metadata information is updated successfully, the original file data are deleted from the SSD storage resource pool;