Distributed file storage system and method based on FastDFS + Redis
Technical Field
The invention relates to the technical field of computers, in particular to a distributed file storage system and method based on FastDFS + Redis.
Background
The development of the internet has ushered in the era of big data. In this process, unstructured data is growing explosively alongside structured data. Unstructured data here typically refers to small files between 2 KB and 1 MB in size. Storing file data involves storing both file metadata and file contents. File metadata is currently usually stored in MySQL, whose access performance drops sharply as the data scale grows. FastDFS, a lightweight open-source distributed file system, is well suited to storing small and medium-sized files, but it keeps data on disk, so every access must read from disk and the repeated I/O degrades query performance. To reduce disk accesses and speed up file reads, one alternative scheme introduces Redis as a file cache service. Redis, however, is an in-memory database, and caching files in it directly leads to excessive memory consumption and low query performance, so that scheme cannot be applied to large-scale file data. In addition, when Redis is used as a cache service, the 6 native cache eviction policies it provides have a low cache hit rate for random and periodic queries.
Disclosure of Invention
In view of this, the present invention aims to provide a distributed file storage system and method based on FastDFS + Redis that store, read and write file data more efficiently, solve the problems of overlong strings and heavy memory consumption when Redis caches files, improve memory utilization, and achieve a good cache hit rate.
To achieve this purpose, the invention adopts the following technical solution:
A distributed file storage system based on FastDFS + Redis comprises a database and middleware; the middleware comprises a storage module, a query module and a deletion module; the storage module uses a FastDFS cluster to store massive file data in a distributed manner; the query module uses a Redis cluster to provide high-performance file queries based on a distributed cache; the deletion module deletes distributed files.
Further, the storage module comprises a file uploading component, a cache compression component and a cache replacement component.
A file upload method of the distributed file storage system based on FastDFS + Redis comprises the following steps:
step 1: acquiring the file content and file metadata from the designated path, and uploading the file to FastDFS;
step 2: judging whether the file upload succeeded; if not, throwing an exception and ending;
step 3: writing the FileID returned by the successful upload, together with the metadata, into Redis;
step 4: judging whether the Redis write succeeded; if so, the file upload succeeds, and ending;
step 5: judging whether the number of Redis write attempts is smaller than the system default setting; if so, returning to step 3;
step 6: deleting the file from FastDFS and returning an upload failure.
Further, the metadata of the file is stored as follows: the string formed by concatenating the owner, file name and file type fields of the metadata is used as the key, and the string formed by concatenating the remaining fields and the FileID is written into Redis, in key-value form, as the value corresponding to that key.
Further, the cache compression component encodes the file to be cached by using Base64, compresses the encoded file by using a Gzip algorithm, and finally writes the encoded and compressed file into Redis.
A cache replacement method of a distributed file storage system based on FastDFS + Redis comprises the following steps:
step 1: recording the use count of each file in a history access queue;
step 2: writing the file index into the history access queue; if the index of the file already exists, adding 1 to its recorded use count; if it does not exist, writing the file index into the queue and setting its use count to 1;
step 3: when the use count of the file reaches k, proceeding to write the encoded and compressed file into the Redis cache;
step 4: judging whether the cache capacity in use is smaller than the threshold preset by the system; if so, going to step 6;
step 5: selecting cache entries for eviction according to file use frequency;
step 6: writing the encoded and compressed file data into Redis.
Preferably, the file use frequency is calculated by the following formula:
[formula not recoverable from the source text]
wherein the quantities in the formula denote, respectively: the use frequency of the file; the period; the number of uses of the file within the period; the use frequency of the file in the previous period; and the number of periods.
Further, the query module queries Redis according to the query condition; if a cache of the file exists in Redis, the file is decompressed with the Gzip algorithm, decoded with Base64 and returned; otherwise, the file is obtained by querying FastDFS with the FileID, the recorded query count is updated, and whether the conditions for executing the cache replacement algorithm are met is judged.
A deletion method of the distributed file storage system based on FastDFS + Redis comprises the following steps:
step 1: querying Redis according to the deletion condition to obtain the FileID of the file;
step 2: judging whether a cache of the file exists in Redis; if so, going to step 6;
step 3: deleting the file saved on FastDFS according to the FileID;
step 4: judging whether the file deletion succeeded; if so, ending;
step 5: judging whether the number of deletion attempts is smaller than the threshold preset by the system; if so, adding one to the number of deletion attempts and returning to step 3; otherwise, ending;
step 6: making a backup copy of the file;
step 7: executing the simultaneous deletion, deleting the cache on Redis and the file on FastDFS respectively;
step 8: judging whether both deletions succeeded; if so, the deletion is complete, and ending;
step 9: judging whether the number of deletion attempts is smaller than the threshold preset by the system; if so, adding one to the number of deletion attempts and returning to step 7;
step 10: rolling back the deletion, rewriting the previously saved copy into the middleware, and ending.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses a FastDFS cluster for distributed storage of massive file data and a Redis cluster for high-performance file queries based on a distributed cache, giving the system good query performance.
2. The invention provides a file compression caching strategy based on Base64 encoding and the Gzip compression algorithm, which solves the problems of overlong strings and heavy memory consumption when Redis caches files and improves memory utilization.
3. The invention designs a cache replacement algorithm based on file use frequency, giving the system a good cache hit rate.
Drawings
FIG. 1 is a schematic diagram of the system architecture of the present invention;
FIG. 2 is a flowchart illustrating file upload according to an embodiment of the present invention;
FIG. 3 is a flow chart of cache write in an embodiment of the present invention;
FIG. 4 is a flow diagram of a query module in accordance with an embodiment of the present invention;
FIG. 5 is a flow chart of a delete module in an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to FIG. 1, the present invention provides a distributed file storage system based on FastDFS + Redis, which comprises a database and middleware; the middleware comprises a storage module, a query module and a deletion module; the storage module uses a FastDFS cluster to store massive file data in a distributed manner; the query module uses a Redis cluster to provide high-performance file queries based on a distributed cache; the deletion module deletes distributed files. The storage module comprises a file uploading component, a cache compression component and a cache replacement component.
Referring to FIG. 2, in this embodiment, the file uploading component stores file metadata in a Redis cluster and file contents in a FastDFS cluster. Uploading a file comprises storing both the file content and the file metadata, and the upload is considered successful only if both are stored successfully; if either one fails, the upload fails. The Redis write can be retried up to a threshold preset by the system, and once the threshold is reached the upload finally fails. The method specifically comprises the following steps:
step 1: acquiring the file content and file metadata from the designated path, and uploading the file to FastDFS;
step 2: judging whether the file upload succeeded; if not, throwing an exception and ending;
step 3: writing the FileID returned by the successful upload, together with the metadata, into Redis;
step 4: judging whether the Redis write succeeded; if so, the file upload succeeds, and ending;
step 5: judging whether the number of Redis write attempts is smaller than the system default setting; if so, returning to step 3;
step 6: deleting the file from FastDFS and returning an upload failure.
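The upload steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `upload_file`, the callables `store_upload`, `store_delete` and `meta_write`, and the default of three write attempts are all hypothetical, and in-memory dictionaries stand in for the FastDFS and Redis clusters.

```python
from typing import Callable, Optional


class UploadError(Exception):
    """Raised when the file content cannot be stored (step 2)."""


def upload_file(
    content: bytes,
    metadata: dict,
    store_upload: Callable[[bytes], str],     # hypothetical FastDFS upload
    store_delete: Callable[[str], object],    # hypothetical FastDFS delete
    meta_write: Callable[[str, dict], bool],  # hypothetical Redis write
    max_meta_writes: int = 3,                 # assumed retry limit
) -> Optional[str]:
    """Return the FileID on success, or None after rolling back."""
    # Steps 1-2: upload the content; a failure here ends the procedure.
    try:
        file_id = store_upload(content)
    except Exception as exc:
        raise UploadError("file content could not be stored") from exc

    # Steps 3-5: write FileID and metadata, retrying up to the limit.
    for _ in range(max_meta_writes):
        if meta_write(file_id, metadata):
            return file_id  # step 4: content and metadata are both stored

    # Step 6: the metadata write never succeeded, so remove the file.
    store_delete(file_id)
    return None


# Usage with in-memory stand-ins for the two clusters.
files: dict = {}
meta: dict = {}


def fake_upload(data: bytes) -> str:
    file_id = f"group1/M00/{len(files):04d}"
    files[file_id] = data
    return file_id


def fake_meta_write(file_id: str, md: dict) -> bool:
    meta[file_id] = md
    return True


fid = upload_file(b"hello", {"owner": "alice"}, fake_upload, files.pop, fake_meta_write)
```

Passing the storage operations in as callables keeps the retry and rollback logic testable without a running cluster.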
In this embodiment, the metadata of the file is stored as follows: the string formed by concatenating the owner, file name and file type fields of the metadata is used as the key, and the string formed by concatenating the remaining fields and the FileID is written into Redis, in key-value form, as the value corresponding to that key.
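A small sketch of how such a key and value might be assembled. The field names (`owner`, `file_name`, `file_type`), the `|` separator and the function name are assumptions made for illustration; the source does not specify them.

```python
KEY_FIELDS = ("owner", "file_name", "file_type")  # assumed field names


def build_metadata_entry(metadata: dict, file_id: str, sep: str = "|") -> tuple:
    """Build the (key, value) pair that would be written to Redis."""
    # Key: owner, file name and file type joined into one string.
    key = sep.join(str(metadata[field]) for field in KEY_FIELDS)
    # Value: the remaining fields, in insertion order, followed by the FileID.
    rest = [str(v) for k, v in metadata.items() if k not in KEY_FIELDS]
    value = sep.join(rest + [file_id])
    return key, value


key, value = build_metadata_entry(
    {"owner": "alice", "file_name": "report", "file_type": "pdf", "size": 2048},
    "group1/M00/00/00/abc.pdf",
)
# key   -> "alice|report|pdf"
# value -> "2048|group1/M00/00/00/abc.pdf"
```

With this layout a query that knows the owner, name and type can rebuild the key directly and recover the FileID from the last segment of the value.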
In this embodiment, the cache compression component encodes the file to be cached using Base64, performs Gzip compression on the encoded file, and finally writes the encoded and compressed file into Redis.
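The encode-then-compress order described here, and its inverse on the read path, can be sketched with the Python standard library. The function names are illustrative only.

```python
import base64
import gzip


def encode_for_cache(raw: bytes) -> bytes:
    """Base64-encode the file, then Gzip-compress the encoded text."""
    return gzip.compress(base64.b64encode(raw))


def decode_from_cache(blob: bytes) -> bytes:
    """Reverse the steps: Gzip-decompress first, then Base64-decode."""
    return base64.b64decode(gzip.decompress(blob))


original = b"a" * 1000
cached = encode_for_cache(original)
restored = decode_from_cache(cached)
```

How much space is saved depends on the content: repetitive data such as the example shrinks considerably, while files that are already compressed may not shrink at all.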
Referring to FIG. 3, in this embodiment, the cache replacement component calculates the final use frequency value of a file from its use frequency in the current period and its historical use frequency value, writes the encoded and compressed file into Redis when the use count of the file reaches k, and, if the cache space is full, selects entries for eviction according to their frequency values.
The specific cache replacement method comprises the following steps:
step 1: recording the use count of each file in a history access queue;
step 2: writing the file index into the history access queue; if the index of the file already exists, adding 1 to its recorded use count; if it does not exist, writing the file index into the queue and setting its use count to 1;
step 3: when the use count of the file reaches k, proceeding to write the encoded and compressed file into the Redis cache;
step 4: judging whether the cache capacity in use is smaller than the threshold preset by the system; if so, going to step 6;
step 5: selecting cache entries for eviction according to file use frequency;
step 6: writing the encoded and compressed file data into Redis.
Preferably, the file use frequency is calculated by the following formula:
[formula not recoverable from the source text]
wherein the quantities in the formula denote, respectively: the use frequency of the file; the period; the number of uses of the file within the period; the use frequency of the file in the previous period; and the number of periods.
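The cache replacement steps can be sketched as below. This is a hedged illustration under several assumptions: the class name `FrequencyCache` is hypothetical, capacity is counted in entries rather than bytes, a dictionary stands in for Redis, and the frequency calculation is passed in as a caller-supplied function. The usage example uses the raw access count purely as a placeholder for the frequency.

```python
from collections import Counter
from typing import Callable, Dict


class FrequencyCache:
    def __init__(self, capacity: int, k: int, frequency: Callable[[str], float]):
        self.capacity = capacity      # assumed: maximum number of cached entries
        self.k = k                    # accesses required before a file is cached
        self.frequency = frequency    # assumed: caller-supplied frequency function
        self.history: Counter = Counter()  # history access queue: index -> count
        self.cache: Dict[str, bytes] = {}  # stand-in for the Redis cache

    def record_access(self, index: str, payload: bytes) -> bool:
        """Record one access; return True if the file was written to the cache."""
        # Steps 1-2: a new index starts at 1, an existing one is incremented.
        self.history[index] += 1
        # Step 3: only files accessed at least k times are admitted.
        if self.history[index] < self.k or index in self.cache:
            return False
        # Steps 4-5: if the cache is full, evict the lowest-frequency entry.
        if len(self.cache) >= self.capacity:
            victim = min(self.cache, key=self.frequency)
            del self.cache[victim]
        # Step 6: write the (already encoded and compressed) payload.
        self.cache[index] = payload
        return True


# Usage: raw access counts stand in for the frequency value.
cache = FrequencyCache(capacity=1, k=2, frequency=lambda i: cache.history[i])
cache.record_access("a", b"A")   # first access, not cached yet
cache.record_access("a", b"A")   # second access, cached
```

Requiring k accesses before admission keeps files that are read only once from displacing entries that are read repeatedly.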
Referring to FIG. 4, in this embodiment, the query module queries Redis according to the query condition; if a cache of the file exists in Redis, it returns the file after decompressing it with the Gzip algorithm and decoding it with Base64; otherwise, the file is obtained by querying FastDFS with the FileID, the recorded query count is updated, and whether the conditions for executing the cache replacement algorithm are met is judged. The method specifically comprises the following steps:
step 1: generating a key from the query condition input by the user and querying Redis;
step 2: judging whether the file exists in Redis; if not, going to step 5;
step 3: updating the recorded query count;
step 4: returning the file after decompressing it with the Gzip algorithm and decoding it with Base64, and ending;
step 5: querying Redis to obtain the FileID;
step 6: querying FastDFS according to the FileID and returning the file.
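The query steps can be sketched as follows, with dictionaries standing in for the Redis cache, the Redis metadata and the FastDFS store. The function and parameter names are hypothetical. The sketch counts every query, on a hit as well as on a miss; that is an assumption, since the text mentions the count update in both places without fixing one.

```python
import base64
import gzip
from typing import Callable, Dict


def query_file(
    key: str,
    cache: Dict[str, bytes],        # stand-in for cached file bodies in Redis
    file_ids: Dict[str, str],       # stand-in for key -> FileID metadata in Redis
    fetch: Callable[[str], bytes],  # hypothetical FastDFS download by FileID
    counts: Dict[str, int],         # recorded query counts
) -> bytes:
    # Update the recorded query count (assumed to happen on every query).
    counts[key] = counts.get(key, 0) + 1
    # Steps 1-2: look for a cached copy under the generated key.
    blob = cache.get(key)
    if blob is not None:
        # Step 4: Gzip-decompress, then Base64-decode, and return.
        return base64.b64decode(gzip.decompress(blob))
    # Steps 5-6: fall back to the FileID and read from the file store.
    return fetch(file_ids[key])


# Usage with one cached ("hot") file and one uncached ("cold") file.
hot_cache = {"alice|a|txt": gzip.compress(base64.b64encode(b"hot data"))}
ids = {"alice|a|txt": "id-a", "alice|b|txt": "id-b"}
store = {"id-a": b"hot data", "id-b": b"cold data"}
query_counts: Dict[str, int] = {}

hot = query_file("alice|a|txt", hot_cache, ids, store.__getitem__, query_counts)
cold = query_file("alice|b|txt", hot_cache, ids, store.__getitem__, query_counts)
```

The counts gathered here are what a cache replacement component would consult when deciding whether a cold file has become popular enough to cache.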
Referring to FIG. 5, in this embodiment, the deletion module first determines where the file is stored by a query operation and then deletes it. When the file to be deleted is a hot file, it must be deleted from both Redis and FastDFS; when it is a cold file, it only needs to be deleted from FastDFS. Deletion can be attempted multiple times, and when the threshold preset by the system is reached the deletion finally fails. The method specifically comprises the following steps:
step 1: querying Redis according to the deletion condition to obtain the FileID of the file;
step 2: judging whether a cache of the file exists in Redis; if so, going to step 6;
step 3: deleting the file saved on FastDFS according to the FileID;
step 4: judging whether the file deletion succeeded; if so, ending;
step 5: judging whether the number of deletion attempts is smaller than the threshold preset by the system; if so, adding one to the number of deletion attempts and returning to step 3; otherwise, ending;
step 6: making a backup copy of the file;
step 7: executing the simultaneous deletion, deleting the cache on Redis and the file on FastDFS respectively;
step 8: judging whether both deletions succeeded; if so, the deletion is complete, and ending;
step 9: judging whether the number of deletion attempts is smaller than the threshold preset by the system; if so, adding one to the number of deletion attempts and returning to step 7;
step 10: rolling back the deletion, rewriting the previously saved copy into the middleware, and ending.
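A minimal sketch of the deletion flow above, again with dictionaries in place of Redis and FastDFS. The names `delete_file` and `remove` and the default of three attempts are hypothetical. The source does not say when the metadata entry holding the FileID is removed, so the sketch leaves it untouched.

```python
from typing import Callable, Dict


def delete_file(
    key: str,
    cache: Dict[str, bytes],              # stand-in for cached bodies in Redis
    file_ids: Dict[str, str],             # stand-in for key -> FileID metadata
    store: Dict[str, bytes],              # stand-in for files on FastDFS
    remove: Callable[[dict, str], bool],  # hypothetical delete, True on success
    max_attempts: int = 3,                # assumed retry threshold
) -> bool:
    file_id = file_ids[key]                       # step 1
    if key not in cache:                          # step 2: cold file
        for _ in range(max_attempts):             # steps 3-5
            if remove(store, file_id):
                return True
        return False

    backup = (cache[key], store[file_id])         # step 6: keep a copy
    for _ in range(max_attempts):                 # steps 7-9
        # A side that was already deleted in an earlier attempt counts as done.
        ok_cache = key not in cache or remove(cache, key)
        ok_store = file_id not in store or remove(store, file_id)
        if ok_cache and ok_store:                 # step 8
            return True

    # Step 10: roll back by restoring both copies.
    cache[key], store[file_id] = backup
    return False


def simple_remove(container: dict, item: str) -> bool:
    return container.pop(item, None) is not None


# Usage: delete a hot file that exists in both places.
demo_cache = {"k": b"blob"}
demo_store = {"id": b"raw"}
deleted = delete_file("k", demo_cache, {"k": "id"}, demo_store, simple_remove)
```

Taking the backup before the first delete attempt is what allows a partial failure, such as the cache entry being removed while the file remains, to be undone.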
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.