CN111930555B - Erasure code based file processing method and device and computer equipment - Google Patents

Info

Publication number
CN111930555B
Authority
CN
China
Prior art keywords
file
cold
group
cold file
hot data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010911451.6A
Other languages
Chinese (zh)
Other versions
CN111930555A (en
Inventor
赵芳明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202010911451.6A priority Critical patent/CN111930555B/en
Publication of CN111930555A publication Critical patent/CN111930555A/en
Application granted granted Critical
Publication of CN111930555B publication Critical patent/CN111930555B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/13: File access structures, e.g. distributed indices
    • G06F16/134: Distributed indices
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162: Delete operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems

Abstract

The invention discloses an erasure-code-based file processing method, apparatus, and computer device. The method is applied to a FastDFS system and comprises the following steps: acquiring a cold file from a hot data group of a storage server; partitioning the cold file according to a preset partitioning rule to obtain file blocks corresponding to the cold file; encoding the file blocks with an erasure code technique to obtain check blocks corresponding to the file blocks; storing the file blocks and the check blocks in a cold data group of the storage server; and, if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request. Based on tiered storage, the invention optimizes the files stored in the FastDFS system, which improves disk storage utilization, reduces storage cost, and improves operation and maintenance efficiency.

Description

Erasure code based file processing method and device and computer equipment
Technical Field
The invention belongs to the technical field of distributed storage, and particularly relates to a file processing method and device based on erasure codes and computer equipment.
Background
FastDFS is an open-source, lightweight distributed file storage system written in C. It consists of tracking servers (Tracker Server), storage servers (Storage Server), and clients (Client). The storage servers, which actually store the data, are divided into a plurality of groups, and high availability of the data is ensured by the storages within a group backing each other up. Because the N storages in the same group hold the same data, a group can tolerate N-1 storage failures, but its storage utilization is only 1/N. Distributed file storage systems such as HDFS and Ceph use an erasure code technique to encode n original data blocks into m check blocks and store all n + m blocks in the system, so that the system remains usable while errors in m of the blocks can be tolerated, and the storage utilization is n/(n + m). However, HDFS and Ceph are designed for storing large files and objects, respectively, whereas FastDFS mainly relies on the multi-copy mode to ensure the efficiency and availability of data, and therefore cannot make full use of disk storage.
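The trade-off described above can be made concrete with a small calculation. The following Python sketch compares N-copy replication with an (n, m) erasure code; the function names and the sample parameters (3 copies, n = 6, m = 3) are illustrative assumptions and do not come from the patent.

```python
from typing import Tuple


def replication_profile(copies: int) -> Tuple[float, int]:
    """Return (storage utilization, tolerated failures) for N identical copies."""
    return 1 / copies, copies - 1


def erasure_profile(n: int, m: int) -> Tuple[float, int]:
    """Return (storage utilization, tolerated failures) for n data + m check blocks."""
    return n / (n + m), m


# Illustrative parameters only (assumptions, not values from the patent).
print(replication_profile(3))  # one third of the raw capacity holds unique data
print(erasure_profile(6, 3))   # two thirds of the raw capacity holds unique data
```

Under these assumed parameters the erasure-coded layout tolerates one more failure than three-copy replication while doubling the fraction of raw capacity that holds unique data.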
Disclosure of Invention
Embodiments of the invention provide an erasure-code-based file processing method, apparatus, and computer device, which address the prior-art problem that files in the small-file storage system FastDFS cannot be encoded with an erasure code technique, so that disk storage is not fully utilized.
In a first aspect, an embodiment of the present invention provides a file processing method based on erasure codes, which includes:
acquiring a cold file in a hot data group of a storage server;
partitioning the cold file according to a preset partitioning rule to obtain a file block corresponding to the cold file;
coding the file block according to an erasure code technology to obtain a check block corresponding to the file block;
storing the file block and the check block into a cold data group of the storage server;
and if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request.
In a second aspect, an embodiment of the present invention provides an erasure code-based file processing apparatus, which includes:
the acquisition unit is used for acquiring the cold file in the hot data group of the storage server;
the blocking unit is used for blocking the cold file according to a preset blocking rule to obtain a file block corresponding to the cold file;
the encoding unit is used for encoding the file block according to an erasure code technology to obtain a check block corresponding to the file block;
the first storage unit is used for storing the file blocks and the check blocks into the cold data group of the storage server;
and the first acquisition unit is used for acquiring the cold file from the cold data group according to the download request if the download request for the cold file is received.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the erasure code based file processing method described in the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the erasure code based file processing method according to the first aspect.
Embodiments of the invention provide an erasure-code-based file processing method, apparatus, and computer device. The method comprises: acquiring a cold file in a hot data group of a storage server; partitioning the cold file according to a preset partitioning rule to obtain file blocks corresponding to the cold file; encoding the file blocks according to an erasure code technique to obtain check blocks corresponding to the file blocks; storing the file blocks and the check blocks in a cold data group of the storage server; and, if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request. By dividing the files in the FastDFS system into hot and cold groups and applying a different redundancy strategy to each group, with erasure coding used for the cold data, the embodiments avoid the sharp drop in system performance that a one-size-fits-all approach would cause, while the reduced number of copies improves storage utilization and lowers storage cost.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the erasure code based file processing method according to an embodiment of the present invention;
Fig. 2 is a schematic view of an application scenario of the erasure code based file processing method according to an embodiment of the present invention;
Fig. 3 is a schematic sub-flow diagram of the erasure code based file processing method according to an embodiment of the present invention;
Fig. 4 is another schematic flowchart of the erasure code based file processing method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of another sub-flow of the erasure code based file processing method according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of another sub-flow of the erasure code based file processing method according to an embodiment of the present invention;
Fig. 7 is a schematic block diagram of the erasure code based file processing apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of sub-units of the erasure code based file processing apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic block diagram of another sub-unit of the erasure code based file processing apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic block diagram of another sub-unit of the erasure code based file processing apparatus according to an embodiment of the present invention;
Fig. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic flowchart of the erasure code based file processing method according to an embodiment of the present invention, and Fig. 2 is a schematic view of an application scenario of the method. The erasure code based file processing method is applied to a FastDFS system, which consists of a client 10, a tracking server 20, and a storage server 30. The method runs on the tracking server 20. The tracking server 20 receives requests from the client 10, divides the files stored in the storage server 30 into hot and cold groups, and encodes the cold data in the storage server 30 with an erasure code technique, so that the storage disks of the storage server 30 are fully utilized. The client 10 uploads files to the storage server 30 and downloads files from it. The storage server 30 stores the data, which is divided into a plurality of groups for distributed storage.
As shown in Fig. 1, the method includes steps S110 to S150.
S110, acquiring the cold file in the hot data group of the storage server.
The cold file in the hot data group of the storage server is acquired. The storage server 30 is part of the FastDFS system, a high-performance distributed file storage system that performs data redundancy in units of groups; high availability of the data is ensured by mutual backup of the data within each group. In this embodiment, the hot data group is a group of storage servers (Storage Server) in the FastDFS system used for storing frequently accessed files, and a cold file is a file that is not frequently accessed by clients. The tracking server 20 queries all files in the hot data group to identify those that are not frequently accessed, retrieves each such cold file, and deletes it from the hot data group.
In one embodiment, as shown in FIG. 3, step S110 includes sub-steps S111 and S112.
S111, judging whether the access frequency of the cold file is less than a preset first threshold.
Whether the access frequency of the cold file is less than a preset first threshold is judged. The cold file is not cold when it is first stored in the hot data group; the tracking server 20 classifies it as a cold file as the number of accesses to it decreases over time. The access frequency is the number of times users access the file through the client 10 within a preset time, and it indicates whether the file has become a cold file that users have not accessed for a long time. In this embodiment, the preset time may be set to 3 days and the first threshold to 10 accesses, so the judgment is made by checking whether the number of user accesses to the file within 3 days reaches 10.
S112, if the access frequency of the cold file is less than the first threshold, acquiring the cold file from the hot data group and deleting the cold file stored in the hot data group.
If the access frequency of the cold file is less than the first threshold, the cold file is acquired from the hot data group and the copy stored in the hot data group is deleted. Specifically, when the number of times users access the file through the client 10 within the preset time is greater than or equal to the number set by the FastDFS system, the file is judged to be a hot file; it needs no processing and remains stored in the hot data group. When the number of accesses within the preset time is less than the number set by the FastDFS system, the file is judged to have changed from a hot file into a cold file, and the tracking server 20 acquires it from the hot data group and deletes the copy stored there to free storage space in the hot data group.
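Sub-steps S111 and S112 amount to counting accesses inside a sliding window and comparing the count with the first threshold. A minimal Python sketch follows, using the 3-day window and 10-access threshold from this embodiment; the function names, the in-memory access log, and the file IDs are hypothetical and only illustrate the comparison.

```python
from typing import Dict, List

WINDOW_SECONDS = 3 * 24 * 3600  # preset time: 3 days (value from the embodiment)
FIRST_THRESHOLD = 10            # preset first threshold: 10 accesses


def is_cold(access_times: List[float], now: float,
            window: float = WINDOW_SECONDS,
            threshold: int = FIRST_THRESHOLD) -> bool:
    """A file is cold when it was accessed fewer than `threshold` times in the window."""
    recent = sum(1 for t in access_times if now - window <= t <= now)
    return recent < threshold


def select_cold_files(access_log: Dict[str, List[float]], now: float) -> List[str]:
    """Return the IDs of files that should be moved out of the hot data group."""
    return sorted(fid for fid, times in access_log.items() if is_cold(times, now))


# Hypothetical access log: file ID -> access timestamps in seconds.
now = 1_000_000.0
access_log = {
    "hot.jpg": [now - 60 * i for i in range(12)],     # 12 accesses in the window
    "cold.jpg": [now - 60, now - 10 * 24 * 3600],     # 1 access in the window
}
cold = select_cold_files(access_log, now)
```

A file with exactly 10 accesses in the window stays hot, which matches the "greater than or equal to" wording of step S112.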
In one embodiment, as shown in FIG. 4, step S110 is preceded by step S1101.
S1101, if the cold file uploaded by the user is received, storing the cold file in the hot data group based on a preset virtual group.
If the cold file uploaded by the user is received, the cold file is stored in the hot data group based on a preset virtual group. Specifically, when the FastDFS system receives the file for the first time, it cannot yet determine whether the file is hot or cold. The FastDFS system therefore first stores the file in the hot data group, monitors the number of times users access it within a preset time, and then determines whether it is a hot file. For example, when the FastDFS system receives the file for the first time, the file is stored in the hot data group and the tracking server 20 monitors the number of times the client 10 accesses it within 3 days; if the number of accesses within 3 days does not exceed the number set by the FastDFS system, the file is determined to be a cold file, and otherwise it is determined to be a hot file.
A virtual group (VGroup) is a logical group rather than a real physical group: one VGroup may include multiple groups, but each group belongs to only one VGroup, as shown in Table 1:
TABLE 1
Virtual group ID    Virtual group name    Members
0                   VGroup0               group1, group2, group3
1                   VGroup1               group4
According to the virtual group, a new group is created in the hot data group of the storage server 30 for the cold file, and the cold file is stored in that group. In general, the storage server 30 of a FastDFS system can be expanded in two ways: horizontal expansion, which increases system capacity by adding a new group, and vertical expansion, which increases system capacity by adding a new storage to an existing group. The storage server 30 of the FastDFS system of the present invention has cold data groups, and because each cold data group has only one storage, vertical expansion cannot meet the capacity expansion requirement of the storage server 30. Horizontal expansion on its own is also unsuitable, because the random allocation that the FastDFS system uses with it hinders later expansion of the storage server 30. The virtual group includes a plurality of sub-virtual groups; that is, a virtual group may manage groups in the hot data group or groups in the cold data group of the storage server 30 of the FastDFS system. The invention therefore sets up virtual groups (VGroup) and uses them to allocate a group in the hot data group to the cold file, so as to better realize data migration.
In an embodiment, as shown in fig. 5, step S1101 includes sub-steps S1101a and S1101b.
S1101a, judging whether the size of the cold file exceeds a preset second threshold.
Whether the size of the cold file exceeds a preset second threshold is judged. Specifically, the second threshold is compared against the size of the cold file that the FastDFS system receives from the user, and it may be set according to the user's habits or by the FastDFS system. In this embodiment, if the size of the cold file is greater than 10 megabytes, it is judged to exceed the preset second threshold; if the size of the cold file is less than 10 megabytes, it is judged not to exceed the second threshold.
S1101b, if the size of the cold file exceeds the second threshold, creating a new virtual group and storing the cold file in the hot data group according to the new virtual group.
If the size of the cold file exceeds the second threshold, a new virtual group is created and the cold file is stored in the hot data group according to the new virtual group. Specifically, when the size of the cold file exceeds the second threshold set by the storage server 30 of the FastDFS system, a virtual group is added to the virtual group table, a new group is added to that virtual group and assigned to the hot data group, and the cold file is then stored in the new group. For example, when storage6 corresponding to the cold file is added to the storage server 30 of the FastDFS system, because the size of the cold file exceeds the preset second threshold, the virtual group table is extended: a virtual group VGroup2 is added to the table and the group name group6 corresponding to storage6 is added to VGroup2. The result is shown in Table 2:
TABLE 2
Virtual group ID    Virtual group name    Members
0                   VGroup0               group1, group2, group3
1                   VGroup1               group4, group5
2                   VGroup2               group6
If the size of the cold file does not exceed the second threshold, a group is created in the hot data group according to the existing virtual group and the cold file is stored in that group. Specifically, when the size of the cold file does not exceed the second threshold set by the storage server 30 of the FastDFS system, a new group is added to a sub-virtual group of the virtual group, the new group is assigned to the hot data group, and the cold file is stored in it; the sub-virtual group is the one used to manage the groups in the hot data group. For example, when storage5 corresponding to the cold file is added to the storage server 30 of the FastDFS system, because the size of the cold file does not exceed the preset second threshold, VGroup1 is extended and the group name group5 corresponding to storage5 is added to VGroup1. The result is shown in Table 3:
TABLE 3
Virtual group ID    Virtual group name    Members
0                   VGroup0               group1, group2, group3
1                   VGroup1               group4, group5
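The virtual group bookkeeping in Tables 1 to 3 can be sketched as a small mapping that enforces the rule that each group belongs to exactly one VGroup. The Python below is a hedged illustration only: the class name, method names, and the way the new VGroup name is derived are assumptions, and the 10-megabyte value is the second threshold from this embodiment.

```python
from typing import Dict, List

SECOND_THRESHOLD = 10 * 1024 * 1024  # 10 megabytes, from the embodiment


class VirtualGroupTable:
    """Logical VGroup -> member groups; a group may belong to only one VGroup."""

    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}
        self.owner: Dict[str, str] = {}

    def add_group(self, vgroup: str, group: str) -> None:
        if group in self.owner:
            raise ValueError(f"{group} already belongs to {self.owner[group]}")
        self.members.setdefault(vgroup, []).append(group)
        self.owner[group] = vgroup

    def place(self, group: str, file_size: int, current_vgroup: str) -> str:
        """Put a new group in a new VGroup for large files, else extend the current one."""
        if file_size > SECOND_THRESHOLD:
            target = f"VGroup{len(self.members)}"  # hypothetical naming scheme
        else:
            target = current_vgroup
        self.add_group(target, group)
        return target


table = VirtualGroupTable()
for g in ("group1", "group2", "group3"):
    table.add_group("VGroup0", g)
table.add_group("VGroup1", "group4")                                  # Table 1
small = table.place("group5", 4 * 1024 * 1024, "VGroup1")             # Table 3
large = table.place("group6", 64 * 1024 * 1024, "VGroup1")            # Table 2
```

Keeping a reverse `owner` map makes the one-VGroup-per-group rule cheap to check when groups are added during capacity expansion.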
In one embodiment, as shown in FIG. 4, step S1101 is preceded by step S1102.
S1102, grouping the storage servers according to preset grouping rules to obtain hot data groups and cold data groups.
The storage servers 30 are grouped according to preset grouping rules to obtain hot data groups and cold data groups. Specifically, a grouping rule is rule information for grouping the storage space of the storage server 30, so that the storage space is divided into a hot data group and a cold data group. The hot data group stores hot data uploaded by users; it contains a plurality of groups, each of which holds N identical copies of the hot data and can tolerate the failure of N-1 of them. The cold data group stores cold data uploaded by users; it also contains a plurality of groups, but each group holds only one copy of the cold data. The storage space of the storage server 30 may be grouped according to the sizes of the cold data and the hot data in the storage server 30, or according to the specific use of the FastDFS system. In this embodiment, the storage space of the storage server 30 is divided into the hot data group and the cold data group according to the specific use of the FastDFS system, which improves utilization of the storage space of the storage server 30 and reduces storage cost.
S120, partitioning the cold file according to a preset partitioning rule to obtain file blocks corresponding to the cold file.
The cold file is partitioned according to a preset partitioning rule to obtain the file blocks corresponding to the cold file. The preset partitioning rule is rule information for partitioning the file being processed into N file blocks. In this embodiment, the cold file is divided into N file blocks of the same length using the MD5 algorithm. MD5 takes input of arbitrary length and outputs a fixed-length 128-bit value: the procedure produces four 32-bit words, which are combined to form the 128-bit hash.
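One way to read this step is that the file is cut into N equal-length blocks and an MD5 digest is kept per block (the MD5 value of each block is among the metadata recorded in step S140). The Python sketch below follows that reading; zero-padding the tail and the function names are assumptions, not details given in the patent.

```python
import hashlib
from typing import List


def split_into_blocks(data: bytes, n: int) -> List[bytes]:
    """Split data into n blocks of equal length, zero-padding the tail (assumption)."""
    if n <= 0:
        raise ValueError("n must be positive")
    block_len = max(1, -(-len(data) // n))       # ceiling division
    padded = data.ljust(block_len * n, b"\0")
    return [padded[i * block_len:(i + 1) * block_len] for i in range(n)]


def block_digests(blocks: List[bytes]) -> List[str]:
    """Return the 128-bit MD5 digest of each block as a 32-character hex string."""
    return [hashlib.md5(block).hexdigest() for block in blocks]


data = b"abcdefghij"
blocks = split_into_blocks(data, 4)
digests = block_digests(blocks)
```

Because padding changes the length, the original file length has to be kept alongside the blocks so the padding can be stripped after reassembly.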
S130, encoding the file blocks according to an erasure code technique to obtain check blocks corresponding to the file blocks.
The file blocks are encoded according to an erasure code technique to obtain the check blocks corresponding to the file blocks. Specifically, an erasure code technique encodes the original data with an erasure code algorithm to obtain redundancy, and stores the data together with the redundancy to achieve fault tolerance. In this embodiment, the FastDFS system encodes the N file blocks corresponding to the file once with the erasure code technique. Assume the n data blocks are $d_1, d_2, \ldots, d_n$, and define $F_i$ as a linear combination of all the data blocks; the check blocks $c_i$ ($i = 1, 2, \ldots, m$) are then computed from $F_i$:
$$c_i = F_i(d_1, d_2, \ldots, d_n)$$
$$F_i(d_1, d_2, \ldots, d_n) = \sum_{j=1}^{n} f_{i,j}\, d_j$$
That is, with the vectors $D$ and $C$ representing the sets of all data blocks and all check blocks, respectively, and $F_i$ being row $i$ of a matrix $F$, the whole encoding process is expressed as $FD = C$. $F$ is called the encoding matrix; it defines how the data is encoded into redundant data. Each element of the encoding matrix is the multiplication coefficient applied to the corresponding original data block, the number of columns equals the number $n$ of original data blocks, and the number of rows equals the number $m$ of check blocks produced by the encoding. An $m \times n$ Vandermonde matrix is used as the encoding matrix $F$, with elements $f_{i,j} = j^{\,i-1}$ in the usual construction, so the check blocks $C$ are generated as follows:
$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & n \\ \vdots & \vdots & & \vdots \\ 1 & 2^{m-1} & \cdots & n^{m-1} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}$$
When errors occur among the n data blocks, the data blocks can be repaired as follows. First define a matrix $A$ and a vector $E$ such that
$$A = \begin{bmatrix} I \\ F \end{bmatrix}$$
($I$ is the $n \times n$ identity matrix), and
$$E = \begin{bmatrix} D \\ C \end{bmatrix}$$
Then $AD = E$, that is:
$$\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & n \\ \vdots & \vdots & & \vdots \\ 1 & 2^{m-1} & \cdots & n^{m-1} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} d_1 \\ \vdots \\ d_n \\ c_1 \\ \vdots \\ c_m \end{bmatrix}$$
The rows corresponding to the erroneous blocks are then deleted from the matrix $A$ and the vector $E$, and $n$ of the remaining rows are kept so that the system is square, which gives a new equation $A'D = E'$. Since $A'$ is invertible, all unknowns in $D$ can be solved by Gaussian elimination, i.e., all data blocks can be repaired.
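The encode and repair procedure can be traced end to end on one column of symbols. The Python sketch below builds the Vandermonde encoding matrix, computes C = FD, and recovers D from n surviving rows of AD = E by Gauss-Jordan elimination. It is a minimal illustration under stated assumptions: it uses exact rational arithmetic on small integers for readability (production erasure codes typically use finite-field arithmetic instead), takes the matrix entries as j^(i-1), and all function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def vandermonde(m: int, n: int) -> List[List[Fraction]]:
    """m x n encoding matrix F with F[i][j] = (j + 1) ** i (0-based indices)."""
    return [[Fraction((j + 1) ** i) for j in range(n)] for i in range(m)]


def encode(data: List[int], m: int) -> List[Fraction]:
    """Return the m check values C = F * D for one column of data symbols."""
    F = vandermonde(m, len(data))
    return [sum(f * d for f, d in zip(row, data)) for row in F]


def solve(A: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def recover(survivors: Dict[int, Fraction], n: int, m: int) -> List[int]:
    """Rebuild D from n surviving entries of E = [D; C], keyed by row index in A."""
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    A = identity + vandermonde(m, n)
    rows = sorted(survivors)[:n]                 # keep n surviving rows -> A', E'
    A_prime = [A[idx] for idx in rows]
    E_prime = [Fraction(survivors[idx]) for idx in rows]
    return [int(x) for x in solve(A_prime, E_prime)]


data = [3, 1, 4, 1]                              # n = 4 data symbols
checks = encode(data, 2)                         # m = 2 check symbols
# Blocks d_2 and d_3 (rows 1 and 2) are lost; d_1, d_4, c_1, c_2 survive.
survivors = {0: Fraction(3), 3: Fraction(1), 4: checks[0], 5: checks[1]}
rebuilt = recover(survivors, n=4, m=2)
```

In this example the check values are 9 and 21, and solving the reduced system restores the two lost symbols. A real block would apply the same computation to every symbol position of the n file blocks.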
S140, storing the file blocks and the check blocks in the cold data group of the storage server.
The file blocks and the check blocks are stored in a cold data group of the storage server. Specifically, after the cold file in the storage server 30 of the FastDFS system is deleted from the hot data group, a new group is created in the cold data group to store the file blocks and the check blocks. While the blocks are being stored in the FastDFS system, metadata is recorded for them, including the returned actual storage address, the MD5 value of each block, the block sequence number, the IDs of the files contained in the block, and the offset and length of each file within the block. With this metadata, when some of the file blocks fail, the corresponding files can be repaired using only the check blocks and the file blocks that have not failed.
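The metadata listed above maps naturally onto a small record type. The following Python sketch is one possible shape; the class names, field names, and the sample addresses are hypothetical and are not FastDFS data structures.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FileExtent:
    """Where one file's bytes sit inside a block."""
    file_id: str
    offset: int
    length: int


@dataclass
class BlockRecord:
    """Metadata recorded when a file block or check block is stored."""
    storage_address: str          # actual storage address returned by the system
    md5: str                      # MD5 value of the block
    sequence: int                 # block sequence number
    extents: List[FileExtent] = field(default_factory=list)


def blocks_holding(records: List[BlockRecord], file_id: str) -> List[int]:
    """Sequence numbers of the blocks that contain part of the given file."""
    return sorted(r.sequence for r in records
                  if any(e.file_id == file_id for e in r.extents))


# Hypothetical records; the addresses are placeholders.
records = [
    BlockRecord("cold-group-a/block0", hashlib.md5(b"block-0").hexdigest(), 0,
                [FileExtent("file-A", 0, 4096)]),
    BlockRecord("cold-group-a/block1", hashlib.md5(b"block-1").hexdigest(), 1,
                [FileExtent("file-A", 0, 1024), FileExtent("file-B", 1024, 3072)]),
]
file_a_blocks = blocks_holding(records, "file-A")
```

Recording per-file offsets and lengths is what allows a single file to be cut back out of a repaired block without reading the whole group.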
S150, if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request.
If a download request for the cold file is received, the cold file is acquired from the cold data group according to the download request. Specifically, the download request includes the ID of the cold file to be downloaded; after the FastDFS system obtains the ID of the cold file, it acquires the cold file from the storage server 30 of the FastDFS system according to the virtual group.
In one embodiment, as shown in FIG. 6, step S150 includes sub-steps S151 and S152.
S151, acquiring the virtual group corresponding to the download request according to the download request, so as to acquire the cold file.
The virtual group corresponding to the download request is obtained according to the download request, so as to acquire the cold file. Specifically, an AND operation is performed on the hashCode from the MD5 algorithm and the total number of virtual groups to obtain the virtual group corresponding to the cold file; a round-robin scheme is then used within that virtual group to obtain the group corresponding to the cold file, and thereby the requested file.
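The lookup in S151 can be sketched as a hash-and-mask step followed by a rotating scan of the member groups. The Python below is only one plausible reading and rests on several assumptions that the patent does not state: the hash code is taken from the first four bytes of the MD5 digest of the file ID, the number of virtual groups is a power of two so that the AND is done with (count - 1), and round robin means the starting group rotates on each request. All names are hypothetical.

```python
import hashlib
from typing import List


class VirtualGroupRouter:
    """Maps a file ID to a virtual group, then lists its groups in round-robin order."""

    def __init__(self, vgroups: List[List[str]]) -> None:
        count = len(vgroups)
        if count == 0 or count & (count - 1):
            raise ValueError("this sketch assumes a power-of-two number of virtual groups")
        self.vgroups = vgroups
        self.cursor = [0] * count

    def vgroup_index(self, file_id: str) -> int:
        digest = hashlib.md5(file_id.encode("utf-8")).digest()
        hash_code = int.from_bytes(digest[:4], "big")   # assumed hash code
        return hash_code & (len(self.vgroups) - 1)      # bitwise AND with (count - 1)

    def candidates(self, file_id: str) -> List[str]:
        """Groups of the selected virtual group, starting at a rotating position."""
        idx = self.vgroup_index(file_id)
        members = self.vgroups[idx]
        start = self.cursor[idx] % len(members)
        self.cursor[idx] += 1
        return members[start:] + members[:start]


router = VirtualGroupRouter([["group1", "group2", "group3"], ["group4"]])
first = router.candidates("file-001")
second = router.candidates("file-001")
```

The caller would query the candidate groups in order until one of them returns the requested file.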
S152, if some of the file blocks corresponding to the cold file have failed, acquiring the cold file according to the check blocks corresponding to the cold file and the file blocks that have not failed.
If some of the file blocks corresponding to the cold file have failed, the cold file is acquired according to the check blocks corresponding to the cold file and the file blocks that have not failed. Specifically, if some of the file blocks corresponding to the cold file have failed and the file cannot be downloaded directly, the download request is used to fetch the file blocks that have not failed together with the check blocks, and the cold file is reconstructed from them. For example, when the request corresponds to n file blocks and those n data blocks have m check blocks, if one of the file blocks fails, the cold file can be obtained from the n-1 data blocks that have not failed and the m check blocks corresponding to the n data blocks.
The technical method can be applied to application scenarios involving data acquisition, such as smart government affairs, smart city management, smart communities, smart security, smart logistics, smart healthcare, smart education, smart environmental protection and smart transportation, thereby promoting the construction of smart cities.
In the erasure code-based file processing method provided by the embodiment of the invention, a cold file in a hot data group of a storage server is acquired; the cold file is partitioned according to a preset partitioning rule to obtain the file blocks corresponding to the cold file; the file blocks are encoded according to an erasure code technique to obtain the check blocks corresponding to the file blocks; the file blocks and the check blocks are stored in a cold data group of the storage server; and if a download request for the cold file is received, the cold file is acquired from the cold data group according to the download request. In this way, the files stored in the FastDFS system are optimized, which improves disk storage utilization, reduces storage cost and improves operation and maintenance efficiency.
In addition, in the FastDFS system, data is managed in units of virtual groups, so that the data on each node can be balanced. When the FastDFS system is expanded, a multi-threaded program migrates data in units of virtual groups, which speeds up data migration and avoids overloading any single node, thereby improving operation and maintenance efficiency and reducing operation and maintenance difficulty.
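Migration in units of virtual groups might be organized as in the following sketch, in which each virtual group is handed to a worker thread as one task. The in-memory dictionaries stand in for storage nodes, and `migrate_virtual_group`, `migrate_all` and the `max_workers` default are hypothetical names and values, not part of FastDFS.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_virtual_group(name, files, target):
    """Copy every file of one virtual group into the target store."""
    for file_id, payload in files.items():
        # Keys are unique per (virtual group, file), so workers never collide.
        target[(name, file_id)] = payload
    return name, len(files)


def migrate_all(virtual_groups, target, max_workers=4):
    """Migrate each virtual group as an independent task on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(migrate_virtual_group, name, files, target)
            for name, files in virtual_groups.items()
        ]
        # Map each virtual group name to the number of files migrated.
        return dict(future.result() for future in futures)
```

Because the unit of work is a whole virtual group, the amount of parallelism is bounded by the pool size rather than by the number of files, which is one way to keep the load on any single node in check.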
The embodiment of the present invention further provides an erasure code-based file processing apparatus 100, which is configured to execute any embodiment of the erasure code-based file processing method described above. Specifically, referring to fig. 7, fig. 7 is a schematic block diagram of the erasure code-based file processing apparatus 100 according to an embodiment of the present invention.
As shown in fig. 7, the erasure code-based file processing apparatus 100 includes an obtaining unit 110, a blocking unit 120, an encoding unit 130, a first storage unit 140, and a first obtaining unit 150.
An obtaining unit 110 is configured to obtain a cold file in the hot data group of the storage server.
In another embodiment of the invention, as shown in fig. 8, the obtaining unit 110 includes a first judging unit 111 and a cold file obtaining unit 112. The first judging unit 111 is configured to judge whether the access frequency of the cold file is less than a preset first threshold; the cold file obtaining unit 112 is configured to obtain the cold file from the hot data group and delete the cold file stored in the hot data group if the access frequency of the cold file is less than the first threshold.
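The behavior of units 111 and 112 can be sketched as a single pass over the hot data group. The dictionaries and the name `extract_cold_files` are hypothetical, and treating a file with no recorded accesses as having frequency 0 is an assumption of this sketch.

```python
def extract_cold_files(hot_group, access_frequency, first_threshold):
    """Move files whose access frequency is below the threshold out of the hot group."""
    cold_files = {}
    # Iterate over a snapshot of the keys because the dict is modified in the loop.
    for file_id in list(hot_group):
        # Assumption: files without a recorded frequency count as never accessed.
        if access_frequency.get(file_id, 0) < first_threshold:
            # pop() both obtains the file and deletes it from the hot group.
            cold_files[file_id] = hot_group.pop(file_id)
    return cold_files
```

The returned mapping is what the later blocking and encoding steps would consume; the hot group keeps only files at or above the threshold.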
In another embodiment of the present invention, the erasure code-based file processing apparatus further includes a second storage unit 1101.
The second storage unit 1101 is configured to store the cold file in the hot data group based on a preset virtual group if the cold file uploaded by a user is received.
In another embodiment of the invention, as shown in fig. 9, the second storage unit 1101 includes a second judging unit 1101a and a third storage unit 1101b. The second judging unit 1101a is configured to judge whether the memory occupied by the cold file exceeds a preset second threshold; the third storage unit 1101b is configured to create a new virtual group and store the cold file into the hot data group according to the new virtual group if the memory occupied by the cold file exceeds the second threshold.
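A minimal sketch of the upload path handled by units 1101, 1101a and 1101b follows. It assumes that "memory" means the size of the uploaded payload in bytes and that a file not exceeding the second threshold is placed under the preset virtual group; the class name `HotDataGroup` and the virtual group naming scheme are hypothetical.

```python
class HotDataGroup:
    """In-memory stand-in for a hot data group organized by virtual groups."""

    def __init__(self, second_threshold, preset_virtual_group="vg-preset"):
        self.second_threshold = second_threshold
        self.preset = preset_virtual_group
        self.virtual_groups = {preset_virtual_group: {}}
        self._created = 0

    def store(self, file_id, payload):
        """Store an uploaded file and return the virtual group it was placed under."""
        # Assumption: the file's memory footprint is its payload length in bytes.
        if len(payload) > self.second_threshold:
            # Over the threshold: create a new virtual group for this file.
            self._created += 1
            name = f"vg-new-{self._created}"
            self.virtual_groups[name] = {}
        else:
            name = self.preset
        self.virtual_groups[name][file_id] = payload
        return name
```

With `second_threshold=4`, a 3-byte upload lands in the preset virtual group, while an 8-byte upload triggers the creation of `vg-new-1`.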
In another embodiment of the present invention, the erasure code-based file processing apparatus further includes a grouping unit 1102.
A grouping unit 1102, configured to group the storage servers according to a preset grouping rule to obtain a hot data group and a cold data group.
A blocking unit 120, configured to block the cold file according to a preset blocking rule to obtain a file block corresponding to the cold file.
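The preset blocking rule is not spelled out in this passage. As one plausible instance, the sketch below assumes fixed-size blocks, zero-pads the last block so that all blocks are equally long (which block-wise erasure coding generally requires), and records the original length so the padding can be removed on reassembly. The function names are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split data into equally sized blocks, zero-padding the final block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        # Pad so every block has the same length.
        blocks[-1] = blocks[-1].ljust(block_size, b"\x00")
    # The original length is returned so padding can be stripped later.
    return blocks, len(data)


def join_blocks(blocks, original_length: int) -> bytes:
    """Reassemble the file from its blocks and drop the padding."""
    return b"".join(blocks)[:original_length]
```

Splitting `b"erasure-code"` with a block size of 5 gives three blocks, the last one padded with three zero bytes, and `join_blocks` restores the original 12 bytes.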
An encoding unit 130, configured to encode the file block according to an erasure code technique to obtain a check block corresponding to the file block.
The first storage unit 140 is configured to store the file block and the check block in the cold data group of the storage server.
In another embodiment of the present invention, the erasure code-based file processing apparatus further includes: a generation unit 1401.
A first obtaining unit 150, configured to, if a download request for downloading the cold file is received, obtain the cold file from the cold data group according to the download request.
In another embodiment of the present invention, as shown in fig. 10, the first obtaining unit 150 includes: a second acquisition unit 151 and a third acquisition unit 152. A second obtaining unit 151, configured to obtain, according to the download request, a virtual group corresponding to the download request to obtain the cold file; a third obtaining unit 152, configured to, if a part of file blocks corresponding to the cold file fails, obtain the cold file according to the check blocks corresponding to the cold file and the file blocks that do not fail.
The erasure code-based file processing apparatus 100 provided in the embodiment of the present invention is configured to execute the erasure code-based file processing method described above, namely: acquiring a cold file in a hot data group of a storage server; partitioning the cold file according to a preset partitioning rule to obtain a file block corresponding to the cold file; encoding the file block according to an erasure code technique to obtain a check block corresponding to the file block; storing the file block and the check block into a cold data group of the storage server; and if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request.
Referring to fig. 11, fig. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Referring to fig. 11, the device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform an erasure code-based file processing method.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute the erasure code based file processing method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 11 is a block diagram of only the portion of the configuration relevant to aspects of the present invention and does not limit the device 500 to which aspects of the present invention may be applied; a particular device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: acquiring a cold file in a hot data group of a storage server; partitioning the cold file according to a preset partitioning rule to obtain a file block corresponding to the cold file; encoding the file block according to an erasure code technique to obtain a check block corresponding to the file block; storing the file block and the check block into a cold data group of the storage server; and if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request.
Those skilled in the art will appreciate that the embodiment of the apparatus 500 illustrated in fig. 11 does not constitute a limitation on the specific construction of the apparatus 500, and in other embodiments, the apparatus 500 may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the apparatus 500 may only include the memory and the processor 502, and in such embodiments, the structure and function of the memory and the processor 502 are the same as those of the embodiment shown in fig. 11, and are not repeated herein.
It should be understood that in the present embodiment, the processor 502 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor or the like.
In another embodiment of the present invention, a computer storage medium is provided. The storage medium may be a non-volatile computer-readable storage medium. The storage medium stores a computer program 5032, wherein the computer program 5032, when executed by the processor 502, performs the steps of: acquiring a cold file in a hot data group of a storage server; partitioning the cold file according to a preset partitioning rule to obtain a file block corresponding to the cold file; encoding the file block according to an erasure code technique to obtain a check block corresponding to the file block; storing the file block and the check block into a cold data group of the storage server; and if a download request for the cold file is received, acquiring the cold file from the cold data group according to the download request.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation; units having the same function may be grouped into one unit, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a device 500 (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A file processing method based on erasure codes is applied to a FastDFS system and is characterized by comprising the following steps:
acquiring a cold file in a hot data group of a storage server;
partitioning the cold file according to a preset partitioning rule to obtain a file block corresponding to the cold file;
coding the file block according to an erasure code technology to obtain a check block corresponding to the file block;
storing the file block and the check block into a cold data group of the storage server;
if a download request for downloading the cold file is received, acquiring the cold file from the cold data group according to the download request;
before the acquiring of the cold file in the hot data group of the storage server, the method further includes:
if the cold file uploaded by a user is received, storing the cold file in the hot data group based on a preset virtual group; judging whether the memory occupied by the cold file exceeds a preset second threshold; if the memory occupied by the cold file exceeds the second threshold, creating a new virtual group and storing the cold file in the hot data group according to the new virtual group;
the storage server expands the capacity through two modes of horizontal capacity expansion and vertical capacity expansion, wherein the horizontal capacity expansion is realized by increasing the system capacity through adding a virtual group, and the vertical capacity expansion is realized by adding a new storage space in the group;
the virtual group manages members in the hot data group and members in the cold data group; the virtual group includes a plurality of sub-virtual groups.
2. The erasure code based file processing method of claim 1, wherein the acquiring of the cold file in the hot data group of the storage server comprises:
judging whether the access frequency of the cold file is smaller than a preset first threshold value or not;
and if the access frequency of the cold file is less than the first threshold value, acquiring the cold file from the hot data group and deleting the cold file stored in the hot data group.
3. The erasure code based file processing method of claim 1, wherein before the storing of the cold file in the hot data group based on a preset virtual group if the cold file uploaded by a user is received, the method further comprises:
and grouping the storage servers according to a preset grouping rule to obtain a hot data group and a cold data group.
4. The erasure code based file processing method of claim 1, wherein the acquiring of the cold file from the cold data group according to the download request comprises:
acquiring a virtual group corresponding to the downloading request according to the downloading request to acquire the cold file;
and if part of the file blocks corresponding to the cold file have faults, acquiring the cold file according to the check blocks corresponding to the cold file and the file blocks which do not have faults.
5. An erasure code-based file processing apparatus, comprising:
the acquisition unit is used for acquiring cold files in the hot data group of the storage server;
the blocking unit is used for blocking the cold file according to a preset blocking rule to obtain a file block corresponding to the cold file;
the encoding unit is used for encoding the file block according to an erasure code technology to obtain a check block corresponding to the file block;
the first storage unit is used for storing the file block and the check block into a cold data group of the storage server;
a first obtaining unit, configured to obtain the cold file from the cold data group according to a download request if the download request for downloading the cold file is received;
the second storage unit is used for storing the cold file in the hot data group based on a preset virtual group if the cold file uploaded by a user is received; the virtual group manages members in the hot data group and members in the cold data group; the virtual group comprises a plurality of sub-virtual groups;
judging whether the memory occupied by the cold file exceeds a preset second threshold; if the memory occupied by the cold file exceeds the second threshold, creating a new virtual group and storing the cold file in the hot data group according to the new virtual group;
the storage server expands the capacity through two modes of horizontal capacity expansion and vertical capacity expansion, wherein the horizontal capacity expansion is realized by increasing the system capacity through adding a virtual group, and the vertical capacity expansion is realized by adding a new storage space in the group;
the virtual group manages members in the hot data group and members in the cold data group; the virtual group includes a plurality of sub-virtual groups.
6. The erasure code-based file processing apparatus according to claim 5, wherein the obtaining unit includes:
the first judging unit is used for judging whether the access frequency of the cold file is smaller than a preset first threshold value or not;
and the cold file acquisition unit is used for acquiring the cold file from the hot data group and deleting the cold file stored in the hot data group if the access frequency of the cold file is less than the first threshold.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the erasure code based file processing method according to any one of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the erasure code-based file processing method according to any one of claims 1 to 4.
CN202010911451.6A 2020-09-02 2020-09-02 Erasure code based file processing method and device and computer equipment Active CN111930555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010911451.6A CN111930555B (en) 2020-09-02 2020-09-02 Erasure code based file processing method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010911451.6A CN111930555B (en) 2020-09-02 2020-09-02 Erasure code based file processing method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111930555A CN111930555A (en) 2020-11-13
CN111930555B true CN111930555B (en) 2022-12-02

Family

ID=73309030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010911451.6A Active CN111930555B (en) 2020-09-02 2020-09-02 Erasure code based file processing method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111930555B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536310A (en) * 2021-07-08 2021-10-22 浙江网商银行股份有限公司 Code file processing method, code file checking device and electronic equipment
CN113704200A (en) * 2021-11-01 2021-11-26 北京国科环宇科技股份有限公司 Data storage method, device, equipment and storage medium
CN116107797A (en) * 2021-11-09 2023-05-12 上海哔哩哔哩科技有限公司 Data storage method and device, electronic equipment and storage medium
CN114116321A (en) * 2022-01-25 2022-03-01 苏州浪潮智能科技有限公司 Redundant data management method and device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064902A (en) * 2012-12-18 2013-04-24 厦门市美亚柏科信息股份有限公司 Method and device for storing and reading data in hadoop distributed file system (HDFS)
CN106649406B (en) * 2015-11-04 2020-04-28 华为技术有限公司 Method and device for self-adaptively storing files
CN105701156B (en) * 2015-12-29 2019-06-14 青岛海信网络科技股份有限公司 A kind of distributed file system management method and device
CN109597567B (en) * 2017-09-30 2022-03-08 网宿科技股份有限公司 Data processing method and device
CN109857737B (en) * 2019-01-03 2024-04-16 平安科技(深圳)有限公司 Cold and hot data storage method and device and electronic equipment
CN111008181A (en) * 2019-10-31 2020-04-14 苏州浪潮智能科技有限公司 Method, system, terminal and storage medium for switching storage strategies of distributed file system

Also Published As

Publication number Publication date
CN111930555A (en) 2020-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant