CN110659250A - File processing method and system

Info

Publication number: CN110659250A
Application number: CN201810604126.8A
Authority: CN
Inventors: 王海霞; 李先绪; 吴家隐; 黄植勤; 邱红飞; 郑文武; 陈泳; 朱海云; 黄春光
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2018-06-13
Filing date: 2018-06-13
Publication date: 2020-01-07
Anticipated expiration: 2038-06-13
Also published as: CN110659250B

Abstract

The disclosure provides a file processing method and a file processing system, and relates to the IT field. The method comprises the following steps: receiving a new file request sent by a client; selecting a corresponding data block group and a data block which can be used for storage in the data block group according to the size of the newly added file; judging whether non-empty data blocks exist in the data blocks available for storage; if yes, storing the newly added file in an idle storage unit of the non-empty data block; if not, combining the plurality of newly added files into a large file according to the size of the newly added file, and storing the large file in an empty data block. The data layout can be optimized, and the space utilization rate of the disk is improved.

Description

File processing method and system

Technical Field

The present disclosure relates to the IT field, and in particular, to a file processing method and system.

Background

Data on the internet is growing explosively, and applications such as social networks, mobile communication, online video, and e-commerce can generate billions of small files or even more. Because of the enormous challenges in metadata management, access performance, storage efficiency, and so on, handling massive numbers of small files has become an industry-recognized problem.

Disclosure of Invention

The technical problem to be solved by the present disclosure is to provide a file processing method and system, which can optimize data layout and improve the utilization rate of disk space.

According to an aspect of the present disclosure, a file processing method is provided, including: receiving a new file request sent by a client; selecting a corresponding data block group and a data block which can be used for storage in the data block group according to the size of the newly added file; judging whether non-empty data blocks exist in the data blocks available for storage; if yes, storing the newly added file in an idle storage unit of the non-empty data block; if not, combining the plurality of newly added files into a large file according to the size of the newly added file, and storing the large file in an empty data block.

Optionally, the method further comprises: receiving a file deletion request sent by a client; searching metadata of a file to be deleted; determining a data block in a data block group where a file to be deleted is located according to the metadata; finding a storage unit where the file to be deleted is located in the data block according to the index file of the file to be deleted; and deleting the file to be deleted.

Optionally, the method further comprises: and modifying the state of the storage unit where the file to be deleted is located into an idle state.

Optionally, the method further comprises: receiving a file modification request sent by a client; judging whether the size of the modified file differs from the size of the file to be modified; if not, replacing the file to be modified with the modified file; and if so, storing the modified file as a newly added file, and deleting the file to be modified as a file to be deleted.

Optionally, replacing the file to be modified with the modified file includes: searching metadata of a file to be modified; determining a data block in a data block group where a file to be modified is located according to the metadata; finding the file to be modified in the data block according to the index file of the file to be modified; and replacing the file to be modified with the modified file.

According to another aspect of the present disclosure, there is also provided a file processing system, including: the request receiving unit is used for receiving a new file request sent by a client; the data block selecting unit is used for selecting the corresponding data block group and the data blocks which can be stored in the data block group according to the size of the newly added file; a non-empty data block judgment unit, configured to judge whether there is a non-empty data block in the data blocks available for storage; and the data storage unit is used for storing the newly added files in the idle storage unit of the non-empty data blocks if the data blocks which can be used for storage have the non-empty data blocks, otherwise, combining a plurality of newly added files into a large file according to the size of the newly added files, and storing the large file in the empty data blocks.

Optionally, the request receiving unit is further configured to receive a file deletion request sent by the client, and the system further includes: the metadata searching unit is used for searching metadata of the file to be deleted; the data block query unit is used for determining the data blocks in the data block group where the files to be deleted are located according to the metadata; the storage unit query unit is used for finding out the storage unit where the file to be deleted is located in the data block according to the index file of the file to be deleted; and the data deleting unit is used for deleting the file to be deleted.

Optionally, the system further comprises: and the state modifying unit is used for modifying the state of the storage unit where the file to be deleted is located into an idle state.

Optionally, the request receiving unit is further configured to receive a file modification request sent by the client, and the system further includes: the file size judging unit is used for judging whether the size of the modified file differs from the size of the file to be modified; and the data modification unit is used for replacing the file to be modified with the modified file if the size is unchanged, otherwise, taking the modified file as a newly added file, sending a storage request to the request receiving unit, and, in response to the data storage unit having stored the modified file, sending a request for deleting the file to be modified to the request receiving unit.

Optionally, the data modification unit is configured to search metadata of the file to be modified; determining a data block in a data block group where a file to be modified is located according to the metadata; finding the file to be modified in the data block according to the index file of the file to be modified; and replacing the file to be modified with the modified file.

According to another aspect of the present disclosure, there is also provided a file processing system, including: a memory; and a processor coupled to the memory, the processor configured to perform the file processing method as described above based on the instructions stored in the memory.

According to another aspect of the present disclosure, a computer-readable storage medium is also proposed, on which computer program instructions are stored, which instructions, when executed by a processor, implement the steps of the above-mentioned file processing method.

Compared with the prior art, the present disclosure selects the data block group and the data blocks available for storage in that group according to the size of the newly added file, and preferentially stores the newly added file in an idle storage unit of a non-empty data block. When the available data blocks are empty, the newly added files are merged according to size and the merged large file is stored in the same data block. In this way the data layout can be optimized and the utilization rate of the disk space is improved.

Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:

Fig. 1 is a schematic flow diagram of one embodiment of the file processing method of the present disclosure.

Fig. 2 is a schematic diagram of one embodiment of adding a file according to the present disclosure.

Fig. 3 is a schematic flow diagram of another embodiment of the file processing method of the present disclosure.

Fig. 4 is a schematic diagram of one embodiment of deleting a file according to the present disclosure.

Fig. 5 is a schematic flow diagram of another embodiment of the file processing method of the present disclosure.

Fig. 6 is a schematic flow diagram of a further embodiment of the file processing method of the present disclosure.

Fig. 7 is a schematic diagram of one embodiment of modifying a file according to the present disclosure.

Fig. 8 is a schematic block diagram of one embodiment of a file processing system according to the present disclosure.

Fig. 9 is a schematic block diagram of another embodiment of a file processing system according to the present disclosure.

Fig. 10 is a schematic block diagram of yet another embodiment of a file processing system according to the present disclosure.

Fig. 11 is a schematic block diagram of yet another embodiment of a file processing system according to the present disclosure.

Fig. 12 is a schematic block diagram of yet another embodiment of a file processing system according to the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

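Fig. 1 is a schematic flow diagram of one embodiment of the file processing method of the present disclosure. This embodiment mainly describes the flow for adding a file.

In step 110, a new file request sent by the client is received. The newly added file may be a small file, i.e., a file no larger than 1 MB.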

In step 120, the corresponding data block group and the data blocks available for storage in the data block group are selected according to the size of the new file. Specifically, a data block group group_x capable of storing the file is first found according to the file size, and then a writable data block block_id is found within group_x. As shown in Fig. 2, the part above the dotted line is a schematic diagram before the new file is added; it can be seen that block_1, block_2, and block_k in group_x are writable.
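As a non-limiting illustration, step 120 can be sketched as follows. The source does not state how group boundaries are chosen, so the size classes in GROUP_LIMITS, the function names, and the representation of a block as a list of occupied flags are all assumptions made only for this example.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in bytes) of each data block group; not from the source.
GROUP_LIMITS = [4 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024]


def select_group(file_size: int) -> int:
    """Return the index of the smallest group whose size limit can hold the file."""
    if file_size <= 0 or file_size > GROUP_LIMITS[-1]:
        raise ValueError("not a small file (expected 1 byte to 1 MB)")
    return bisect_left(GROUP_LIMITS, file_size)


def writable_blocks(blocks: dict) -> list:
    """Return ids of blocks with at least one free unit (True = occupied, False = free)."""
    return [block_id for block_id, units in blocks.items() if not all(units)]
```

With this layout, a block whose units are all occupied is simply skipped, which mirrors the way Fig. 2 lists only some blocks of group_x as writable.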

In step 130, it is determined whether there is a non-empty data block in the data blocks available for storage, if so, step 140 is executed, otherwise, step 150 is executed.

In step 140, the newly added file is stored in a free storage unit of the non-empty data block. For example, as shown on the right side below the dotted line in Fig. 2, if block_1 contains both occupied storage units and free storage units, the new file newfile is written into a free storage unit of block_1.

In step 150, a plurality of newly added files are merged into a large file according to the size of the newly added files, and the large file is stored in an empty data block. A plurality of small files can be merged into one large file, and the size of the large file should be less than or equal to the capacity of an empty data block among the data blocks available for storage. As shown on the left side below the dotted line in Fig. 2, File_n1 … File_nn are merged and stored in the same data block block_k.

After the file is written to disk, this embodiment may further include step 160, in which metadata and an index file are generated for the newly added file, so that the file can be found according to the metadata and the index file when it is subsequently modified or deleted. In step 170, a storage success response may be returned to the client.

In this embodiment, the corresponding data block group and the data blocks available for storage in that group are selected according to the size of the newly added file, and the newly added file is preferentially stored in an idle storage unit of a non-empty data block. When the available data blocks are empty, the newly added files are merged according to size and the merged large file is stored in the same data block, so that the data layout can be optimized and the utilization rate of the disk space can be improved.
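The add flow of steps 130 to 150 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the disclosed implementation: the Block class, the fixed count of four units per block, and the function store_new_files are hypothetical, and file names stand in for file contents.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A data block made of storage units; None marks a free unit."""
    units: list = field(default_factory=lambda: [None] * 4)

    def is_empty(self) -> bool:
        return all(u is None for u in self.units)

    def free_index(self):
        """Return the index of the first free unit, or None if the block is full."""
        return next((i for i, u in enumerate(self.units) if u is None), None)


def store_new_files(blocks: dict, pending: list) -> dict:
    """Place pending file names; return {file_name: (block_id, unit_index)}."""
    placed = {}
    queue = list(pending)
    # Steps 130/140: prefer free units of non-empty blocks.
    for block_id, block in blocks.items():
        if block.is_empty():
            continue
        while queue and (i := block.free_index()) is not None:
            name = queue.pop(0)
            block.units[i] = name
            placed[name] = (block_id, i)
    # Step 150: merge the remaining files and write each batch into one empty block.
    for block_id, block in blocks.items():
        if not queue:
            break
        if block.is_empty():
            batch, queue = queue[:len(block.units)], queue[len(block.units):]
            for i, name in enumerate(batch):
                block.units[i] = name
                placed[name] = (block_id, i)
    if queue:
        raise RuntimeError("no writable block left in this group")
    return placed
```

Filling partially used blocks first keeps empty blocks available for merged batches, which is the layout benefit the embodiment describes.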

Fig. 3 is a schematic flow diagram of another embodiment of the file processing method of the present disclosure. This embodiment mainly describes the file deletion flow.

In step 310, a file deletion request sent by a client is received.

In step 320, the metadata of the file to be deleted is looked up. From the metadata, information such as the file's location, name, guid (disk partition table), owner, size, creation date, and access permissions can be obtained.

In step 330, the data block in the data block group where the file to be deleted is located is determined according to the metadata, that is, the group_x and block_id of the file to be deleted are determined. As shown in the upper part of the dotted line in Fig. 4, the file to be deleted, File_xx, is stored in block_1 of group_x.

In step 340, the storage unit where the file to be deleted is located is found in the data block according to the index file of the file to be deleted. The index file may include information such as the file name, offset, and size. As shown in Fig. 4, the file to be deleted, File_xx, is in the last storage unit of block_1 in group_x.

In step 350, the file to be deleted is deleted.

In one embodiment, step 360 may further be included, in which the state of the storage unit where the file to be deleted is located is modified into an idle state. As shown in the lower part of the dotted line in Fig. 4, when File_xx is deleted, the state of the storage unit in which File_xx was located becomes idle. At this point, the metadata and the index file may be updated.

At step 370, a file delete success response is returned to the client.

In the embodiment, because the data layout is more reasonable during data storage, when a file is deleted, the idle units in the data block generated by deleting a single small file can be reduced, and in addition, after the file is deleted, the storage space can be immediately recovered, so that the problem of waste of the disk space is reduced.
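The deletion flow of steps 320 to 360 can be sketched as follows. The sketch assumes that metadata maps a file name to its group and block, that the index file holds a name, offset, and size per file as described above, and that a unit is located by dividing the offset by a fixed unit size. UNIT_SIZE, IndexEntry, and delete_file are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

UNIT_SIZE = 4096  # hypothetical storage-unit size in bytes; not from the source


@dataclass
class IndexEntry:
    """Index record for one small file: name, byte offset in its block, and size."""
    name: str
    offset: int
    size: int


def delete_file(name, metadata, index, unit_state):
    """Delete `name`, mark its unit idle, and return the freed (group, block, unit)."""
    group_id, block_id = metadata.pop(name)   # steps 320/330: locate group and block
    entry = index.pop(name)                   # step 340: locate the unit via the index
    unit = entry.offset // UNIT_SIZE
    # Steps 350/360: the unit is released by marking it idle so it can be reused.
    unit_state[(group_id, block_id)][unit] = "idle"
    return group_id, block_id, unit
```

Removing the metadata and index entries in the same call corresponds to the update of metadata and index information mentioned in step 360.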

Fig. 5 is a schematic flow diagram of another embodiment of the file processing method of the present disclosure. This embodiment mainly describes the file modification flow.

At step 510, a file modification request sent by a client is received.

In step 520, it is determined whether the size of the modified file differs from the size of the file to be modified. If so, step 530 is executed; otherwise, step 540 is executed.

In step 530, the modified file is stored as a newly added file, and the file to be modified is deleted as a file to be deleted. The modified file is stored following the flow shown in Fig. 1, and the file to be modified is deleted following the flow shown in Fig. 3.

At step 540, the file to be modified is replaced with the modified file.

The specific process of the above embodiment may be as shown in Fig. 6.

At step 610, a file modification request sent by a client is received.

In step 620, it is determined whether the size of the modified file differs from the size of the file to be modified. If so, step 630 is executed; otherwise, step 640 is executed.

In step 630, the corresponding data block group and the data blocks available for storage in the data block group are selected according to the modified file size.

In step 631, it is determined whether there is a non-empty data block in the data blocks available for storage, if so, step 632 is executed, otherwise, step 633 is executed.

At step 632, the modified file is stored in free storage units of non-empty data blocks.

In step 633, the plurality of files are merged into a large file according to the modified file size, and the large file is stored in an empty data block.

In step 634, the metadata of the file to be modified is looked up.

In step 635, the data block in the data block group where the file to be modified is located is determined according to the metadata.

In step 636, the storage unit where the file to be modified is located is found in the data block according to the index file of the file to be modified.

In step 637, the file to be modified is deleted.

In step 640, the metadata of the file to be modified is looked up.

In step 641, the data block in the data block group where the file to be modified is located is determined according to the metadata. As shown in the upper part of the dotted line in Fig. 7, the file to be modified, File_xx, is stored in block_1 of group_x.

In step 642, the file to be modified is found in the data block according to the index file of the file to be modified. As shown in Fig. 7, File_xx is in the last storage unit of block_1 in group_x.

In step 643, the file to be modified is replaced with the modified file. As shown in the lower part of the dotted line in Fig. 7, File_xx is replaced with newfile.

At step 650, a modification success response is returned to the client.

In the above embodiment, when a file is modified and the size of the modified file differs from that of the file to be modified, the modified file is stored as a newly added file and the file to be modified is deleted as a file to be deleted. That is, when a modification changes the size of a file, the small file is moved to the data block group corresponding to its new size and stored in a corresponding data block. The data layout can thereby be optimized, the utilization rate of the disk space is improved, and the data access capability of the storage system is improved.
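The size check that drives the modification flow can be sketched as follows. This is an illustrative sketch under assumptions: storage is modeled as a dictionary from a (group, block, unit) location to file bytes, metadata maps a file name to its location, and allocate is a hypothetical caller-supplied function standing in for the add flow that returns a free location for a given size.

```python
def modify_file(name, new_data, metadata, units, allocate):
    """Apply a modification and return the location that now holds the file.

    Same size: overwrite in place (steps 640 to 643).
    Different size: store as a newly added file, then delete the old copy
    (steps 630 to 637).
    """
    old_loc = metadata[name]
    if len(new_data) == len(units[old_loc]):
        units[old_loc] = new_data          # size unchanged: replace in the same unit
        return old_loc
    new_loc = allocate(len(new_data))      # size changed: pick a unit for the new size
    units[new_loc] = new_data
    metadata[name] = new_loc
    units[old_loc] = None                  # old unit becomes idle after deletion
    return new_loc
```

Writing the new copy before releasing the old one follows the order in Fig. 6, where the modified file is stored first and the file to be modified is deleted afterwards.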

Fig. 8 is a schematic block diagram of one embodiment of a file processing system according to the present disclosure. The system includes a request receiving unit 810, a data block selecting unit 820, a non-empty data block judging unit 830, and a data storage unit 840.

The request receiving unit 810 is configured to receive a new file request sent by a client. The newly added file may be a small file, i.e., a file with a size within 1 MB.

The data block selecting unit 820 is configured to select a corresponding data block group and the data blocks available for storage in the data block group according to the size of the new file. Specifically, a data block group group_x capable of storing the file is first found according to the file size, and then a writable data block block_id is found within group_x. As shown in Fig. 2, the part above the dotted line is a schematic diagram before the new file is added; it can be seen that block_1, block_2, and block_k in group_x are writable.

The non-empty data block determination unit 830 is used to determine whether there is a non-empty data block in the data blocks available for storage.

The data storage unit 840 is configured to store the newly added file in a free storage unit of a non-empty data block if there is a non-empty data block in the data blocks available for storage, otherwise, merge a plurality of newly added files into a large file according to the size of the newly added file, and store the large file in an empty data block. Wherein a plurality of small files can be merged into one large file, the size of the large file should be less than or equal to the storage amount of empty data blocks in the data blocks available for storage.

Fig. 9 is a schematic block diagram of another embodiment of a file processing system according to the present disclosure. The system further includes a metadata searching unit 910, a data block querying unit 920, a storage unit querying unit 930, and a data deleting unit 940.

The request receiving unit 810 is configured to receive a file deletion request sent by a client.

The metadata searching unit 910 is configured to search the metadata of the file to be deleted. From the metadata, information such as the file's location, name, guid (disk partition table), owner, size, creation date, and access permissions can be obtained.

The data block querying unit 920 is configured to determine, according to the metadata, the data block in the data block group where the file to be deleted is located, that is, to determine the group_x and block_id of the file to be deleted.

The storage unit querying unit 930 is configured to find a storage unit where the file to be deleted is located in the data block according to the index file of the file to be deleted. The index file may include information such as a file name, offset, and size.

The data deleting unit 940 is used to delete the file to be deleted.

In this embodiment, the system may further include a state modification unit 950 configured to modify the state of the storage unit where the file to be deleted is located into an idle state.

Fig. 10 is a schematic block diagram of yet another embodiment of a file processing system according to the present disclosure. The system further includes a file size determination unit 1010 and a data modification unit 1020.

The request receiving unit 810 is configured to receive a file modification request sent by a client.

The file size determining unit 1010 is configured to determine whether the size of the modified file differs from the size of the file to be modified.

The data modification unit 1020 is configured to replace the file to be modified with the modified file if the size is unchanged; otherwise, to take the modified file as a newly added file, send a storage request to the request receiving unit 810, and, in response to the data storage unit 840 having stored the modified file, send a request for deleting the file to be modified to the request receiving unit 810.

In a specific embodiment, the data modification unit 1020 is configured to search the metadata of the file to be modified; determine the data block in the data block group where the file to be modified is located according to the metadata; find the file to be modified in the data block according to the index file of the file to be modified; and replace the file to be modified with the modified file.

Fig. 11 is a schematic block diagram of yet another embodiment of a file processing system according to the present disclosure. The system includes a memory 1110 and a processor 1120, wherein:

Memory 1110 may be a magnetic disk, flash memory, or any other non-volatile storage medium. The memory is used for storing the instructions of the embodiments corresponding to Figs. 1, 3, 5, and 6. Processor 1120 is coupled to memory 1110 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 1120 is configured to execute the instructions stored in the memory.

In one embodiment, as also shown in Fig. 12, the system 1200 includes a memory 1210 and a processor 1220. Processor 1220 is coupled to memory 1210 through a bus 1230. The system 1200 may also be coupled to an external storage device 1250 via a storage interface 1240 in order to retrieve external data, and may also be coupled to a network or another computer system (not shown) via a network interface 1260, which will not be described in detail herein.

In this embodiment, the instructions are stored in the memory and executed by the processor, so that the data layout can be optimized and the utilization rate of the disk space can be improved.

In another embodiment, a computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the embodiments corresponding to Figs. 1, 3, 5, and 6. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Thus far, the present disclosure has been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A method of file processing, comprising:

receiving a new file request sent by a client;

selecting a corresponding data block group and a data block which can be used for storage in the data block group according to the size of the newly added file;

judging whether non-empty data blocks exist in the data blocks available for storage;

if yes, storing the newly added file in an idle storage unit of the non-empty data block;

if not, combining the plurality of newly added files into a large file according to the size of the newly added file, and storing the large file in an empty data block.

2. The file processing method according to claim 1, further comprising:

receiving a file deletion request sent by a client;

searching metadata of a file to be deleted;

determining a data block in the data block group of the file to be deleted according to the metadata;

finding out a storage unit where the file to be deleted is located in the data block according to the index file of the file to be deleted;

and deleting the file to be deleted.

3. The file processing method according to claim 2, further comprising:

and modifying the state of the storage unit where the file to be deleted is located into an idle state.

4. The file processing method according to claim 2 or 3, further comprising:

receiving a file modification request sent by a client;

judging whether the size of the modified file differs from the size of the file to be modified;

if not, replacing the file to be modified with the modified file;

and if so, storing the modified file as a newly added file, and deleting the file to be modified as a file to be deleted.

5. The file processing method according to claim 4, wherein replacing the file to be modified with the modified file comprises:

searching metadata of the file to be modified;

determining a data block in the data block group of the file to be modified according to the metadata;

finding the file to be modified in the data block according to the index file of the file to be modified;

and replacing the file to be modified with the modified file.

6. A file processing system comprising:

the request receiving unit is used for receiving a new file request sent by a client;

the data block selecting unit is used for selecting a corresponding data block group and data blocks which can be stored in the data block group according to the size of the newly added file;

a non-empty data block judgment unit, configured to judge whether there is a non-empty data block in the data blocks available for storage;

and the data storage unit is used for storing the newly added files in the idle storage unit of the non-empty data block if the data blocks which can be used for storage have the non-empty data blocks, otherwise, combining a plurality of newly added files into a large file according to the size of the newly added files, and storing the large file in the empty data block.

7. The file processing system according to claim 6, wherein the request receiving unit is further configured to receive a file deletion request sent by a client, and the file processing system further includes:

the metadata searching unit is used for searching metadata of the file to be deleted;

the data block query unit is used for determining the data blocks in the data block group where the files to be deleted are located according to the metadata;

the storage unit query unit is used for finding out the storage unit where the file to be deleted is located in the data block according to the index file of the file to be deleted;

and the data deleting unit is used for deleting the file to be deleted.

8. The file processing system of claim 7, further comprising:

and the state modifying unit is used for modifying the state of the storage unit where the file to be deleted is located into an idle state.

9. The file processing system according to claim 7 or 8, the request receiving unit further configured to receive a file modification request sent by a client, the file processing system further comprising:

the file size judging unit is used for judging whether the size of the modified file differs from the size of the file to be modified;

and the data modification unit is used for replacing the file to be modified with the modified file if the size is unchanged, otherwise, taking the modified file as a newly added file, sending a storage request to the request receiving unit, and, in response to the data storage unit having stored the modified file, sending a request for deleting the file to be modified to the request receiving unit.

10. The file processing system according to claim 9,

the data modification unit is used for searching metadata of the file to be modified; determining a data block in the data block group of the file to be modified according to the metadata; finding the file to be modified in the data block according to the index file of the file to be modified; and replacing the file to be modified with the modified file.

11. A file processing system comprising:

a memory; and

a processor coupled to the memory, the processor configured to perform the file processing method of any of claims 1 to 5 based on instructions stored in the memory.

12. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the file processing method of any one of claims 1 to 5.