CN115840731A - File processing method, computing device and computer storage medium - Google Patents


Info

Publication number
CN115840731A
Authority
CN
China
Prior art keywords
file
metadata
files
determining
data
Legal status
Pending
Application number
CN202211584296.7A
Other languages
Chinese (zh)
Inventor
周翱
何振华
吴金虎
梁明旭
陈慧
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application CN202211584296.7A filed by Alibaba China Co Ltd
Priority to CN202211584296.7A
Publication of CN115840731A
Legal status: Pending (current)

Abstract

The embodiments of the application provide a file processing method, a computing device and a computer storage medium. The file processing method comprises: determining at least two files to be processed; acquiring at least two metadata corresponding to the at least two files to be processed, wherein the metadata indicate storage addresses in a storage device; generating a target file; and establishing a reference relationship between the target file and the at least two metadata, wherein the reference relationship is used to determine the corresponding at least two metadata when the target file is accessed, so that data can be read from the storage device according to the at least two metadata. The technical solution provided by the embodiments of the invention merges the at least two files to be processed at the file-system level without moving the data stored in the storage device, thereby improving data security and efficiency.

Description

File processing method, computing device and computer storage medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a file processing method, a computing device and a computer storage medium.
Background
In big data processing, pre-stored files such as log files generally need to be read through a file system and used as a data source for analysis. A file system such as the Hadoop Distributed File System (HDFS) typically stores files on a storage device in blocks: a complete file is divided into a plurality of file blocks, and the data block corresponding to each file block is stored in the storage device. When a file is read, each block must be read separately to reassemble the complete file, so reading a file generates a high QPS (queries per second).
To reduce QPS, the file blocks usually need to be merged into one complete file. In a related-art file merging method, the data blocks corresponding to the file blocks are read from the storage device, merged into complete data outside the file system, and the complete data is then stored back to the storage device. This approach still incurs large I/O overhead: the data must be read out of the storage device and written back into it, and this data movement lowers both data security and efficiency.
Disclosure of Invention
The embodiments of the invention provide a file processing method, a file processing apparatus, a computing device and a computer storage medium.
In a first aspect, an embodiment of the present invention provides a file processing method, including:
determining at least two files to be processed;
acquiring at least two metadata corresponding to the at least two files to be processed, wherein the metadata is used for indicating storage addresses in storage equipment;
generating a target file;
and establishing a reference relation between the target file and the at least two metadata, wherein the reference relation is used for determining the corresponding at least two metadata when the target file is accessed so as to read data from a storage device according to the at least two metadata.
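The four steps of the first aspect can be illustrated with a minimal Python sketch. It assumes a toy in-memory namespace in which each file is simply an ordered list of metadata records; the names `Metadata`, `FileSystem`, `merge_by_reference` and the storage addresses are hypothetical and are not taken from the patent. What it shows is that merging only copies metadata references and never touches the stored data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata record: where a file's data lives in the storage device."""
    storage_address: str


@dataclass
class FileSystem:
    """Toy namespace: file name -> ordered list of referenced metadata."""
    files: dict = field(default_factory=dict)

    def merge_by_reference(self, names, target):
        # Step 1: determine at least two files to be processed.
        if len(names) < 2:
            raise ValueError("at least two files to be processed are required")
        if target in self.files:
            raise FileExistsError(target)
        # Step 2: acquire their metadata; only addresses are collected, no data is read.
        referenced = [meta for name in names for meta in self.files[name]]
        # Step 3: generate the target file; it starts out empty.
        self.files[target] = []
        # Step 4: establish the reference relationship between the target and the metadata.
        self.files[target].extend(referenced)
        return referenced


fs = FileSystem({"File1": [Metadata("space-2011")], "File2": [Metadata("space-2012")]})
fs.merge_by_reference(["File1", "File2"], "target-202")
print(fs.files["target-202"])
```

The source files keep their own references, so after the merge the same storage spaces are shared by the source files and the target file.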
In a second aspect, an embodiment of the present invention provides a file processing apparatus, including:
the file determining module is used for determining at least two files to be processed;
the metadata acquisition module is used for acquiring at least two metadata corresponding to the at least two files to be processed, and the metadata is used for indicating storage addresses in the storage equipment;
the file generation module is used for generating a target file;
and the relation establishing module is used for establishing a reference relation between the target file and the at least two metadata, and the reference relation is used for determining the corresponding at least two metadata when the target file is accessed so as to read data from a storage device according to the at least two metadata.
In a third aspect, an embodiment of the present invention provides a computing device, including a processing component and a storage component;
the storage component stores one or more computer instructions, which are invoked and executed by the processing component to implement the file processing method provided by the embodiments of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing a computer program that, when executed by a computer, implements the file processing method provided by the embodiments of the invention.
With the technical solution of the embodiments of the invention, at least two files to be processed are determined, at least two metadata corresponding to them are acquired (the metadata indicate storage addresses in the storage device), a target file is generated, and a reference relationship between the target file and the at least two metadata is established. The target file thus references the metadata of the at least two files to be processed, so the files are merged at the file-system level without moving the data stored in the storage device, which improves data security and efficiency.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart schematically illustrating a file processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a file processing method according to an embodiment of the present invention;
FIG. 3 schematically illustrates a file organization form of an OSS-HDFS file system provided by an embodiment of the present invention;
FIG. 4 is a block diagram schematically illustrating a file processing apparatus according to an embodiment of the present invention;
FIG. 5 schematically illustrates a block diagram of a computing device provided by an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
Some of the flows described in this specification, in the claims and in the above figures contain a number of operations in a particular order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein, or in parallel; operation numbers such as 101 and 102 merely distinguish the operations and do not by themselves imply any order of execution. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first", "second", etc. in this document are used to distinguish different messages, devices, modules, etc.; they do not represent a sequential order, nor do they require the "first" and "second" items to be of different types.
In big data processing, pre-stored files such as log files generally need to be read through a file system and used as a data source for analysis. A file system such as the Hadoop Distributed File System (HDFS) typically stores files on a storage device in blocks: a complete file is divided into a plurality of file blocks, and the data block corresponding to each file block is stored in the storage device. When a file is read, each block must be read separately to reassemble the complete file, so reading a file generates a high QPS (queries per second).
To reduce QPS, the file blocks usually need to be merged into one complete file. In a related-art file merging method, the data blocks corresponding to the file blocks are read from the storage device, merged into complete data outside the file system, and the complete data is then stored back to the storage device. Taking the merging of two files as an example, the process may be: first read the first data corresponding to a first source file, create a new file, and write the first data into it; then read the second data corresponding to a second source file and write it into the new file; finally, write the new file containing both the first and the second data into the storage device.
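For contrast, the related-art merge described above can be sketched as follows, with the storage device modelled as a hypothetical Python dict of byte strings. The counters make the data movement explicit: every byte of the source files is read out and then written back.

```python
def copy_merge(storage, sources, new_name):
    """Related-art merge: read every source out and write the combined data back."""
    io = {"read": 0, "written": 0}
    merged = bytearray()                  # the new file, assembled outside the file system
    for name in sources:
        data = storage[name]              # read the source data from the storage device
        io["read"] += len(data)
        merged += data                    # write it into the new file
    storage[new_name] = bytes(merged)     # write the new file back to the storage device
    io["written"] += len(merged)
    return io


storage = {"src1": b"first-data", "src2": b"second-data"}
result = copy_merge(storage, ["src1", "src2"], "merged")
print(result)  # {'read': 21, 'written': 21}
```

Merging n bytes this way costs roughly n bytes of reads plus n bytes of writes, which is the I/O overhead the reference-based approach is meant to avoid.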
In the course of arriving at the concept of the invention, the inventors found that this merging approach still has large I/O overhead: the data must be read from the storage device and then written back to it, and this data movement lowers data security and efficiency.
Through research, the inventors found that each file block records, through metadata, the storage location of its corresponding data block in the storage device. Since the real data is already stored in the storage device, when a plurality of files need to be merged, the file system only needs to reorganize the metadata of a new file so that the metadata of the new file indicate the storage locations recorded in the metadata of the plurality of files.
Based on this, an embodiment of the present invention provides a file processing method, where at least two files to be processed are determined, at least two pieces of metadata corresponding to the at least two files to be processed are obtained, the metadata is used to indicate a storage address in a storage device, a target file is generated, and a reference relationship between the target file and the at least two pieces of metadata is established, so that the target file references the metadata of the at least two files to be processed, and thus the at least two files to be processed are merged at a file system level without moving data stored in the storage device, thereby improving data security and efficiency.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 schematically shows a flowchart of a file processing method according to an embodiment of the present invention, where the file processing method may include the following steps:
101, determining at least two files to be processed;
102, acquiring at least two metadata corresponding to at least two files to be processed, wherein the metadata is used for indicating a storage address in a storage device;
103, generating a target file;
104, establishing a reference relationship between the target file and the at least two metadata, wherein the reference relationship is used to determine the corresponding at least two metadata when the target file is accessed, so that data can be read from the storage device according to the at least two metadata.
Fig. 2 is a schematic diagram of a file processing method provided by an embodiment of the present invention.
As shown in Fig. 2, File1 and File2 may be the two determined files to be processed, 201 may be a storage device, 2011 may be a first storage space in the storage device, and 2012 may be a second storage space in the storage device.
The file to be processed File1 has metadata 1, and the file to be processed File2 has metadata 2; metadata 1 points to the first storage space 2011 in the storage device 201, and metadata 2 points to the second storage space 2012 in the storage device 201. Metadata indicates that the data corresponding to a file to be processed is stored in the storage space to which the metadata points. In this embodiment, metadata 1 indicates that the first data corresponding to File1 is stored in the first storage space 2011 of the storage device 201, and metadata 2 indicates that the data corresponding to File2 is stored in the second storage space 2012 of the storage device 201.
According to the embodiment of the invention, the metadata corresponding to the file to be processed can be obtained by retrieving the reference table, and the reference table can record the reference relation between all files managed by the file system and the metadata.
According to the embodiment of the invention, the files in the file system have respective file identification information, so that after at least two files to be processed are determined, the reference table can be retrieved based on the file identifications of the at least two files to be processed to acquire the metadata corresponding to the at least two files to be processed respectively.
According to an embodiment of the present invention, the metadata may be implemented as address information of a storage space in the storage device, or a UUID (Universally Unique Identifier) of the file, for example.
According to an embodiment of the invention, the reference table may be, for example, as shown in table 1 below:
TABLE 1
Metadata | File identification information
Metadata 1 | File 1
Metadata 2 | File 2
…… | ……
Metadata n | File n
As shown in Table 1, the first column is the metadata column and the second column is the file identification information column. The file identification information of file 1 written in the data row of metadata 1 indicates that file 1 references metadata 1, that is, the data corresponding to file 1 is stored in the storage space pointed to by metadata 1 in the storage device.
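A lookup against Table 1 by file identification can be sketched as below. The dict layout (each metadata id mapped to the list of files recorded in its data row) and the function names are assumptions made for illustration; a real file system would more likely keep an index keyed by file identification than scan every row.

```python
# Hypothetical in-memory copy of Table 1: metadata -> files recorded in its data row.
REFERENCE_TABLE = {
    "metadata 1": ["file 1"],
    "metadata 2": ["file 2"],
    "metadata n": ["file n"],
}


def metadata_for(file_id, table):
    """Return every metadata whose data row records file_id."""
    return [meta for meta, files in table.items() if file_id in files]


def metadata_for_files(file_ids, table):
    """Look up the metadata of each file to be processed by its file identification."""
    result = {}
    for file_id in file_ids:
        metas = metadata_for(file_id, table)
        if not metas:
            raise KeyError(f"no metadata recorded for {file_id!r}")
        result[file_id] = metas
    return result


print(metadata_for_files(["file 1", "file 2"], REFERENCE_TABLE))
```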
According to the embodiment of the invention, after the files to be processed File1 and File2 and their metadata (metadata 1 and metadata 2) have been determined, a target file 202 can be generated. The target file can be an empty file, that is, it does not itself contain any of the real data stored in the storage device.
According to the embodiment of the present invention, after the target file 202 is generated, reference relationships between the target file 202 and metadata 1 and metadata 2 can be created. Once these relationships exist, the target file 202 contains the data in the storage spaces that metadata 1 and metadata 2 point to in the storage device.
According to the embodiment of the present invention, the processing of the two files to be processed, File1 and File2, in Fig. 2 is only an example. In practice, a person skilled in the art can choose the number of files to be processed according to the application requirements; the embodiments of the invention do not limit this number.
In the embodiment of the invention, at least two files to be processed are determined, at least two pieces of metadata corresponding to the at least two files to be processed are obtained, the metadata are used for indicating the storage address in the storage device, the target file is generated, and the reference relation between the target file and the at least two pieces of metadata is established.
In the big data field, large numbers of files must be processed by multiple data processing engines. The file system chosen to manage these files should therefore offer rich interfaces, so that it can be connected to those engines easily and file management and file processing can be integrated seamlessly.
Each file system has its own characteristics. For example, the Hadoop Distributed File System (HDFS) has rich interfaces and connects easily to data processing engines such as MapReduce, Hive, Spark and Flink. However, HDFS depends heavily on hardware resources for file storage, which makes it difficult to scale storage resources elastically up and down.
The inventors found in research that OSS (Object Storage Service) offers flexible storage space, easy capacity expansion and cloud-based service deployment. An OSS-HDFS file system can therefore be used for file storage: the HDFS file system provides the external data interface and handles the writing and reading of files, and after a file is written through the HDFS file system, the real data corresponding to the file is stored in the OSS-based OSS file system. With the OSS-HDFS file system, a wide range of data processing engines can be connected, dependence on hardware resources is avoided, and storage space can be expanded easily.
In an OSS-HDFS file system, after a user writes a file into the HDFS file system, the HDFS file system divides the file into blocks of a fixed size and then writes the data blocks corresponding to the file blocks into the OSS file system.
Fig. 3 schematically illustrates a file organization form of an OSS-HDFS file system according to an embodiment of the present invention.
In Fig. 3, file 301 may be a file written into the HDFS file system by a user. After the file is written, the HDFS file system may divide file 301 into file 302 and file 303 and write their data into the underlying storage device 304 of the OSS file system.
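The fixed-size splitting shown in Fig. 3 can be sketched as follows. The block size, the `.partN` naming scheme and the dict standing in for the OSS storage device are all hypothetical; the text only states that blocks are of a fixed size.

```python
def split_and_store(name, data, block_size, storage):
    """Split data into fixed-size blocks, store each block, and return its metadata."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    metadata = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        chunk = data[offset:offset + block_size]   # the last block may be shorter
        key = f"{name}.part{index}"                # hypothetical subfile naming scheme
        storage[key] = chunk                       # write the data block to the storage device
        metadata.append((key, len(chunk)))         # record where the block lives and its size
    return metadata


storage = {}
blocks = split_and_store("file301", b"abcdefghij", 6, storage)
print(blocks)  # [('file301.part0', 6), ('file301.part1', 4)]
```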
According to the embodiment of the present invention, determining at least two files to be merged may specifically be implemented as:
determining a first file in a first file system, wherein the first file system is used for providing a file read-write interface and storing the first file acquired based on the file read-write interface to a second file system in blocks;
determining at least two second files corresponding to the first file from a second file system;
and determining at least two second files as files to be processed.
According to an embodiment of the present invention, the first file system may be an HDFS file system and the second file system may be an OSS file system.
In the embodiment of the present invention, the second file system may be checked periodically for file fragments, and any fragments found may be merged. The file fragments may be the plurality of second files corresponding to a first file in the first file system.
According to the embodiment of the invention, when dividing a file, the HDFS file system may name the source file and the resulting subfiles according to a uniform convention, for example by giving them the same file-name prefix. The second file system can then determine from the file names whether file fragments exist.
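Detecting fragments by a shared file-name prefix might look like the sketch below. The `.partN` suffix convention and the function name are assumptions (the text says only that the source file and its subfiles may share a prefix). Groups with fewer than two fragments are ignored because a merge needs at least two files, and fragments are ordered numerically so that `part10` sorts after `part9`.

```python
from collections import defaultdict


def find_fragment_groups(names, separator=".part"):
    """Group names like 'logs.part0', 'logs.part1' by their shared prefix."""
    groups = defaultdict(list)
    for name in names:
        prefix, sep, suffix = name.rpartition(separator)
        if sep and suffix.isdigit():
            groups[prefix].append((int(suffix), name))
    # Only prefixes with at least two fragments are candidates for merging.
    return {p: [n for _, n in sorted(parts)] for p, parts in groups.items() if len(parts) >= 2}


names = ["logs.part1", "logs.part0", "report.csv", "single.part0"]
print(find_fragment_groups(names))  # {'logs': ['logs.part0', 'logs.part1']}
```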
According to the embodiment of the present invention, determining at least two files to be merged may specifically be implemented as:
acquiring a file processing request, wherein the file processing request comprises file identifications of at least two files;
and determining at least two files corresponding to the file identifications as files to be processed.
According to the embodiment of the invention, besides periodically checking the second file system for file fragments and then merging them, a user can also specify the files to be processed that need to be merged.
According to an embodiment of the present invention, the file processing method further includes:
receiving a reading request aiming at a target file;
responding to the reading request, and acquiring at least two metadata having a reference relation with the target file;
target data is retrieved from the storage device based on the at least two metadata.
According to the embodiment of the invention, after the HDFS file system receives a read request for the first file, it may forward the request to the OSS file system. The OSS file system first determines whether the file fragments corresponding to the first file have been merged; if so, it reads the data through the metadata of the target file generated by the merge.
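The read path (receive a read request, obtain the metadata referenced by the target file, fetch the data) can be sketched like this. It assumes each file maps to an ordered list of storage addresses and that the storage device is a dict of byte strings; the names and the ordering rule are hypothetical, since the text does not say how the order of the referenced metadata is kept.

```python
def read_target(target, file_metadata, storage):
    """Serve a read request for target by following its metadata references."""
    try:
        addresses = file_metadata[target]   # metadata having a reference relationship with target
    except KeyError:
        raise FileNotFoundError(target) from None
    # Fetch each referenced storage space in order and concatenate the bytes.
    return b"".join(storage[address] for address in addresses)


storage = {"space-2011": b"hello ", "space-2012": b"world"}
file_metadata = {
    "File1": ["space-2011"],
    "File2": ["space-2012"],
    "target-202": ["space-2011", "space-2012"],
}
print(read_target("target-202", file_metadata, storage))  # b'hello world'
```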
According to the embodiment of the present invention, generating the target file may specifically be implemented as:
newly adding a file name in the reference table so that the operating system can generate a target file corresponding to the file name, wherein the reference table records the reference relation between the metadata and the file;
establishing a reference relationship between the target file and at least two metadata comprises the following steps:
and respectively writing the file names into data rows corresponding to at least two metadata in the reference table to establish the reference relation between the at least two metadata and the target file.
According to an embodiment of the present invention, generating the target file and establishing a reference relationship of the target file and the at least two metadata may be as shown in table 2 below.
TABLE 2
Metadata | File identification information
Metadata 1 | File 1, File 5
Metadata 2 | File 2, File 5, File 6
Metadata 3 | File 3, File 6
Metadata 4 | File 4
As shown in Table 2, the first column is the metadata column and the second column is the file identification information column. As in Table 1, file 1 written in the data row of metadata 1 indicates that file 1 references metadata 1, that is, the data corresponding to file 1 is stored in the storage space pointed to by metadata 1 in the storage device.
File 5 represents the file identification information of a first target file, and the file identification information may be implemented as a file name. After this file name is newly added to the reference table, the operating system may generate the first target file corresponding to it in response to the change of the reference table. Likewise, file 6 represents the file identification information of a second target file, and the operating system generates the second target file when its file name is added to the reference table.
According to the embodiment of the invention, writing the newly added file 5 into the data rows of metadata 1 and metadata 2 makes the newly generated file 5 reference both metadata 1 and metadata 2. Likewise, writing file 6 into the data rows of metadata 2 and metadata 3 makes file 6 reference both metadata 2 and metadata 3.
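The table updates that turn a one-to-one starting state into Table 2 can be sketched as below. The function `create_target` and the checks it performs are illustrative assumptions; the core step simply appends the new file name to each referenced metadata row, as described above. Validation runs before any row is modified, so a bad request leaves the table unchanged.

```python
def create_target(reference_table, target, metadata_ids):
    """Generate target by adding its name to the rows of the metadata it references."""
    if len(metadata_ids) < 2:
        raise ValueError("a target file references at least two metadata")
    if any(target in files for files in reference_table.values()):
        raise FileExistsError(target)
    missing = [m for m in metadata_ids if m not in reference_table]
    if missing:
        raise KeyError(f"unknown metadata: {missing}")
    for meta in metadata_ids:
        reference_table[meta].append(target)


# Starting point: each original file references exactly one metadata.
table = {f"metadata {i}": [f"file {i}"] for i in range(1, 5)}
create_target(table, "file 5", ["metadata 1", "metadata 2"])
create_target(table, "file 6", ["metadata 2", "metadata 3"])
for meta, files in table.items():
    print(meta, files)
```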
According to the embodiment of the invention, files in the file system can be merged flexibly and in a user-defined way simply by changing the reference relationships between files and metadata.
According to an embodiment of the present invention, the file processing method further includes:
writing metadata information corresponding to target metadata in a metadata table, wherein the metadata information represents that the target metadata comprises at least two metadata;
the method further comprises the following steps:
and under the condition that a deletion request for any one of the at least two files to be processed is acquired, determining whether to delete the data corresponding to the file in the storage device based on the metadata table and the reference table.
According to an embodiment of the present invention, the metadata table may be as shown in table 3 below.
TABLE 3
Metadata | Referenced metadata
Metadata 1 | /
Metadata 2 | /
Metadata 3 | /
Metadata 4 | /
Metadata 5 | Metadata 1, Metadata 2
Metadata 6 | Metadata 2, Metadata 3
The entries in Table 3 correspond to those in Table 2. The metadata table records the reference relationships between metadata.
Referring to Tables 2 and 3, metadata 5 is generated by writing file 5 into the data rows of metadata 1 and metadata 2, so metadata 5 comprises metadata 1 and metadata 2, that is, metadata 5 references metadata 1 and metadata 2. Likewise, metadata 6 is generated by writing file 6 into the data rows of metadata 2 and metadata 3, so metadata 6 comprises, and references, metadata 2 and metadata 3.
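Recording composite metadata in the metadata table (Table 3) and asking which composites use a given metadata could be sketched as follows. The tuple-based layout, with an empty tuple standing for "/", and both function names are hypothetical.

```python
def record_composite(metadata_table, new_meta, components):
    """Write a metadata-table row stating that new_meta is composed of components."""
    if len(components) < 2:
        raise ValueError("composite metadata comprises at least two metadata")
    metadata_table[new_meta] = tuple(components)


def composites_using(metadata_table, meta):
    """Return the composite metadata that include meta (empty if none)."""
    return [m for m, parts in metadata_table.items() if meta in parts]


# Table 3: the original metadata have no components ("/" in the table).
metadata_table = {f"metadata {i}": () for i in range(1, 5)}
record_composite(metadata_table, "metadata 5", ["metadata 1", "metadata 2"])
record_composite(metadata_table, "metadata 6", ["metadata 2", "metadata 3"])
print(composites_using(metadata_table, "metadata 2"))  # ['metadata 5', 'metadata 6']
print(composites_using(metadata_table, "metadata 4"))  # []
```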
According to the embodiment of the invention, the reference table and the metadata table make it easy to see whether the real data in the storage device pointed to by a given metadata is still referenced by any file. Therefore, when a deletion request for any one of the at least two files to be processed is received, whether to delete the data corresponding to that file can be decided by checking whether the data is still referenced by other files.
According to the embodiment of the present invention, in a case where a deletion request for any one of at least two files to be processed is acquired, determining whether to delete data corresponding to the file in the storage device based on the metadata table and the reference table may specifically be implemented as follows:
determining first metadata corresponding to a file to be deleted;
retrieving a metadata table and a reference table, respectively, based on the first metadata to determine whether a file referencing the first metadata exists;
deleting data indicated by the first metadata in the storage device in the case that a file referencing the first metadata does not exist;
in the case that a file referencing the first metadata exists, only the entry of the file to be deleted is removed from the reference table.
According to an embodiment of the present invention, respectively retrieving the metadata table and the reference table based on the first metadata to determine whether there is a file referencing the first metadata may specifically be implemented as:
searching a metadata table, and determining whether the first metadata is used for composing other metadata;
determining that there is a file referring to the first metadata in a case where the first metadata is used to compose other metadata;
and under the condition that the first metadata is not used for forming other metadata, searching the reference table, determining whether a data line corresponding to the first metadata in the reference table records other files except the file to be deleted, and under the condition that the data line corresponding to the first metadata does not record other files except the file to be deleted, determining that the file which refers to the first metadata does not exist.
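The deletion check in the steps above can be sketched as follows, using Tables 2 and 3 as input. The storage device is modelled as a dict keyed by metadata id, and `delete_file` is a hypothetical name. The sketch does not update the metadata table when a target file itself is deleted, a case the text does not describe.

```python
def delete_file(name, reference_table, metadata_table, storage):
    """Delete a file; release storage only for metadata that nothing else references.

    Returns the list of metadata whose data was deleted from the storage device.
    """
    first_metadata = [m for m, files in reference_table.items() if name in files]
    if not first_metadata:
        raise FileNotFoundError(name)
    released = []
    for meta in first_metadata:
        # The file's own entry is always removed from the reference table.
        reference_table[meta].remove(name)
        # 1) Metadata table: is meta used to compose other metadata?
        composes_other = any(meta in parts for parts in metadata_table.values())
        # 2) Reference table: does its data row still record other files?
        other_files = reference_table[meta]
        if not composes_other and not other_files:
            storage.pop(meta, None)   # no file references the data: release the space
            released.append(meta)
    return released


reference_table = {
    "metadata 1": ["file 1", "file 5"],
    "metadata 2": ["file 2", "file 5", "file 6"],
    "metadata 3": ["file 3", "file 6"],
    "metadata 4": ["file 4"],
}
metadata_table = {
    "metadata 5": ("metadata 1", "metadata 2"),
    "metadata 6": ("metadata 2", "metadata 3"),
}
storage = {f"metadata {i}": f"data {i}".encode() for i in range(1, 5)}

print(delete_file("file 1", reference_table, metadata_table, storage))  # []
print(delete_file("file 4", reference_table, metadata_table, storage))  # ['metadata 4']
```

Deleting file 1 keeps the data of metadata 1 because metadata 5 is composed of it, while deleting file 4 releases the space of metadata 4, matching the two worked examples that follow.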
According to the embodiment of the present invention, for example, when file1 described in table 2 above is determined as a file to be deleted, metadata 1 may be the first metadata.
By retrieving Table 3, it is determined that metadata 1 is used to compose metadata 5, that is, metadata 5 references metadata 1. In other words, an earlier file processing operation created reference relationships between file 5 and metadata 1 and metadata 2, so that file 5 contains the data stored in the storage spaces indicated by both metadata 1 and metadata 2. If the data in the storage space indicated by metadata 1 were deleted, the integrity of file 5 would be destroyed and file 5 would become unusable. Therefore, when a file referencing the first metadata exists, only the information of the file to be deleted is removed from Table 2, namely the entry file 1 recorded in the data row of metadata 1.
According to the embodiment of the present invention, for example, when the file 4 described in table 2 above is determined as a file to be deleted, the metadata 4 may be the first metadata.
By retrieving Table 3, it is determined that metadata 4 is not used to compose any other metadata, that is, no other metadata references metadata 4. Table 2 is then retrieved to determine whether the data row of metadata 4 records any file other than file 4. If it does not, the data in the storage space pointed to by metadata 4 is not referenced by any file other than file 4; in this case, besides deleting the file identification information of file 4 from Table 2, the data in the storage space pointed to by metadata 4 may also be deleted to release storage space in the storage device.
It should be noted that the file processing method provided by the embodiment of the present invention may be executed by a distributed file system, and the distributed file system may include, for example, OSS-HDFS, joinef fs, and the like.
Fig. 4 schematically shows a block diagram of a file processing apparatus according to an embodiment of the present invention, and the file processing apparatus 400 may include a file determining module 401, a metadata obtaining module 402, a file generating module 403, and a relationship establishing module 404.
A file determining module 401, configured to determine at least two files to be processed;
a metadata obtaining module 402, configured to obtain at least two pieces of metadata corresponding to at least two files to be processed, where the metadata is used to indicate a storage address in a storage device;
a file generation module 403, configured to generate a target file;
a relationship establishing module 404, configured to establish a reference relationship between the target file and the at least two pieces of metadata, where the reference relationship is used to determine the corresponding at least two pieces of metadata when the target file is accessed, so as to read data from the storage device according to the at least two pieces of metadata.
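As a rough illustration of how modules 401 to 404 cooperate, the sketch below merges files purely at the metadata level. It assumes a hypothetical in-memory index that maps each file name to an ordered list of metadata ids (each standing for a storage address); the class and method names are illustrative, not part of the described apparatus. The point it shows is that merging only writes a new reference relation and never copies stored bytes.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataIndex:
    """Hypothetical file-system index: file name -> ordered metadata ids.

    Each metadata id stands for a storage address in the storage device.
    """

    files: dict[str, list[str]] = field(default_factory=dict)

    def merge(self, sources: list[str], target: str) -> list[str]:
        # Module 401: at least two files to be processed are required.
        if len(sources) < 2:
            raise ValueError("at least two files are required")
        # Module 402: collect each source file's metadata, in source order.
        metadata = [md for name in sources for md in self.files[name]]
        # Module 403: generate the target file (refuse to overwrite an existing one).
        if target in self.files:
            raise FileExistsError(target)
        # Module 404: record the reference relation; no stored bytes are moved.
        self.files[target] = metadata
        return metadata


index = MetadataIndex({"file 1": ["metadata 1"], "file 2": ["metadata 2"]})
print(index.merge(["file 1", "file 2"], "file 4"))  # ['metadata 1', 'metadata 2']
```

The source files remain intact after the merge, which matches the stated goal of merging at the file-system level without moving data in the storage device.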
According to an embodiment of the present invention, the file processing apparatus 400 further includes:
a request receiving module, configured to receive a read request for the target file;
a request response module, configured to acquire, in response to the read request, the at least two pieces of metadata that have a reference relationship with the target file;
a data acquisition module, configured to acquire target data from the storage device based on the at least two pieces of metadata.
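A minimal sketch of the read path follows. It assumes, as a simplification, that the storage device can be modeled as a dictionary from metadata id to stored bytes and that the target file's content is the in-order concatenation of the data its metadata points to; both the data layout and the function name `read_target` are hypothetical.

```python
def read_target(name: str, references: dict[str, list[str]],
                storage: dict[str, bytes]) -> bytes:
    # Request response module: find the metadata the target file references.
    metadata_ids = references[name]
    # Data acquisition module: read each storage location and join them in order.
    return b"".join(storage[md] for md in metadata_ids)


storage = {"metadata 1": b"hello ", "metadata 2": b"world"}
references = {"file 4": ["metadata 1", "metadata 2"]}
print(read_target("file 4", references, storage))  # b'hello world'
```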
According to an embodiment of the present invention, the file generation module 403 includes:
an adding unit, configured to add a file name to a reference table so that the operating system can generate the target file corresponding to the file name, where the reference table records the reference relationships between metadata and files.
According to an embodiment of the present invention, the relationship establishing module 404 includes:
a relationship establishing unit, configured to write the file name into each of the data rows corresponding to the at least two pieces of metadata in the reference table, so as to establish the reference relationship between the at least two pieces of metadata and the target file.
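The reference-table variant of target-file creation might look like the sketch below. The table layout (a set of registered file names plus one data row per metadata id) is an assumption made for illustration, and the class and method names are hypothetical; how the operating system actually materializes the file from the registered name is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceTable:
    """Hypothetical reference table: file names plus one data row per metadata id."""

    file_names: set[str] = field(default_factory=set)
    rows: dict[str, list[str]] = field(default_factory=dict)

    def add_file_name(self, name: str) -> None:
        # Adding unit: register the name from which the target file is generated.
        if name in self.file_names:
            raise FileExistsError(name)
        self.file_names.add(name)

    def link(self, name: str, metadata_ids: list[str]) -> None:
        # Relationship establishing unit: write the name into each metadata's row.
        for md in metadata_ids:
            self.rows.setdefault(md, []).append(name)


table = ReferenceTable({"file 1", "file 2"},
                       {"metadata 1": ["file 1"], "metadata 2": ["file 2"]})
table.add_file_name("file 4")
table.link("file 4", ["metadata 1", "metadata 2"])
print(table.rows)
# {'metadata 1': ['file 1', 'file 4'], 'metadata 2': ['file 2', 'file 4']}
```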
According to an embodiment of the present invention, the file processing apparatus 400 further includes:
an information writing module, configured to write metadata information corresponding to target metadata into a metadata table, where the metadata information indicates that the target metadata consists of the at least two pieces of metadata.
According to an embodiment of the present invention, the file processing apparatus 400 further includes:
a deletion module, configured to determine, when a deletion request for any one of the at least two files to be processed is received, whether to delete the data corresponding to that file from the storage device based on the metadata table and the reference table.
According to an embodiment of the present invention, the deletion module includes:
a metadata determining submodule, configured to determine the first metadata corresponding to the file to be deleted;
a retrieval submodule, configured to search the metadata table and the reference table based on the first metadata to determine whether a file referencing the first metadata exists;
a first deletion submodule, configured to delete the data indicated by the first metadata from the storage device when no file references the first metadata;
a second deletion submodule, configured to delete the record of the file to be deleted from the reference table when a file referencing the first metadata exists.
According to an embodiment of the invention, the retrieval submodule comprises:
a retrieval unit, configured to search the metadata table to determine whether the first metadata is used to compose other metadata;
a first determining unit, configured to determine that a file referencing the first metadata exists when the first metadata is used to compose other metadata;
a second determining unit, configured to, when the first metadata is not used to compose other metadata, search the reference table to determine whether the data row corresponding to the first metadata records any file other than the file to be deleted, and to determine that no file references the first metadata when that data row records no such file.
According to an embodiment of the present invention, the file determining module 401 includes:
a third determining unit, configured to determine a first file in a first file system, where the first file system provides a file read/write interface and stores the first file received through that interface in a second file system as blocks;
a fourth determining unit, configured to determine, from the second file system, at least two second files corresponding to the first file;
a fifth determining unit, configured to determine the at least two second files as the files to be processed.
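One way to picture the first-file-system case: a file written through the first file system's interface ends up as several block objects in the second file system, and those block objects become the files to be processed. In the sketch below, the dictionary-based "second file system", the block map, the `.block{i}` naming scheme, and both function names are hypothetical stand-ins, not the behavior of any particular product.

```python
def store_in_blocks(path: str, data: bytes, block_size: int,
                    second_fs: dict[str, bytes],
                    block_map: dict[str, list[str]]) -> None:
    """First file system: write `data` to the second file system as block objects."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    keys = []
    for i, start in enumerate(range(0, len(data), block_size)):
        key = f"{path}.block{i}"  # hypothetical naming scheme for second files
        second_fs[key] = data[start:start + block_size]
        keys.append(key)
    block_map[path] = keys


def files_to_process(path: str, block_map: dict[str, list[str]]) -> list[str]:
    """Return the second files backing `path` when there are at least two."""
    second_files = block_map.get(path, [])
    return list(second_files) if len(second_files) >= 2 else []


second_fs: dict[str, bytes] = {}
block_map: dict[str, list[str]] = {}
store_in_blocks("/logs/app.log", b"abcdefghij", 4, second_fs, block_map)
print(files_to_process("/logs/app.log", block_map))
# ['/logs/app.log.block0', '/logs/app.log.block1', '/logs/app.log.block2']
```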
According to an embodiment of the present invention, the file determining module 401 includes:
a request acquisition unit, configured to acquire a file processing request that includes the file identifiers of at least two files;
a sixth determining unit, configured to determine the at least two files corresponding to the file identifiers as the files to be processed.
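For the request-driven case, a sketch might simply validate the identifiers carried by the request. The request shape (`FileProcessingRequest` with a `file_ids` list), the duplicate handling, and the existence check are assumptions added for illustration; the description only states that the request carries identifiers of at least two files.

```python
from dataclasses import dataclass


@dataclass
class FileProcessingRequest:
    file_ids: list[str]  # hypothetical request shape


def files_from_request(request: FileProcessingRequest,
                       known_files: set[str]) -> list[str]:
    # Keep the requested order but drop repeated identifiers.
    ids = list(dict.fromkeys(request.file_ids))
    missing = [f for f in ids if f not in known_files]
    if missing:
        raise FileNotFoundError(", ".join(missing))
    if len(ids) < 2:
        raise ValueError("a file processing request must name at least two files")
    return ids


print(files_from_request(FileProcessingRequest(["file 1", "file 2", "file 1"]),
                         {"file 1", "file 2"}))
# ['file 1', 'file 2']
```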
The file processing apparatus shown in Fig. 4 can execute the file processing method of the embodiment shown in Fig. 1; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the file processing apparatus in the above embodiments performs its operations has been described in detail in the method embodiments and is not elaborated here.
In one possible design, the file processing apparatus provided by the embodiment of the present invention may be implemented as a computing device, as shown in fig. 5, which may include a storage component 501 and a processing component 502;
the storage component 501 stores one or more computer instructions, wherein the one or more computer instructions are called and executed by the processing component 502 to implement the file processing method provided by the embodiment of the present invention.
Of course, the computing device may also include other components, such as an input/output interface and a communication component. The input/output interface provides an interface between the processing component and peripheral interface modules, which may be output devices, input devices, and the like. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.
The computing device may be a physical device or an elastic computing host provided by a cloud computing platform. In the latter case, the computing device may be a cloud server, and the processing component, the storage component, and the like may be basic server resources rented or purchased from the cloud computing platform.
When the computing device is a physical device, the computing device may be implemented as a distributed cluster consisting of a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and the computer program can realize the file processing method provided by the embodiment of the invention when being executed by a computer.
The embodiment of the invention also provides a computer program product, which comprises a computer program, and the computer program can realize the file processing method provided by the embodiment of the invention when being executed by a computer.
The processing component in each of the above embodiments may include one or more processors that execute computer instructions to perform all or some of the steps of the above methods. Of course, the processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components configured to perform the above methods.
The storage component is configured to store various types of data to support operations in the device. The storage component may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of file processing, comprising:
determining at least two files to be processed;
acquiring at least two metadata corresponding to the at least two files to be processed, wherein the metadata is used for indicating storage addresses in a storage device;
generating a target file;
and establishing a reference relation between the target file and the at least two metadata, wherein the reference relation is used for determining the corresponding at least two metadata when the target file is accessed so as to read data from the storage device according to the at least two metadata.
2. The method of claim 1, further comprising:
receiving a read request for the target file;
in response to the read request, acquiring the at least two metadata that have a reference relationship with the target file;
target data is retrieved from the storage device based on the at least two metadata.
3. The method of claim 1, wherein the generating a target file comprises:
adding a new file name to a reference table so that an operating system generates the target file corresponding to the file name, wherein the reference table records reference relationships between metadata and files;
and the establishing a reference relationship between the target file and the at least two metadata comprises:
writing the file name into each of the data rows corresponding to the at least two metadata in the reference table, to establish the reference relationship between the at least two metadata and the target file.
4. The method of claim 3, further comprising:
writing metadata information corresponding to the target metadata in a metadata table, wherein the metadata information represents that the target metadata comprises the at least two metadata;
the method further comprises the following steps:
and under the condition that a deletion request for any file in the at least two files to be processed is acquired, determining whether to delete the data corresponding to the file in the storage device based on the metadata table and the reference table.
5. The method according to claim 4, wherein in a case where a deletion request for any one of the at least two files to be processed is obtained, the determining whether to delete the data corresponding to the file in the storage device based on the metadata table and the reference table comprises:
determining first metadata corresponding to a file to be deleted;
retrieving the metadata table and the reference table, respectively, based on the first metadata to determine whether a file referencing the first metadata exists;
deleting data indicated by the first metadata in a storage device in the absence of a file referencing the first metadata;
and in a case where a file referencing the first metadata exists, deleting the record of the file to be deleted from the reference table.
6. The method of claim 5, wherein the retrieving the metadata table and the reference table, respectively, based on the first metadata to determine whether a file referencing the first metadata exists comprises:
retrieving the metadata table, determining whether the first metadata is used to compose other metadata;
determining that a file referencing the first metadata exists in a case where the first metadata is used to compose other metadata;
and under the condition that the first metadata is not used for forming other metadata, searching the reference table, determining whether a data line corresponding to the first metadata in the reference table records other files except the file to be deleted, and under the condition that the data line corresponding to the first metadata records no other files except the file to be deleted, determining that the file which refers to the first metadata does not exist.
7. The method of claim 1, wherein the determining at least two files to be processed comprises:
determining a first file in a first file system, wherein the first file system is used for providing a file read-write interface and storing the first file acquired based on the file read-write interface to a second file system in blocks;
determining at least two second files corresponding to the first file from a second file system;
and determining the at least two second files as the files to be processed.
8. The method of claim 1, wherein the determining at least two files to be processed comprises:
acquiring a file processing request, wherein the file processing request comprises file identifications of at least two files;
and determining at least two files corresponding to the file identifications as files to be processed.
9. A computing device comprising a processing component and a storage component;
the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the file processing method of any one of claims 1 to 8.
10. A computer storage medium storing a computer program which, when executed by a computer, implements the file processing method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211584296.7A CN115840731A (en) 2022-12-09 2022-12-09 File processing method, computing device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211584296.7A CN115840731A (en) 2022-12-09 2022-12-09 File processing method, computing device and computer storage medium

Publications (1)

Publication Number Publication Date
CN115840731A true CN115840731A (en) 2023-03-24

Family

ID=85578407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211584296.7A Pending CN115840731A (en) 2022-12-09 2022-12-09 File processing method, computing device and computer storage medium

Country Status (1)

Country Link
CN (1) CN115840731A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116016553A (en) * 2023-03-27 2023-04-25 天津联想协同科技有限公司 File sharing method and device based on network disk, network disk and storage medium
CN116016553B (en) * 2023-03-27 2023-08-11 天津联想协同科技有限公司 File sharing method and device based on network disk, network disk and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination