CN111324590A - Data processing method, device, system and medium for distributed file system - Google Patents

Publication number
CN111324590A
Authority
CN
China
Prior art keywords
archive
file
path
transferring
configuration data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811546173.8A
Other languages
Chinese (zh)
Inventor
张明阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811546173.8A
Publication of CN111324590A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data processing method of a distributed file system, including: acquiring configuration data, wherein the configuration data comprises an archiving period corresponding to a file under a specific path; determining an archive file meeting an archive condition in the files under the specific path based on the configuration data, wherein the archive condition comprises that the non-access time of the files under the path is longer than an archive period corresponding to the files under the path; and transferring the archive file to a corresponding archive path.

Description

Data processing method, device, system and medium for distributed file system
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, apparatus, system, and medium for a distributed file system.
Background
With the rapid development of network technology and computer technology, various databases are increasingly used in many fields such as work and daily life. As the amount of data continues to grow at a high rate, the storage pressure of each database is increasing. However, there is usually some obsolete useless data occupying the storage space of the database. Therefore, how to effectively manage the data in the database becomes an important means for relieving the storage pressure of the database.
In the prior art, when the load pressure is too large, the database usually triggers an alarm to prompt the user that the storage space is insufficient. Users typically manually delete some data after receiving an alert to relieve the database of storage pressure.
In the course of implementing the concept of the present disclosure, the inventor found that the prior art has at least the following problem: it cannot actively manage the data in a database and requires excessive user involvement, which is inconvenient for users and degrades the user experience.
Disclosure of Invention
In view of the above, the present disclosure provides a data processing method, apparatus, system, and medium for a distributed file system.
One aspect of the present disclosure provides a data processing method of a distributed file system, including: obtaining configuration data, wherein the configuration data includes an archive period corresponding to files under a specific path; determining, based on the configuration data, archive files that meet an archive condition among the files under the specific path, wherein the archive condition includes that the unaccessed time of a file under the path is longer than the archive period corresponding to that file; and transferring the archive files to a corresponding archive path.
According to an embodiment of the present disclosure, the method further includes: screening the archive files based on a white list. Transferring the archive files to a corresponding archive path includes: transferring the screened archive files to the corresponding archive path.
According to an embodiment of the present disclosure, the configuration data further includes: an archive path corresponding to the files under the specific path. Transferring the archive files to a corresponding archive path includes: transferring each archive file to the archive path corresponding to that file based on the configuration data.
According to an embodiment of the present disclosure, the configuration data further includes: a temporary path corresponding to the files under the specific path. Transferring the archive files to a corresponding archive path includes: transferring each archive file to the temporary path corresponding to that file based on the configuration data, and copying the archive file in the temporary path to the archive path corresponding to that file.
According to an embodiment of the present disclosure, transferring the archive files to a corresponding archive path includes: reducing the number of copies of each archive file and storing the archive file in the corresponding archive path.
Another aspect of the present disclosure provides a data processing apparatus of a distributed file system, including an obtaining module, a determining module, and a transferring module. The obtaining module obtains configuration data, wherein the configuration data includes an archive period corresponding to files under a specific path. The determining module determines, based on the configuration data, archive files that meet an archive condition among the files under the specific path, wherein the archive condition includes that the unaccessed time of a file under the path is longer than the archive period corresponding to that file. The transferring module transfers the archive files to a corresponding archive path.
According to an embodiment of the present disclosure, the apparatus further includes: a screening module that screens the archive files based on a white list. Transferring the archive files to a corresponding archive path includes: transferring the screened archive files to the corresponding archive path.
According to an embodiment of the present disclosure, the configuration data further includes: an archive path corresponding to the files under the specific path. Transferring the archive files to a corresponding archive path includes: transferring each archive file to the archive path corresponding to that file based on the configuration data.
According to an embodiment of the present disclosure, the configuration data further includes: a temporary path corresponding to the files under the specific path. Transferring the archive files to a corresponding archive path includes: transferring each archive file to the temporary path corresponding to that file based on the configuration data, and copying the archive file in the temporary path to the archive path corresponding to that file.
According to an embodiment of the present disclosure, transferring the archive files to a corresponding archive path includes: reducing the number of copies of each archive file and storing the archive file in the corresponding archive path.
Another aspect of the present disclosure provides a data processing system of a distributed file system, including: one or more memories storing executable instructions and one or more processors executing the executable instructions to implement the methods described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiment of the disclosure, the problems that the prior art cannot actively manage data in the database, needs excessive participation of users, brings inconvenience to the users, and affects user experience can be at least partially solved, and therefore, the technical effects of effectively managing the data in the database, reducing user labor and improving user experience can be achieved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario of a data processing method and apparatus of a distributed file system according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of data processing of a distributed file system according to an embodiment of the present disclosure;
FIGS. 3A and 3B schematically illustrate diagrams of configuration data according to an embodiment of the disclosure;
FIGS. 4A and 4B schematically illustrate block diagrams of data processing apparatus of a distributed file system according to embodiments of the present disclosure; and
FIG. 5 schematically shows a block diagram of a data processing system of a distributed file system according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
An embodiment of the present disclosure provides a data processing method of a distributed file system, which includes: obtaining configuration data, wherein the configuration data includes an archive period corresponding to files under a specific path; determining, based on the configuration data, archive files that meet an archive condition among the files under the specific path, wherein the archive condition includes that the unaccessed time of a file under the path is longer than the archive period corresponding to that file; and transferring the archive files to a corresponding archive path.
Fig. 1 schematically illustrates an application scenario 100 of a data processing method and apparatus of a distributed file system according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a server 104 and a storage system 105. The server 104 may be connected to the terminal apparatuses 101, 102, 103 and the storage system 105 through a network. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The storage system 105 may be a server cluster for storing data, for example, a server cluster running the Hadoop Distributed File System (HDFS).
A user may interact with the server 104 over a network using the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting data transfer, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
It should be noted that the data processing method of the distributed file system provided by the embodiment of the present disclosure may be generally executed by the server 104. Accordingly, the data processing apparatus of the distributed file system provided by the embodiments of the present disclosure may be generally disposed in the server 104. The data processing method of the distributed file system provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 104 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 104. Accordingly, the data processing apparatus of the distributed file system provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 104 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 104.
For example, the storage system 105 may store data of each business system. The staff of each business system can input configuration data through the terminal device 101, 102 or 103; for example, the staff can configure the paths whose files may be archived, the archive period of the files under each path, and so on. The server 104 may periodically obtain the configuration data, determine the files to be archived based on the configuration data, and transfer the files to the corresponding archive paths, so that the data in the storage system 105 can be managed efficiently.
It is understood that the server 104 in the embodiment of the present disclosure may be integrated in the storage system 105, or integrated in the terminal device 101, 102, or 103, or may be independent of the storage system 105 and the terminal device 101, 102, or 103, which is not limited in this disclosure, and a person skilled in the art may set the server according to actual needs.
It should be understood that the number of terminal devices, servers, and storage systems in fig. 1 is merely illustrative. There may be any number of terminal devices, servers, and storage systems, as desired for implementation.
It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
Fig. 2 schematically shows a flow chart of a data processing method of a distributed file system according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S203.
In operation S201, configuration data is obtained, where the configuration data includes an archive period corresponding to a file in a specific path.
In operation S202, an archive file satisfying an archive condition is determined from the files under the specific path based on the configuration data, where the archive condition includes that an unaccessed time of the file under the path is longer than an archive period corresponding to the file under the path.
In operation S203, the archive file is transferred under a corresponding archive path.
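Operations S201 to S203 can be sketched as three small functions. This is only an illustrative outline under stated assumptions: the in-memory `CONFIG` and `FILES` dictionaries, the `/user/demo` and `/archive/demo` paths, and all function names are hypothetical stand-ins for the configuration store and the HDFS metadata, none of which are specified in this form by the disclosure.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-ins. A real deployment would read the
# configuration from wherever the staff entered it and the last access
# times from the file system metadata.
CONFIG = {"/user/demo": {"period_days": 90, "archive_path": "/archive/demo"}}
FILES = {
    "/user/demo/a.dat": datetime(2024, 1, 1),   # value = last access time
    "/user/demo/b.dat": datetime(2024, 5, 20),
}

def get_config():
    """S201: obtain the configuration data."""
    return CONFIG

def find_archive_files(config, files, now):
    """S202: select files whose unaccessed time exceeds the archive period."""
    selected = []
    for path, last_access in files.items():
        for root, entry in config.items():
            if path.startswith(root + "/"):
                if now - last_access > timedelta(days=entry["period_days"]):
                    selected.append((path, entry["archive_path"]))
    return selected

def transfer(selected, files):
    """S203: move each selected file under its archive path (simulated)."""
    moved = {}
    for path, archive_root in selected:
        name = path.rsplit("/", 1)[1]
        moved[archive_root + "/" + name] = files.pop(path)
    return moved

now = datetime(2024, 6, 1)
archived = transfer(find_archive_files(get_config(), FILES, now), FILES)
print(sorted(archived))  # only a.dat has been idle for more than 90 days
```

Keeping the three operations as separate functions mirrors the flow chart and leaves room for an optional screening step between S202 and S203.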
It is understood that platforms such as Jingdong may use the distributed file system HDFS to store their relevant data. Because the data volume keeps growing, the number of files in HDFS becomes huge. Over time this puts increasing pressure on cluster storage, sharply raises the demand for cluster expansion, leaves machines in short supply, and also places heavy access pressure on HDFS. Therefore, files in HDFS need to be processed and archived periodically so that obsolete or unneeded data is cleaned up in time and storage pressure is relieved.
According to the disclosed embodiments, a worker may configure the files that need to be processed and archived. For example, as shown in FIG. 3A, the paths 310 of the files that need to be processed and archived and the archive period 320 corresponding to the files under each path may be configured. For example, the archive period for files stored under the path hdfs://ns8/user/jdw_traffic is 90 days. That is, a file under this path is treated as a file to be archived if it has not been accessed for more than 90 days.
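The path-to-period table of FIG. 3A could be held as a simple mapping. The sketch below is a minimal illustration, not the disclosed data format: the dictionary layout, the function name `archive_period_for`, and the rule of preferring the longest matching configured path are all assumptions; only the example path and its 90-day period come from the text.

```python
# Hypothetical representation of the FIG. 3A table:
# configured path (310) -> archive period in days (320).
ARCHIVE_CONFIG = {
    "hdfs://ns8/user/jdw_traffic": 90,
}

def archive_period_for(file_path, config=ARCHIVE_CONFIG):
    """Return the archive period (days) of the configured path that contains
    file_path, or None if the file is not under any configured path."""
    best = None
    for root, days in config.items():
        prefix = root.rstrip("/") + "/"
        if file_path == root or file_path.startswith(prefix):
            # Assumption: when several configured paths match,
            # the most specific (longest) one wins.
            if best is None or len(root) > len(best[0]):
                best = (root, days)
    return None if best is None else best[1]

print(archive_period_for("hdfs://ns8/user/jdw_traffic/part-0000"))
print(archive_period_for("hdfs://ns8/user/other/part-0000"))
```

Matching on `root + "/"` rather than on the bare prefix avoids treating a sibling directory such as `jdw_traffic_extra` as if it were under `jdw_traffic`.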
In an example of the present disclosure, the configuration data configured by the worker may be obtained periodically, so that the files under each specified path are scanned according to the configuration data and the archive files meeting the archive condition are determined.
For example, if the configuration data includes a specific HDFS path, the files under that path correspond to an archive period of 90 days, and the path contains 100 files, then the 100 files may be scanned, and each file among them whose unaccessed time is longer than 90 days is determined to be an archive file.
According to an embodiment of the disclosure, the access time of each file can be determined from the metadata in the HDFS image table, so as to judge whether the unaccessed time of the file is longer than the archive period corresponding to the file. For example, if file A under the path hdfs://ns8/user/jdw_traffic was last accessed 100 days ago and the archive period corresponding to file A is 90 days, file A is considered an archive file meeting the archive condition. As another example, if file B under the path hdfs://ns8/user/jdw_traffic was last accessed 30 days ago and the archive period corresponding to file B is 90 days, file B is not considered an archive file meeting the archive condition.
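The archive condition itself reduces to one comparison. The following sketch reproduces the file A and file B examples; the function name `meets_archive_condition` and the fixed reference date are hypothetical, and the last access times are supplied directly rather than read from HDFS metadata.

```python
from datetime import datetime, timedelta

def meets_archive_condition(last_access, archive_period_days, now):
    """True when the file has gone unaccessed longer than its archive period."""
    return now - last_access > timedelta(days=archive_period_days)

now = datetime(2024, 6, 1)                      # arbitrary reference time
file_a_last_access = now - timedelta(days=100)  # file A: idle for 100 days
file_b_last_access = now - timedelta(days=30)   # file B: idle for 30 days

print(meets_archive_condition(file_a_last_access, 90, now))  # True
print(meets_archive_condition(file_b_last_access, 90, now))  # False
```

The comparison is strict, matching the wording "longer than the archive period": a file idle for exactly 90 days is not yet selected.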
In the embodiment of the present disclosure, after the archive files meeting the archive condition are determined, the archive files may be further filtered based on the white list, and then the filtered archive files are transferred to the corresponding archive path.
It can be understood that, in general, due to the requirement of actual service, part of data cannot be archived or the archiving period is different from other files in the path, and the part of data can be protected by setting a white list.
For example, the path hdfs://ns8/user/jdw_traffic includes 100 files, and the archive period corresponding to the path in the configuration data is 90 days. If file C under the path is an important file that must not be archived, file C may be placed in the white list and configured as a file that cannot be archived. Alternatively, if file D under the path is a relatively important file whose archive period should be longer than that of the other files under the path, for example 370 days, file D may be placed in the white list with its archive period configured as 370 days.
As another example, if the storage path corresponding to business department 1 is hdfs://ns8/user/jdw_traffic, the department may configure the archive period for the path as 90 days according to how often its data is updated. If the path contains important files of the department that cannot be archived or whose archive period needs to be extended, a white list may be configured so that these files are screened out and excluded from the subsequent archiving operation.
The embodiment of the disclosure can configure an archiving period for a storage path (i.e., a storage path including a plurality of files), and then perform white list configuration for a special file under the path, thereby reducing configuration data. The archive period may also be configured separately for each file. The present disclosure is not limited thereto, and those skilled in the art can configure the present disclosure according to practical situations.
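White-list screening can be sketched as a filter applied to the candidate archive files. The representation below is an assumption made for illustration: the white list is a hypothetical dictionary that maps a file path either to `NEVER` (the file must not be archived, like file C) or to an overriding archive period in days (like the 370 days of file D). The `/demo/...` paths are placeholders.

```python
from datetime import datetime, timedelta

NEVER = None  # marker: the file must never be archived

def screen_with_whitelist(candidates, whitelist, now):
    """Filter (path, last_access) candidates using a white list.

    whitelist maps a file path to NEVER or to an overriding period in days.
    Files absent from the white list pass through unchanged.
    """
    kept = []
    for path, last_access in candidates:
        if path not in whitelist:
            kept.append(path)
            continue
        override = whitelist[path]
        if override is NEVER:
            continue  # protected file, like file C in the text
        if now - last_access > timedelta(days=override):
            kept.append(path)  # the longer period has also elapsed
    return kept

now = datetime(2024, 6, 1)
candidates = [
    ("/demo/A", now - timedelta(days=100)),
    ("/demo/C", now - timedelta(days=200)),
    ("/demo/D", now - timedelta(days=120)),
]
whitelist = {"/demo/C": NEVER, "/demo/D": 370}
result = screen_with_whitelist(candidates, whitelist, now)
print(result)  # ['/demo/A']
```

Because only exceptional files appear in the white list, the per-path configuration stays small, which is the reduction in configuration data that the paragraph above describes.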
As shown in fig. 3B, in the embodiment of the present disclosure, the configuration data may further include a temporary path 330 corresponding to a file under a specific path and an archive path 340 corresponding to a file under a specific path.
For example, the temporary path of the files under the path hdfs://ns8/user/jdw_traffic is hdfs://ns8/user/jdw_traffic/10k_cool, and the archive path of the files under this path is hdfs://ns1012/user/mart_fro/10k_ns8_distcp.
In an example of the present disclosure, transferring the archive file under the respective archive path may include transferring the archive file under the archive path to which the file corresponds based on the configuration data. Specifically, the archive file may be transferred to the temporary path corresponding to the file based on the configuration data, and then the archive file in the temporary path may be copied to the archive path corresponding to the file.
To keep the archiving process safe, in an embodiment of the disclosure an archive file is first transferred to the temporary path, and the archive file in the temporary path is then copied to the archive path. If the size of the file copied to the archive path is consistent with the size of the file in the temporary path, it can be considered that no data was lost during copying, and the corresponding archive file in the temporary path can then be deleted.
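The move, copy, verify, and delete sequence can be sketched as follows. Local directories stand in for the HDFS temporary and archive paths so that the example is runnable anywhere; the function name `archive_via_temp` and the choice to raise an error on a size mismatch are assumptions, since the disclosure does not say what happens when the sizes differ.

```python
import os
import shutil
import tempfile

def archive_via_temp(src, temp_dir, archive_dir):
    """Move src to temp_dir, copy it to archive_dir, verify sizes, clean up.

    Local directories stand in for HDFS paths here; on a real cluster the
    copy step would be carried out by a cluster copy tool instead.
    """
    os.makedirs(temp_dir, exist_ok=True)
    os.makedirs(archive_dir, exist_ok=True)
    name = os.path.basename(src)
    temp_file = os.path.join(temp_dir, name)
    archived = os.path.join(archive_dir, name)

    shutil.move(src, temp_file)            # 1: transfer to the temporary path
    shutil.copyfile(temp_file, archived)   # 2: copy to the archive path

    # 3: equal sizes are taken to mean nothing was lost during the copy
    if os.path.getsize(archived) != os.path.getsize(temp_file):
        raise IOError("size mismatch; temporary copy kept for retry")
    os.remove(temp_file)                   # 4: delete the temporary copy
    return archived

root = tempfile.mkdtemp()
source = os.path.join(root, "data.bin")
with open(source, "wb") as fh:
    fh.write(b"x" * 1024)
final_path = archive_via_temp(source,
                              os.path.join(root, "tmp"),
                              os.path.join(root, "archive"))
print(os.path.getsize(final_path))  # 1024
```

Until the size check passes, the temporary copy is the only authoritative one, so a failed copy can be retried without touching the original location again.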
In an embodiment of the present disclosure, when the archive files are transferred to the archive path, the Hive table partitions corresponding to the archive files may be deleted.
According to an example of the present disclosure, transferring an archive file to the corresponding archive path may include: reducing the number of copies of the archive file and storing the archive file in the corresponding archive path. For example, if file A under the path hdfs://ns8/user/jdw_traffic has 5 copies, the number of copies of file A may be reduced to 3 after the file is transferred to the archive path (the numbers are examples only).
In embodiments of the present disclosure, the number of copies of a file transferred to the archive path may be reduced gradually according to how long the file remains unaccessed. For example, if a file transferred to the archive path is neither accessed nor restored to its original path for 30 days, its number of copies may be reduced again, and so on until the number of copies reaches 0 and the file is deleted. Alternatively, the file may be deleted directly once it has not been accessed for a predetermined period, for example 150 days.
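The gradual reduction can be written as a small schedule function. This is a sketch under explicit assumptions: the disclosure gives the 30-day interval and the end state of 0 copies, but dropping exactly one copy per interval, the starting value of 3, and the name `replicas_remaining` are illustrative choices.

```python
def replicas_remaining(days_idle, initial=3, step_days=30):
    """Number of copies left after days_idle days without access or restore.

    One copy is dropped per elapsed step_days interval; 0 means the file
    would be deleted. Dropping exactly one copy per step is an assumption.
    """
    if days_idle < 0:
        raise ValueError("days_idle must be non-negative")
    return max(0, initial - days_idle // step_days)

for days in (0, 29, 30, 60, 90, 150):
    print(days, replicas_remaining(days))
```

With these defaults a file keeps 3 copies for the first 30 days in the archive path, then 2, then 1, and is removed once 90 idle days have passed. The alternative in the text, direct deletion after a fixed period such as 150 days, would simply replace this schedule with a single threshold check.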
According to the embodiments of the present disclosure, whether a file is an archive file is determined from its unaccessed time and the corresponding archive period, so that obsolete and unneeded files are archived, the data in the storage system is managed effectively, the user's effort in periodic cleanup is reduced, and the user experience is improved.
Fig. 4A and 4B schematically show block diagrams of a data processing apparatus 400 of a distributed file system according to an embodiment of the present disclosure.
As shown in fig. 4A, the data processing apparatus 400 includes an obtaining module 410, a determining module 420, and a transferring module 430.
The obtaining module 410 obtains configuration data, where the configuration data includes an archive period corresponding to a file in a specific path.
The determining module 420 determines, based on the configuration data, an archive file satisfying an archive condition in the files under the specific path, where the archive condition includes that the unaccessed time of the file under the path is longer than an archive period corresponding to the file under the path.
The transferring module 430 transfers the archive file to a corresponding archive path.
According to an embodiment of the disclosure, the configuration data further includes: an archive path corresponding to the files under the specific path. Transferring the archive file to a corresponding archive path includes: transferring the archive file to the archive path corresponding to the file based on the configuration data.
According to an embodiment of the disclosure, the configuration data further includes: a temporary path corresponding to the files under the specific path. Transferring the archive file to a corresponding archive path includes: transferring the archive file to the temporary path corresponding to the file based on the configuration data, and copying the archive file in the temporary path to the archive path corresponding to the file.
According to an embodiment of the present disclosure, transferring the archive file to a corresponding archive path includes: reducing the number of copies of the archive file and storing the archive file in the corresponding archive path.
As shown in fig. 4B, the data processing apparatus 400 may further include a screening module 440.
The screening module 440 screens the archive files based on the white list.
Transferring the archive file to a corresponding archive path includes: transferring the screened archive files to the corresponding archive path.
According to the embodiment of the disclosure, the data processing apparatus 400 shown in fig. 4A and 4B may implement the method described above with reference to fig. 2, for example, and will not be described herein again.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any of the obtaining module 410, the determining module 420, the transferring module 430, and the screening module 440 may be combined in one module to be implemented, or any one of the modules may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 410, the determining module 420, the transferring module 430, and the screening module 440 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the obtaining module 410, the determining module 420, the transferring module 430 and the screening module 440 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
FIG. 5 schematically illustrates a block diagram of a data processing system of a distributed file system suitable for implementing the above-described method according to an embodiment of the present disclosure. The system illustrated in fig. 5 is only an example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
As shown in fig. 5, a data processing system 500 according to an embodiment of the present disclosure includes a processor 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the disclosure.
In the RAM 503, various programs and data necessary for the operation of the system 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the programs may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The system 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it is installed into the storage section 508 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program, when executed by the processor 501, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways without departing from the spirit or teaching of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (12)

1. A data processing method for a distributed file system, comprising:
acquiring configuration data, wherein the configuration data comprises an archiving period corresponding to a file under a specific path;
determining, based on the configuration data, an archive file meeting an archive condition among the files under the specific path, wherein the archive condition comprises that the time for which a file under the path has not been accessed is longer than the archiving period corresponding to the file under the path;
and transferring the archive file to a corresponding archive path.
2. The method of claim 1, wherein:
the method further comprises the following steps: screening the archived files based on a white list;
the transferring the archive file to a corresponding archive path comprises: and transferring the screened archive files to the corresponding archive paths.
3. The method of claim 1, wherein:
the configuration data further comprises: the archiving path corresponding to the file under the specific path;
the transferring the archive file to a corresponding archive path comprises: and transferring the archived file to an archive path corresponding to the file based on the configuration data.
4. The method of claim 3, wherein:
the configuration data further comprises: a temporary path corresponding to the file under the specific path;
the transferring the archive file to a corresponding archive path comprises:
transferring the archived file to a temporary path corresponding to the file based on the configuration data;
and copying the archive file under the temporary path to the archive path corresponding to the file.
5. The method of claim 1, wherein the transferring the archive file to a corresponding archive path comprises:
performing copy reduction processing on the archive file and storing the archive file under the corresponding archive path.
6. A data processing apparatus of a distributed file system, comprising:
an obtaining module configured to acquire configuration data, wherein the configuration data comprises an archiving period corresponding to a file under a specific path;
a determining module configured to determine, based on the configuration data, an archive file meeting an archive condition among the files under the specific path, wherein the archive condition comprises that the time for which a file under the path has not been accessed is longer than the archiving period corresponding to the file under the path;
and a transferring module configured to transfer the archive file to a corresponding archive path.
7. The apparatus of claim 6, wherein:
the apparatus further comprises: a screening module configured to screen the archive files based on a white list;
the transferring the archive file to a corresponding archive path comprises: and transferring the screened archive files to the corresponding archive paths.
8. The apparatus of claim 6, wherein:
the configuration data further comprises: the archiving path corresponding to the file under the specific path;
the transferring the archive file to a corresponding archive path comprises: and transferring the archived file to an archive path corresponding to the file based on the configuration data.
9. The apparatus of claim 8, wherein:
the configuration data further comprises: a temporary path corresponding to the file under the specific path;
the transferring the archive file to a corresponding archive path comprises:
transferring the archived file to a temporary path corresponding to the file based on the configuration data;
and copying the archive file under the temporary path to the archive path corresponding to the file.
10. The apparatus of claim 6, wherein the transferring the archive file to a corresponding archive path comprises:
performing copy reduction processing on the archive file and storing the archive file under the corresponding archive path.
11. A data processing system of a distributed file system, comprising:
one or more memories storing executable instructions; and
one or more processors executing the executable instructions to implement the method of any one of claims 1-5.
12. A computer readable medium having stored thereon executable instructions which, when executed by a processor, implement a method according to any one of claims 1 to 5.
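
The staged transfer recited in claims 4 and 9 (transfer to a temporary path, then copy to the archive path) can be sketched as follows. This is only an illustrative Python sketch on a local file system, not the claimed implementation. The function name `staged_transfer` and its parameters are hypothetical, and because the claims do not say whether the temporary copy is later removed, the sketch leaves it in place.

```python
import shutil
from pathlib import Path


def staged_transfer(file_path, temp_dir, archive_dir):
    """Move a file to a temporary path, then copy it to the archive path.

    Returns the staged path and the archived path. Cleanup of the staged
    copy is left to the caller, since the claims do not specify it.
    """
    file_path = Path(file_path)
    temp_dir = Path(temp_dir)
    archive_dir = Path(archive_dir)
    temp_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: transfer the archive file to its temporary path.
    staged = Path(shutil.move(str(file_path), str(temp_dir / file_path.name)))
    # Step 2: copy the file under the temporary path to the archive path.
    archived = Path(shutil.copy2(str(staged), str(archive_dir / staged.name)))
    return staged, archived
```

Staging first takes the file out of the source path in one step, so a slow or failed copy to the archive path does not leave a partially archived file visible under the source path.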
CN201811546173.8A 2018-12-17 2018-12-17 Data processing method, device, system and medium for distributed file system Pending CN111324590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811546173.8A CN111324590A (en) 2018-12-17 2018-12-17 Data processing method, device, system and medium for distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811546173.8A CN111324590A (en) 2018-12-17 2018-12-17 Data processing method, device, system and medium for distributed file system

Publications (1)

Publication Number Publication Date
CN111324590A true CN111324590A (en) 2020-06-23

Family

ID=71172456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811546173.8A Pending CN111324590A (en) 2018-12-17 2018-12-17 Data processing method, device, system and medium for distributed file system

Country Status (1)

Country Link
CN (1) CN111324590A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102713878A (en) * 2009-11-06 2012-10-03 皮斯佩斯有限公司 Apparatus and method for managing a file in a distributed storage system
CN103593351A (en) * 2012-08-15 2014-02-19 中国银联股份有限公司 Electronic file filing method and system
CN106294009A (en) * 2016-08-05 2017-01-04 北京小米移动软件有限公司 Database filing method and system
CN106484073A (en) * 2016-09-28 2017-03-08 华为技术有限公司 The method of energy saving of system and energy conserving system
JP2017072965A (en) * 2015-10-07 2017-04-13 株式会社バッファロー Archive system, archive device, and computer program for archive

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination