CN112948327A - File processing method, system, electronic device and storage medium - Google Patents


Publication number
CN112948327A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110356539.0A
Other languages
Chinese (zh)
Inventor
辛世友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110356539.0A
Publication of CN112948327A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention provides a file processing method, a file processing system, electronic equipment and a storage medium, wherein the method is applied to a distributed file management system, the distributed file management system comprises a management node, a simulation node and N data nodes, the simulation node is generated based on the management node, and the method comprises the following steps: the simulation node acquires file information of a file object stored by each data node in the N data nodes; determining M target file objects according to the file information; moving the M target file objects to a target folder; and modifying the paths of the M target file objects according to the path of the target folder. In the embodiment of the invention, the simulation node acquires the file information of the file object stored by each data node, and the target file objects are then determined according to the file information. The target file objects are moved to a target folder so that each small file no longer occupies its own index, which releases system memory and improves the performance of the system in processing files.

Description

File processing method, system, electronic device and storage medium
Technical Field
The present invention relates to the field of network technologies, and in particular, to a file processing method, a file processing system, an electronic device, and a storage medium.
Background
At present, with the development of internet technology, the amount of data to be stored and processed keeps increasing, and the Distributed File System (DFS) has emerged in response. A distributed file system can store and manage files with a large data volume: it stores the files across a plurality of nodes, which communicate and transfer data over a network, so that users can conveniently access the files.
However, taking the Hadoop Distributed File System (HDFS) as an example of a distributed file system, an HDFS holds a large number of small files with a small data volume, and each small file occupies one index in the corresponding data node to store the data information of the small file. This occupies a large amount of system memory and degrades system performance.
Disclosure of Invention
The embodiment of the invention aims to provide a file processing method, a file processing system, electronic equipment and a storage medium, and solves the technical problem that a small file in a distributed file system occupies a large amount of memory, so that the system performance is influenced. The specific technical scheme is as follows:
in a first aspect of the embodiments of the present invention, a file processing method is first provided, where the file processing method is applied to a distributed file management system, where the distributed file management system includes a management node, a simulation node, and N data nodes, where the simulation node is generated based on the management node, the data nodes store data files, and N is a positive integer, and the method includes:
the simulation node acquires file information of a file object stored by each data node in the N data nodes, and one file object occupies one index in the corresponding data node;
determining M target file objects according to the file information, wherein the M target file objects are files or folders which occupy memory values smaller than a preset value, and M is a positive integer larger than 1;
moving the M target file objects to a target folder, wherein the target folder is stored in any data node and occupies one index in the corresponding data node;
and modifying the paths of the M target file objects according to the path of the target folder.
In a second aspect of the embodiments of the present invention, a file processing system is further provided, where the file processing system includes a management node, a simulation node, and N data nodes, the simulation node is generated based on the management node, the data nodes store data files, and N is a positive integer;
the simulation node is used for acquiring file information of a file object stored by each data node in the N data nodes, and one file object occupies one index in the corresponding data node;
determining M target file objects according to the file information, wherein the M target file objects are files or folders which occupy memory values smaller than a preset value, and M is a positive integer larger than 1;
moving the M target file objects to a target folder, wherein the target folder is stored in any data node and occupies one index in the corresponding data node;
and modifying the paths of the M target file objects according to the path of the target folder.
In a third aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to execute the file processing method according to any one of the above-mentioned embodiments.
In a fourth aspect of the embodiments of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, cause the computer to execute the file processing method according to any one of the above embodiments.
In the embodiment of the invention, the file information of the file object stored by each data node is acquired through the simulation node, and then the target file object is determined according to the file information, wherein the target file object is a file or a folder, namely a small file, of which the occupied memory value is smaller than the preset value. The small files are moved to the target folder, each small file is prevented from occupying one index, a large amount of memory space of the system is released, optimization of all file objects in the system is achieved, and performance of the system is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a file processing method in an embodiment of the present invention;
FIG. 2 is a diagram illustrating an application scenario of the file processing method according to the embodiment of the present invention;
FIG. 3 is a diagram illustrating a structure of a file processing system according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
The file processing method provided in this embodiment is applied to a distributed file management system, and optionally, the following description is given by taking the method as an example for application to an HDFS, and it should be understood that the file processing method provided in this embodiment may also be applied to other file systems, and is not limited specifically here.
The HDFS is composed of a management node and N data nodes, wherein each data node stores a file object, and each file object occupies 1 index in the corresponding data node. The file object is also called metadata, and the file object includes but is not limited to a folder and a file, the folder stores the file, or the folder is a blank folder which does not store any file, wherein the file occupying the memory value larger than or equal to the preset value can be understood as a large file; and the file occupying the memory value less than the preset value is understood as a small file.
The management nodes comprise an activation node and a backup node, wherein the activation node is used for reading files stored by each data node or storing the files to any data node; the backup node can not interact with the data node, the backup node is used for copying the operation data of the activation node, and when the activation node fails, the normal work of the system can be ensured by activating the backup node.
At present, in a scenario where a large amount of file data is processed by using an HDFS, for example, data is processed by using the HDFS in the field of machine learning, or data is processed by using the HDFS in a data lake, each data node of the HDFS has a large number of small files, and each small file occupies a corresponding index to store data information, which occupies a large amount of memory of a system and affects the performance of the system.
Based on the technical problems, the invention provides the following technical concepts:
Combine the small files stored by the data nodes into one folder, and delete the small files stored by the data nodes. Because one folder occupies only one index to store the data information in the folder, combining a large number of small files into one folder releases a large amount of system memory and improves the performance of the system in processing files.
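The saving described in this technical concept is simple arithmetic, and a minimal Python sketch makes it concrete. The function name and the object counts below are hypothetical and chosen only for illustration; the sketch assumes, as stated in the text, that each file object occupies exactly one index and that the merged folder occupies one index.

```python
def indexes_after_merge(total_objects: int, merged_small_files: int) -> int:
    """Index entries left after merging `merged_small_files` objects into one folder.

    Hypothetical helper: assumes one index per file object, as in the description.
    """
    if not 1 < merged_small_files <= total_objects:
        raise ValueError("need 1 < merged_small_files <= total_objects")
    # The merged objects give up their own entries; the target folder adds one.
    return total_objects - merged_small_files + 1


before = 1_000_000                      # illustrative number of file objects
after = indexes_after_merge(before, 900_000)
print(before, "->", after)
```

Under these assumptions the number of index entries falls by M - 1 when M small files are merged.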
Based on the technical concept, the embodiment of the invention provides a file processing method. Referring to fig. 1, fig. 1 is a flowchart illustrating a file processing method according to an embodiment of the invention. The file processing method provided by the embodiment of the invention comprises the following steps:
S101, the simulation node acquires file information of a file object stored by each data node in the N data nodes.
In this embodiment, in order to merge the small files stored in each data node, a new simulation node may be added based on part of the functions of the management node. It should be understood that the management node includes an active node and a backup node; the simulation node can be generated based on the active node and has the active node's function of reading the files stored by each data node.
The file object may be a file or a folder, and the file information may be a memory occupancy value of the file object, where one file object occupies one index in the corresponding data node.
It should be noted that the backup node acquires the mirror image data and the log data from the active node, and copies the operating data of the active node.
The mirror image data reflects information about all directories and files of the whole distributed file management system, and the log data reflects the operation records of all files in the system.
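The relationship between the two kinds of data can be sketched in Python as follows. The text does not say how the image and the log are combined, so replaying the log records on top of the image is an assumption made here for illustration; the record layout, the operation names `create` and `delete`, and the function `replay` are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Image = Dict[str, int]              # path -> size in bytes (hypothetical layout)
LogRecord = Tuple[str, str, int]    # (operation, path, size); hypothetical layout


def replay(image: Image, log: List[LogRecord]) -> Image:
    """Return a new namespace view: the image with the log records applied in order."""
    view = dict(image)              # leave the original snapshot untouched
    for op, path, size in log:
        if op == "create":
            view[path] = size
        elif op == "delete":
            view.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return view


image = {"/data/a.txt": 10, "/data/b.txt": 20}
log = [("create", "/data/c.txt", 30), ("delete", "/data/a.txt", 0)]
current = replay(image, log)
print(current)
```

In this sketch the image is a point-in-time snapshot, and the log carries the changes made after the snapshot was taken.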
An optional implementation manner is to compile the activation node, and add a function of processing the small file provided by the embodiment of the present invention on the basis of the original function of the activation node.
S102, determining M target file objects according to the file information.
As described above, the file objects stored by each data node include, but are not limited to, folders and files, where the files include large files whose occupied memory value is greater than or equal to a preset value and small files whose occupied memory value is less than the preset value.
In some embodiments, the file object is a folder, such that the folder in which the small file is stored may be determined to be the target file object.
In other embodiments, the file object is a file, and thus, the small file may be determined to be the target file object.
It should be understood that since merging of multiple target file objects is to be achieved, M is a positive integer greater than 1. Please refer to the following embodiments.
S103, moving the M target file objects to a target folder.
In this embodiment, after M target file objects are determined, the M target file objects are moved to a target folder, where the target folder may be stored in any data node, and the target folder occupies one index in the corresponding data node.
It should be noted that the target folder may be an HTTP Archive (HAR) folder. As described here, HAR is a general file format for storing HTTP request/response information, and the HAR format is compatible with most file management systems. When the target folder is a HAR folder, the M target file objects are combined into one HAR file.
It should be further noted that the target folder may also be a folder in another file format, which is not limited in this embodiment.
S104, modifying the paths of the M target file objects according to the path of the target folder.
After the M target file objects are moved to the target folder, in order to ensure that each target file object can still be normally accessed by the user, the path of each target file object needs to be modified. Please refer to the following embodiments.
In the embodiment of the invention, the file information of the file object stored by each data node is acquired through the simulation node, and then the target file object is determined according to the file information, wherein the target file object is a file or a folder, namely a small file, of which the occupied memory value is smaller than the preset value. Moving the small files to a target folder, and avoiding each small file from occupying an index, so that a large amount of memory space of the system is released, and the performance of the system is improved; and after the target file object is moved to the target folder, modifying the path of the target file object, and further ensuring that the user can normally access each small file in the target folder.
As described above, the file object stored by the data node may be a folder or a file, and it should be noted that, in the case that the file object is a folder, there is an optional implementation manner to determine the target file object; in the case where the file object is a file, there is another alternative embodiment to determine the target file object.
These 2 possible cases are specifically described below, respectively:
one possible scenario is that the file object is a file:
optionally, the obtaining, by the simulation node, file information of the file object stored by each of the N data nodes includes:
and under the condition that the target file object is a file, the simulation node acquires a memory occupation value of the file stored by each data node in the N data nodes.
The determining M target file objects according to the file information includes:
and determining the file with the memory occupation value smaller than a preset value as a target file object.
In this embodiment, in the case that the file object is a file, the file information may be a memory occupied value of the file, and it should be understood that the memory occupied value of the file is in direct proportion to the memory occupied by the file, that is, the higher the memory occupied value of the file is, the more the system memory is occupied by the file.
In this embodiment, the simulation node obtains the memory occupancy value of the file stored in each data node, and determines a file whose memory occupancy value is smaller than the preset value as a target file object; in other words, such a file is treated as a small file. Optionally, the preset value may be 256 MB; for example, if the memory occupancy value of a file is 250 MB, the file is determined as a target file object.
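The selection rule for the file case can be sketched in a few lines of Python. The 256 MB threshold and the 250 MB file come from the example above; the function name, the dictionary layout, and the other paths and sizes are hypothetical.

```python
from typing import Dict, List

MB = 1024 * 1024
PRESET_VALUE = 256 * MB   # example threshold from the text


def select_small_files(sizes: Dict[str, int], preset: int = PRESET_VALUE) -> List[str]:
    """Return the paths whose memory occupancy value is strictly below the preset value."""
    return sorted(path for path, size in sizes.items() if size < preset)


# Hypothetical file information gathered from the data nodes.
sizes = {"/n1/a": 250 * MB, "/n2/b": 256 * MB, "/n3/c": 1 * MB}
targets = select_small_files(sizes)
print(targets)
```

The comparison is strict because the description treats a file whose memory occupancy value equals the preset value as a large file.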
Another possible scenario is that the file object is a folder:
optionally, the obtaining, by the simulation node, file information of the file object stored by each data node of the N data nodes includes:
and under the condition that the target file object is a folder, the simulation node acquires the memory occupation value and the file number of the folder stored by each data node of the N data nodes.
The determining M target file objects according to the file information includes:
and determining the folder with the ratio of the number of the files to the memory occupation value smaller than a preset ratio as a target file object.
In this embodiment, in the case that the file object is a folder, the file information may be a memory occupied value of the folder and a number of files stored in the folder.
In this embodiment, a ratio of the number of files to the memory usage value is calculated, and when the ratio is smaller than a preset ratio, it is indicated that all the files stored in the folder are small files, and the folder may be determined as a target file object, where the preset ratio is an empirical value that can be set by a user, and this embodiment is not limited specifically here.
For example, a folder includes 40 files, and the memory usage value of the folder is 400MB, and the predetermined ratio is 0.2. Then, the ratio of the number of files to the memory usage value is 0.1, and since 0.1 is smaller than 0.2, the folder can be determined as the target file object.
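The folder check in this example can be written as a short Python sketch. The numbers are those of the example above; the function name and its parameter names are hypothetical, and the memory occupancy value is assumed to be expressed in MB.

```python
def is_target_folder(file_count: int, occupancy_mb: float, preset_ratio: float) -> bool:
    """Apply the rule from the text: ratio of file count to occupancy below the preset ratio."""
    if occupancy_mb <= 0:
        raise ValueError("occupancy_mb must be positive")
    return file_count / occupancy_mb < preset_ratio


ratio = 40 / 400
print(ratio, is_target_folder(40, 400, 0.2))
```

Because the ratio is compared against an empirical value set by the user, the outcome depends on the unit chosen for the memory occupancy value.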
Optionally, the moving the M target file objects to the target folder includes:
copying the M target file objects into a temporary folder according to a preset sequence; after the M target file objects are all copied to the temporary folder, adjusting the path of the temporary folder, and converting the temporary folder into the target folder; and deleting the M target file objects stored by the N data nodes.
In this embodiment, the simulation node may start a related thread, for example a MapReduce thread, to generate a temporary folder, where MapReduce is commonly used for parallel processing of big data. The simulation node may also start other threads to copy the M target file objects to the temporary folder, which is not specifically limited here.
It should be understood that, to prevent the target file objects from being processed by the system during merging, which would cause the merge of the files or folders to fail, the temporary folder may be an informal folder that the system does not recognize. Optionally, the path of the temporary folder may be modified so that the system does not recognize it.
In this embodiment, after all the target file objects are copied to the temporary folder, the path of the temporary folder is adjusted, the temporary folder is renamed, and the temporary folder is thereby converted into the target folder.
Illustratively, the temporary folder is a HAR file whose path, /path/.target.har, is not recognized by the distributed file management system. Once all target file objects have been copied to the temporary folder, its path is changed to /path/target.har, which the system can recognize, thereby converting the temporary folder into the target folder.
After the target folder is obtained, the target file object is deleted in each data node, so that the number of small files is reduced, a large amount of memory space is released, and the performance of the system is improved.
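The copy, rename, and delete sequence can be sketched on a local file system with the Python standard library. This is only an analogy under stated assumptions: it uses an ordinary directory rather than a real HAR file or an HDFS client, the function name `merge_into_folder` is hypothetical, sorted order stands in for the unspecified preset order, and base-name collisions between sources are not handled.

```python
import os
import shutil
import tempfile
from typing import List


def merge_into_folder(sources: List[str], parent: str, name: str) -> str:
    """Copy sources into a hidden temp folder, rename it to the target, delete originals."""
    temp_dir = os.path.join(parent, "." + name)   # hidden while the copy is in progress
    target_dir = os.path.join(parent, name)
    os.makedirs(temp_dir)
    for src in sorted(sources):                   # sorted order stands in for the preset order
        shutil.copy2(src, os.path.join(temp_dir, os.path.basename(src)))
    os.rename(temp_dir, target_dir)               # publish only after every copy has finished
    for src in sources:
        os.remove(src)                            # originals are dropped last
    return target_dir


with tempfile.TemporaryDirectory() as root:
    src_dir = os.path.join(root, "node1")
    os.makedirs(src_dir)
    paths = []
    for file_name in ("a.txt", "b.txt"):
        path = os.path.join(src_dir, file_name)
        with open(path, "w") as fh:
            fh.write(file_name)
        paths.append(path)
    target = merge_into_folder(paths, root, "target.har")
    merged = sorted(os.listdir(target))
    originals_left = [p for p in paths if os.path.exists(p)]

print(merged, originals_left)
```

Deleting the originals only after the rename means that a failure during copying leaves the source files intact.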
Optionally, the modifying the paths of the M target file objects according to the path of the target folder includes:
and adjusting the paths of the M target file objects to be the paths of the target folders.
In this embodiment, after merging the target file object into the target folder, the path setting of the target file object is the same as that of the target folder.
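One way to read this step is as a mapping from each old path to a new path under the target folder, sketched below in Python. The text only says that the path of each target file object is set according to the path of the target folder; keeping the original base name under that folder is an assumption made here, and the function name and the sample paths are hypothetical.

```python
import posixpath
from typing import Dict, List


def rewrite_paths(old_paths: List[str], target_folder: str) -> Dict[str, str]:
    """Map each old path to a path under the target folder, keeping the base name."""
    return {
        old: posixpath.join(target_folder, posixpath.basename(old))
        for old in old_paths
    }


mapping = rewrite_paths(["/node1/a.txt", "/node2/b.txt"], "/path/target.har")
print(mapping)
```

Keeping such a mapping lets a request for an old path be resolved to the object's new location inside the target folder.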
To facilitate understanding of the technical solution of the embodiments of the present invention, the following are illustrated:
Referring to FIG. 2, the distributed file management system includes an active node, a backup node, and a log node.
The active node also establishes a communication connection with data nodes not shown in FIG. 2.
The log node is used for storing operation records, which characterize operations performed by users on files in the system. It should be understood that the system may include a plurality of log nodes; the single log node shown in FIG. 2 is only an example and does not limit the number of log nodes.
The backup node acquires the mirror image data stored by the active node and the log data stored by the data node, and can become the new active node when the active node fails, thereby ensuring normal operation of the system.
The system in FIG. 2 further includes a simulation node, which has part of the functions of the active node. The simulation node acquires the mirror image data stored by the active node periodically and acquires the log data stored by the data node in real time.
By acquiring the mirror image data stored by the active node, the simulation node obtains the plurality of target file objects recorded in the mirror image data. The simulation node can then call a resource management service, such as a Yarn service, through an external communication interface to move the plurality of target file objects to a target folder. Subsequently, a corresponding instruction is sent to the active node, so that the active node deletes the target file objects stored in each data node.
As shown in FIG. 3, an embodiment of the present invention further provides a file processing apparatus 200. The file processing apparatus 200 is arranged in a simulation node, where the simulation node is applied to a system that further includes a management node and N data nodes, the simulation node is generated based on the management node, the data nodes store data files, and N is a positive integer;
the file processing apparatus 200 includes:
an obtaining module 201, configured to obtain file information of a file object stored in each data node of the N data nodes;
a determining module 202, configured to determine M target file objects according to the file information;
a moving module 203, configured to move the M target file objects to a target folder;
and a modifying module 204, configured to modify the paths of the M target file objects according to the path of the target folder.
Optionally, the obtaining module 201 is further configured to:
under the condition that the target file object is a file, acquiring a memory occupation value of the file stored by each data node of the N data nodes;
and determining the file with the memory occupation value smaller than a preset value as a target file object.
Optionally, the obtaining module 201 is further configured to:
under the condition that the target file object is a folder, acquiring a memory occupation value and a file number of the folder stored by each data node of the N data nodes;
and determining the folder with the ratio of the number of the files to the memory occupation value smaller than a preset ratio as a target file object.
Optionally, the moving module 203 is further configured to:
copying the M target file objects into a temporary folder according to a preset sequence;
after the M target file objects are all copied to the temporary folder, adjusting the path of the temporary folder, and converting the temporary folder into the target folder;
and deleting the M target file objects stored by the N data nodes.
Optionally, the modifying module 204 is further configured to:
and adjusting the paths of the M target file objects to be the paths of the target folders.
The embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 complete mutual communication through the communication bus 304.
A memory 303 for storing a computer program;
a processor 301, configured to, when executing the computer program stored in the memory 303, acquire file information of a file object stored in each of the N data nodes;
determining M target file objects according to the file information;
moving the M target file objects to a target folder;
and modifying the paths of the M target file objects according to the path of the target folder.
Optionally, when being executed by the processor 301, the computer program is further configured to, when the target file object is a file, obtain a memory usage value of the file stored by each data node of the N data nodes;
and determining the file with the memory occupation value smaller than a preset value as a target file object.
Optionally, when being executed by the processor 301, the computer program is further configured to, when the target file object is a folder, obtain a memory usage value and a file number of the folder stored in each data node of the N data nodes;
and determining the folder with the ratio of the number of the files to the memory occupation value smaller than a preset ratio as a target file object.
Optionally, when being executed by the processor 301, the computer program is further configured to copy the M target file objects into a temporary folder according to a preset order;
after the M target file objects are all copied to the temporary folder, adjusting the path of the temporary folder, and converting the temporary folder into the target folder;
and deleting the M target file objects stored by the N data nodes.
Optionally, the computer program, when executed by the processor 301, is further configured to adjust the paths of the M target file objects to the path of the target folder.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, in which instructions are stored, and when the instructions are executed on a computer, the computer is caused to execute the file processing method according to any one of the above embodiments.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the file processing method described in any of the above embodiments.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a related manner: the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant points, reference may be made to the corresponding description of the method embodiment.
The above description covers only preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A file processing method, applied to a distributed file management system, wherein the distributed file management system comprises a management node, a simulation node and N data nodes, the simulation node is generated based on the management node, the data nodes store data files, and N is a positive integer, the method comprising:
acquiring, by the simulation node, file information of the file objects stored by each of the N data nodes, wherein each file object occupies one index in the corresponding data node;
determining M target file objects according to the file information, wherein the M target file objects are files or folders whose memory occupation values are smaller than a preset value, and M is a positive integer greater than 1;
moving the M target file objects to a target folder, wherein the target folder is stored in any one of the data nodes and occupies one index in the corresponding data node;
and modifying the paths of the M target file objects according to the path of the target folder.
2. The method of claim 1, wherein the acquiring, by the simulation node, of file information of the file objects stored by each of the N data nodes comprises:
in the case that the target file object is a file, acquiring, by the simulation node, a memory occupation value of each file stored by each of the N data nodes;
and the determining of M target file objects according to the file information comprises:
determining each file whose memory occupation value is smaller than a preset value as a target file object.
3. The method of claim 1, wherein the acquiring, by the simulation node, of file information of the file objects stored by each of the N data nodes comprises:
in the case that the target file object is a folder, acquiring, by the simulation node, a memory occupation value and a file count of each folder stored by each of the N data nodes;
and the determining of M target file objects according to the file information comprises:
determining each folder whose ratio of file count to memory occupation value is smaller than a preset ratio as a target file object.
4. The method of claim 1, wherein the moving of the M target file objects to a target folder comprises:
copying the M target file objects into a temporary folder in a preset sequence;
after all of the M target file objects have been copied to the temporary folder, adjusting the path of the temporary folder to convert the temporary folder into the target folder;
and deleting the M target file objects stored by the N data nodes.
5. The method of claim 1, wherein the modifying of the paths of the M target file objects according to the path of the target folder comprises:
adjusting the paths of the M target file objects to the path of the target folder.
6. A file processing device, arranged on a simulation node, wherein the simulation node is applied to a system that further comprises a management node and N data nodes, the simulation node is generated based on the management node, the data nodes store data files, and N is a positive integer;
the device comprising:
an acquisition module, configured to acquire file information of the file objects stored by each of the N data nodes, wherein each file object occupies one index in the corresponding data node;
a determining module, configured to determine M target file objects according to the file information, wherein the M target file objects are files or folders whose memory occupation values are smaller than a preset value, and M is a positive integer greater than 1;
a moving module, configured to move the M target file objects to a target folder, wherein the target folder is stored in any one of the data nodes and occupies one index in the corresponding data node;
and a modification module, configured to modify the paths of the M target file objects according to the path of the target folder.
7. The device of claim 6, wherein the acquisition module is further configured to:
in the case that the target file object is a file, acquire a memory occupation value of each file stored by each of the N data nodes;
and determine each file whose memory occupation value is smaller than a preset value as a target file object.
8. The device of claim 6, wherein the acquisition module is further configured to:
in the case that the target file object is a folder, acquire a memory occupation value and a file count of each folder stored by each of the N data nodes;
and determine each folder whose ratio of file count to memory occupation value is smaller than a preset ratio as a target file object.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the file processing method according to any one of claims 1 to 5 when executing the program stored in the memory.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the file processing method according to any one of claims 1 to 5.
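The steps of method claims 1 to 5 can be illustrated with a short sketch. This is not the claimed implementation, only a minimal illustration under stated assumptions: local directories stand in for the data nodes, the "memory occupation value" is read as a file's size in bytes, and the "preset sequence" is taken to be sorted path order. The function names find_small_files, folder_ratio and merge_into_target, the ".tmp" suffix, and the numeric file-name prefix are hypothetical and do not appear in the patent.

```python
import shutil
from pathlib import Path


def find_small_files(node_dirs, preset_bytes):
    """Claim 2 (sketch): select files whose size is below a preset value."""
    small = []
    for node in node_dirs:
        for path in sorted(Path(node).rglob("*")):
            if path.is_file() and path.stat().st_size < preset_bytes:
                small.append(path)
    return small


def folder_ratio(folder):
    """Claim 3 (sketch): ratio of file count to total size of a folder.

    Returns None for a folder with no bytes, to avoid dividing by zero.
    """
    sizes = [p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()]
    total = sum(sizes)
    return len(sizes) / total if total else None


def merge_into_target(targets, target_dir):
    """Claims 4 and 5 (sketch): stage copies, rename, delete, remap paths."""
    target_dir = Path(target_dir)
    # Assumption: the temporary folder sits next to the final target folder.
    tmp_dir = target_dir.with_name(target_dir.name + ".tmp")
    tmp_dir.mkdir(parents=True)

    staged = {}
    # Copy in a fixed order; the index prefix avoids name collisions
    # between files that share a name on different nodes.
    for i, src in enumerate(sorted(targets)):
        name = f"{i:06d}_{src.name}"
        shutil.copy2(src, tmp_dir / name)
        staged[src] = name

    # Only after every copy has succeeded does the temporary folder
    # become the target folder (a single rename).
    tmp_dir.rename(target_dir)

    # The originals are removed last, so a failure earlier loses no data.
    for src in staged:
        src.unlink()

    # Old path -> new path under the target folder (the path update of claim 5).
    return {str(src): str(target_dir / name) for src, name in staged.items()}
```

Copying into a temporary folder and renaming it only once all copies are complete means that readers never see a half-filled target folder, which appears to be the motivation for the ordering in claim 4.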
CN202110356539.0A 2021-04-01 2021-04-01 File processing method, system, electronic device and storage medium Pending CN112948327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110356539.0A CN112948327A (en) 2021-04-01 2021-04-01 File processing method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110356539.0A CN112948327A (en) 2021-04-01 2021-04-01 File processing method, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN112948327A (en) 2021-06-11

Family

ID=76232100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110356539.0A Pending CN112948327A (en) 2021-04-01 2021-04-01 File processing method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112948327A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861686A (en) * 2017-09-26 2018-03-30 深圳前海微众银行股份有限公司 File memory method, service end and computer-readable recording medium
CN108984686A (en) * 2018-07-02 2018-12-11 中国电子科技集团公司第五十二研究所 A kind of distributed file system indexing means and device merged based on log
CN110457281A (en) * 2019-08-14 2019-11-15 北京博睿宏远数据科技股份有限公司 Data processing method, device, equipment and medium
CN111723056A (en) * 2020-06-09 2020-09-29 北京青云科技股份有限公司 Small file processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111538719B (en) Data migration method, device, equipment and computer storage medium
US10628298B1 (en) Resumable garbage collection
CN111818175B (en) Enterprise service bus configuration file generation method, device, equipment and storage medium
US20230030856A1 (en) Distributed table storage processing method, device and system
CN111240892A (en) Data backup method and device
CN110888972A (en) Sensitive content identification method and device based on Spark Streaming
CN112463058B (en) Fragmented data sorting method and device and storage node
CN111143113A (en) Method, electronic device and computer program product for copying metadata
CN108234566B (en) Cluster data processing method and device
TWI571754B (en) Method for performing file synchronization control, and associated apparatus
CN114466083B (en) Data storage system supporting protocol interworking
CN112948327A (en) File processing method, system, electronic device and storage medium
US10452637B1 (en) Migration of mutable data sets between data stores
CN115391337A (en) Database partitioning method and device, storage medium and electronic equipment
CN113553314A (en) Service processing method, device, equipment and medium of super-convergence system
CN113448775A (en) Multi-source heterogeneous data backup method and device
US11645333B1 (en) Garbage collection integrated with physical file verification
CN112965939A (en) File merging method, device and equipment
CN107102898B (en) Memory management and data structure construction method and device based on NUMA (non Uniform memory Access) architecture
CN111858497A (en) Storage type conversion method, device and equipment
CN111435342A (en) Poster updating method, poster updating system and poster management system
CN112035119B (en) Data deleting method and device
CN113377500B (en) Resource scheduling method, device, equipment and medium
CN116069510B (en) Data processing method, device, electronic equipment and storage medium
CN112817923B (en) Application program data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination