CN114327950A - File system disk scanning method and device and file management system - Google Patents

File system disk scanning method and device and file management system

Info

Publication number
CN114327950A
Authority
CN
China
Prior art keywords
directory
disk
scanned
file
message queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111649493.8A
Other languages
Chinese (zh)
Inventor
陈明
吴俊�
李萍
蔡晶
冯妮佳
曹志生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Novogene Technology Co ltd
Original Assignee
Beijing Novogene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Novogene Technology Co ltd
Priority to CN202111649493.8A
Publication of CN114327950A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a disk scanning method and device for a file system, and a file management system. The method comprises: a construction step of constructing a shared message queue, the shared message queue being used to store to-be-scanned disk directories; a starting step of starting a plurality of threads; an obtaining step of obtaining a plurality of to-be-scanned disk directories, the to-be-scanned disk directories corresponding to the threads one to one; a control step of controlling each thread to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue; and a repeating step of repeating the obtaining step and the control step until the disk scanning is finished. Compared with the prior art, in which the find tool scans the disk with a single thread and the fd tool must be chained with other commands to view file information, the disk scanning method greatly improves disk scanning efficiency and reduces the time consumed, thereby solving the problem that prior-art disk scanning methods take too long.

Description

File system disk scanning method and device and file management system
Technical Field
The present application relates to the field of software development, and in particular, to a method and an apparatus for scanning a disk of a file system, a computer-readable storage medium, a processor, and a file management system.
Background
In a Linux system, the file system is organized as a tree that branches downward from a common root directory. File systems can become very large; for example, the Linux clusters of many companies often have very large and complex file systems, in which the number of files frequently exceeds ten million. A Linux cluster administrator often needs to scan such a file system, that is, to traverse all files and obtain every file in the cluster together with its relevant information, such as the last modification date, the file size, the owner and the group, and then to make management decisions based on this file information. The traditional approach is to use a tool such as find or fd. At present, disk scanning is mainly performed with the find command that ships with Linux, which can recursively list every file in the file system. Its main problem is that it can scan with only a single thread, so that when the file system is deep and contains many files, the scanning time grows linearly and performance is poor. The fd command also has a performance problem in disk scanning: it cannot directly obtain the detailed information of each file and must be chained with other Linux commands, such as the ls command, to obtain the file information, which greatly increases the scanning time and further degrades scanning performance.
The above information disclosed in this background section is only for enhancement of understanding of the background of the technology described herein and, therefore, certain information may be included in the background that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
Disclosure of Invention
The present application mainly aims to provide a method and an apparatus for scanning a disk of a file system, a computer-readable storage medium, a processor, and a file management system, so as to solve the problem that the time consumption of the disk scanning method in the prior art is too long.
According to an aspect of an embodiment of the present invention, there is provided a disk scanning method for a file system, including: a construction step of constructing a shared message queue, the shared message queue being used to store to-be-scanned disk directories; a starting step of starting a plurality of threads; an obtaining step of obtaining a plurality of to-be-scanned disk directories, the to-be-scanned disk directories corresponding to the threads one to one; a control step of controlling each thread to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory; and a repeating step of repeating the obtaining step and the control step in sequence at least once until the disk scanning is finished.
Optionally, the obtaining step includes: controlling each thread to send a disk scanning request to the shared message queue, the disk scanning requests corresponding to the threads one to one; and, in response to the disk scanning request, controlling the shared message queue to return a to-be-scanned disk directory to the thread.
Optionally, the obtaining step further includes: closing the thread corresponding to a disk scanning request if the shared message queue does not respond to that request within a predetermined time.
Optionally, repeating the obtaining step and the control step until the disk scanning is finished includes: repeating the obtaining step and the control step until all the threads are closed.
Optionally, the control step includes: when the content under the uppermost directory of the to-be-scanned disk directory is a next-layer directory, controlling the thread to return the updated to-be-scanned disk directory to the shared message queue; and when the content under the uppermost directory of the to-be-scanned disk directory is a file, controlling the thread to acquire the file information of the file.
Optionally, before the starting step, the method further includes: sending a directory of the file system to the shared message queue to obtain the to-be-scanned disk directory.
According to another aspect of the embodiments of the present invention, a disk scanning device for a file system is further provided, including a construction unit, a starting unit, an obtaining unit, a control unit, and a processing unit. The construction unit is configured to execute the construction step of constructing a shared message queue, the shared message queue being used to store to-be-scanned disk directories; the starting unit is configured to execute the starting step of starting a plurality of threads; the obtaining unit is configured to execute the obtaining step of obtaining a plurality of to-be-scanned disk directories, the to-be-scanned disk directories corresponding to the threads one to one; the control unit is configured to execute the control step of controlling each thread to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory; and the processing unit is configured to execute the repeating step of repeating the obtaining step and the control step in sequence at least once until the disk scanning is finished.
According to yet another aspect of embodiments of the present invention, there is also provided a computer-readable storage medium including a stored program, wherein the program performs any one of the methods.
According to yet another aspect of the embodiments of the present invention, there is also provided a processor for executing a program, wherein the program executes to perform any one of the methods.
According to still another aspect of the embodiments of the present invention, there is also provided a file management system including: a file system and one or more processors, memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described herein.
In the embodiment of the present invention, the disk scanning method for a file system proceeds as follows. First, in the construction step, a shared message queue is constructed, the shared message queue being used to store to-be-scanned disk directories. Next, in the starting step, a plurality of threads are started. Then, in the obtaining step, a plurality of to-be-scanned disk directories are obtained, the to-be-scanned disk directories corresponding to the threads one to one. Then, in the control step, each thread is controlled to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory. Finally, in the repeating step, the obtaining step and the control step are repeated until the disk scanning is finished. In this method, each thread scans the uppermost directory of a to-be-scanned disk directory, obtains the file information of the files under that layer, and returns the next-layer directories, until all directory layers have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 shows a flow diagram of a method for sweeping a disk of a file system according to an embodiment of the present application;
FIG. 2 illustrates a logical flow diagram of a method of sweeping a disk of a file system according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of a disk scanning apparatus of a file system according to an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as being "connected" to another element, the element may be "directly connected" to the other element or "connected" to the other element through a third element.
As mentioned in the background, the disk scanning methods in the prior art take too long. To solve this problem, an exemplary embodiment of the present application provides a disk scanning method and apparatus for a file system, a computer-readable storage medium, a processor, and a file management system.
According to an embodiment of the present application, a method of sweeping a disk of a file system is provided.
Fig. 1 is a flowchart of a method for scanning a disk of a file system according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
step S101, a construction step, in which a shared message queue is constructed, the shared message queue being used to store to-be-scanned disk directories;
step S102, a starting step, in which a plurality of threads are started;
step S103, an obtaining step, in which a plurality of to-be-scanned disk directories are obtained, the to-be-scanned disk directories corresponding to the threads one to one;
step S104, a control step, in which each thread is controlled to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory;
step S105, a repeating step, in which the obtaining step and the control step are repeated in sequence at least once until the disk scanning is finished.
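Steps S101 to S105 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the applicant's implementation: the function name `scan_tree`, the parameters `num_threads` and `idle_timeout`, the in-memory `records` list, and the choice to skip unreadable directories are all hypothetical. It relies only on the standard `queue`, `threading` and `os` modules.

```python
import os
import queue
import threading


def scan_tree(root, num_threads=4, idle_timeout=0.5):
    """Scan `root` layer by layer with several threads; return file records."""
    pending = queue.Queue()          # S101: shared queue of directories to scan
    pending.put(root)                # seed the queue with the starting directory
    records = []                     # (path, mtime, size, uid, gid) tuples
    records_lock = threading.Lock()

    def worker():
        while True:
            try:
                # S103: ask the shared queue for one directory to scan.
                directory = pending.get(timeout=idle_timeout)
            except queue.Empty:
                return               # no answer in time: this thread closes
            found = []
            try:
                with os.scandir(directory) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            # S104: a next-layer directory goes back to the queue.
                            pending.put(entry.path)
                        else:
                            # S104: a file has its information collected.
                            st = entry.stat(follow_symlinks=False)
                            found.append((entry.path, st.st_mtime, st.st_size,
                                          st.st_uid, st.st_gid))
            except OSError:
                # Assumption: an unreadable directory or vanished entry ends
                # the processing of this directory; the scan continues.
                pass
            with records_lock:
                records.extend(found)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]  # S102
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # S105: finished once every thread has closed
    return records
```

A call such as `scan_tree("/some/directory", num_threads=8)` returns one tuple per file, where the path is a placeholder. Owner and group are reported here as numeric ids; mapping them to names is left out of the sketch.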
In the disk scanning method of the file system, first, in the construction step, a shared message queue is constructed, the shared message queue being used to store to-be-scanned disk directories. Next, in the starting step, a plurality of threads are started. Then, in the obtaining step, a plurality of to-be-scanned disk directories are obtained, the to-be-scanned disk directories corresponding to the threads one to one. Then, in the control step, each thread is controlled to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory. Finally, in the repeating step, the obtaining step and the control step are repeated until the disk scanning is finished. In this method, each thread scans the uppermost directory of a to-be-scanned disk directory, obtains the file information of the files under that layer, and returns the next-layer directories, until all directory layers have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Specifically, the construction step constructs a shared message queue in the Python programming language environment, and a plurality of threads are started at the same time. The queue data structure is used for inter-thread communication, so that the threads share the information of the disk scanning directories simultaneously and neither repeated scanning nor conflicts occur. Therefore, as many threads as possible are started, and the time consumed by disk scanning can be reduced at a nearly exponential rate.
In an embodiment of the present application, as shown in FIG. 2, the obtaining step includes: controlling each thread to send a disk scanning request to the shared message queue, the disk scanning requests corresponding to the threads one to one; and, in response to the disk scanning request, controlling the shared message queue to return a to-be-scanned disk directory to the thread. Specifically, all of the threads request to-be-scanned disk directories from a single shared message queue and share the information of the disk scanning directories simultaneously, which prevents several threads from repeatedly scanning, or conflicting over, the same to-be-scanned disk directory.
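The property described here, that a single shared queue hands each directory to exactly one thread, can be illustrated with Python's thread-safe `queue.Queue`. The sketch below is illustrative only: the function name `distribute` and the directory names in the usage note are hypothetical, and no disk access takes place.

```python
import queue
import threading


def distribute(directories, num_threads=4):
    """Hand each queued directory to exactly one thread; return who got what."""
    shared = queue.Queue()
    for d in directories:
        shared.put(d)
    taken = []                       # (thread_index, directory) pairs
    taken_lock = threading.Lock()

    def worker(index):
        while True:
            try:
                d = shared.get_nowait()   # one request yields one directory
            except queue.Empty:
                return                    # nothing left to hand out
            with taken_lock:
                taken.append((index, d))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken
```

Calling `distribute(["/data/a", "/data/b", "/data/c"])` returns three pairs in which every directory appears once, whichever thread happened to receive it. Because `get_nowait` removes an item atomically, no additional coordination between threads is needed to avoid duplicates.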
In another embodiment of the present application, the obtaining step further includes: closing the thread corresponding to a disk scanning request if the shared message queue does not respond to that request within a predetermined time. If the shared message queue does not respond to the disk scanning request within the predetermined time, no directory remains to be scanned, and the thread corresponding to the disk scanning request is closed at that point.
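In Python, one way to express this timeout is the `timeout` argument of `queue.Queue.get`, which raises `queue.Empty` when nothing arrives in time. The helper below is a sketch under that assumption; the name `request_directory` and the default of one second are hypothetical, since the application does not state a value for the predetermined time.

```python
import queue


def request_directory(shared, timeout=1.0):
    """Return the next directory to scan, or None if the queue stays silent."""
    try:
        return shared.get(timeout=timeout)
    except queue.Empty:
        return None   # the caller treats None as the signal to close its thread
```

A worker loop would call `request_directory` repeatedly and return as soon as it receives `None`. The timeout is a trade-off: if it is shorter than the time another thread needs to list a very large directory, idle threads may close early and the remaining work is completed with less parallelism.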
In another embodiment of the present application, repeating the obtaining step and the control step until the disk scanning is finished includes: repeating the obtaining step and the control step until all the threads are closed. Specifically, when all the threads are closed, no thread is scanning and no to-be-scanned disk directory remains in the shared message queue, so it can be determined that the disk scanning is finished.
In still another embodiment of the present application, as shown in FIG. 2, the control step includes: when the content under the uppermost directory of the to-be-scanned disk directory is a next-layer directory, controlling the thread to return the updated to-be-scanned disk directory to the shared message queue; and when the content under the uppermost directory of the to-be-scanned disk directory is a file, controlling the thread to acquire the file information of the file.
Specifically, the control step is implemented by creating the processing logic of a single thread with the built-in threading module provided by the Python programming language. Each thread processes only one layer of directory, namely the uppermost directory, and does not process the next layer of directory. The type of each entry in that layer is determined: if the entry is a file, its detailed information is obtained and the statistical data are output to a result file; if the entry is a directory, the directory is returned to the queue.
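The single-layer logic can be sketched with `os.scandir`, which lists only the entries directly under a directory. The function name `scan_one_level` and the dictionary keys are hypothetical, owner and group are given as numeric ids, and writing to a result file is omitted; the sketch returns the records and leaves output and queueing to the caller.

```python
import os


def scan_one_level(directory):
    """Inspect only the entries directly under `directory`.

    Returns (file_records, subdirectories); nothing below this layer is opened.
    """
    file_records = []
    subdirectories = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirectories.append(entry.path)   # to be returned to the queue
            else:
                st = entry.stat(follow_symlinks=False)
                file_records.append({
                    "path": entry.path,
                    "mtime": st.st_mtime,   # last modification time, epoch seconds
                    "size": st.st_size,     # size in bytes
                    "uid": st.st_uid,       # numeric owner id
                    "gid": st.st_gid,       # numeric group id
                })
    return file_records, subdirectories
```

A worker thread would put each path in `subdirectories` back on the shared queue and append `file_records` to its output. Symbolic links are not followed in this sketch, so a link to a directory is recorded as a file entry rather than descended into; that choice is an assumption, not something the application specifies.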
In another embodiment of the present application, as shown in FIG. 2, before the starting step, the method further includes: sending a directory of the file system to the shared message queue to obtain the to-be-scanned disk directory. Sending the directory of the file system to the shared message queue in this way makes the disk scanning information more comprehensive.
In practical applications, the method also provides a function for comparing multiple scan results, which is mainly used to show how the storage occupied by files changes between different points in time.
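The application does not describe how the comparison is implemented. As one possible illustration, assuming each scan result is reduced to a mapping from file path to size in bytes, the change in occupied storage could be computed as below; the function name `compare_scans` and the data layout are hypothetical.

```python
def compare_scans(earlier, later):
    """Return {path: size change in bytes} for files that differ between scans.

    Both arguments map file path -> size in bytes. A file missing from one
    scan is treated as having size 0 in that scan, so a newly created
    zero-byte file is not reported.
    """
    changes = {}
    for path in set(earlier) | set(later):
        delta = later.get(path, 0) - earlier.get(path, 0)
        if delta != 0:
            changes[path] = delta
    return changes
```

Summing the values of the returned dictionary gives the net change in occupied storage between the two scans.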
The embodiment of the present application further provides a disk scanning device for a file system, and it should be noted that the disk scanning device for a file system according to the embodiment of the present application can be used to execute the disk scanning method for a file system according to the embodiment of the present application. The following describes a disk scanning device of a file system according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a disk scanning apparatus of a file system according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
a construction unit 10, configured to execute the construction step of constructing a shared message queue, the shared message queue being used to store to-be-scanned disk directories;
a starting unit 20, configured to execute the starting step of starting a plurality of threads;
an obtaining unit 30, configured to execute the obtaining step of obtaining a plurality of to-be-scanned disk directories, the to-be-scanned disk directories corresponding to the threads one to one;
a control unit 40, configured to execute the control step of controlling each thread to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory;
a processing unit 50, configured to execute the repeating step of repeating the obtaining step and the control step in sequence at least once until the disk scanning is finished.
In the disk scanning device of the file system, the construction unit 10 executes the construction step to construct a shared message queue, the shared message queue being used to store to-be-scanned disk directories. The starting unit 20 then executes the starting step to start a plurality of threads. Next, the obtaining unit 30 executes the obtaining step to obtain a plurality of to-be-scanned disk directories, which correspond to the threads one to one. The control unit 40 then executes the control step to control each thread to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory. Finally, the processing unit 50 executes the repeating step, repeating the obtaining step and the control step until the disk scanning is finished. In this device, each thread scans the uppermost directory of a to-be-scanned disk directory, obtains the file information of the files under that layer, and returns the next-layer directories, until all directory layers have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning.
Specifically, the construction unit constructs a shared message queue in the Python programming language environment, and a plurality of threads are started at the same time. The queue data structure is used for inter-thread communication, so that the threads share the information of the disk scanning directories simultaneously and neither repeated scanning nor conflicts occur. Therefore, as many threads as possible are started, and the time consumed by disk scanning can be reduced at a nearly exponential rate.
In an embodiment of the present application, the obtaining unit includes a first control module and a second control module. The first control module is configured to control each thread to send a disk scanning request to the shared message queue, the disk scanning requests corresponding to the threads one to one; the second control module is configured to control the shared message queue to return a to-be-scanned disk directory to the thread in response to the disk scanning request. Specifically, all of the threads request to-be-scanned disk directories from a single shared message queue and share the information of the disk scanning directories simultaneously, which prevents several threads from repeatedly scanning, or conflicting over, the same to-be-scanned disk directory.
In another embodiment of the present application, the obtaining unit further includes a closing module, which is configured to close the thread corresponding to a disk scanning request if the shared message queue does not respond to that request within a predetermined time. If the shared message queue does not respond to the disk scanning request within the predetermined time, no directory remains to be scanned, and the thread corresponding to the disk scanning request is closed at that point.
In yet another embodiment of the present application, the processing unit includes a processing module, which is configured to repeat the obtaining step and the control step until all the threads are closed. Specifically, when all the threads are closed, no thread is scanning and no to-be-scanned disk directory remains in the shared message queue, so it can be determined that the disk scanning is finished.
In yet another embodiment of the present application, the control unit includes a third control module and a fourth control module. The third control module is configured to control the thread to return the updated to-be-scanned disk directory to the shared message queue when the content under the uppermost directory of the to-be-scanned disk directory is a next-layer directory; the fourth control module is configured to control the thread to acquire the file information of the file when the content under the uppermost directory of the to-be-scanned disk directory is a file.
Specifically, the control unit creates the processing logic of a single thread with the built-in threading module provided by the Python programming language. Each thread processes only one layer of directory, namely the uppermost directory, and does not process the next layer of directory. The type of each entry in that layer is determined: if the entry is a file, its detailed information is obtained and the statistical data are output to a result file; if the entry is a directory, the directory is returned to the queue.
In another embodiment of the present application, the apparatus further includes a sending unit, which is configured to send a directory of the file system to the shared message queue before the starting step, so as to obtain the to-be-scanned disk directory. Sending the directory of the file system to the shared message queue in this way makes the disk scanning information more comprehensive.
In practical applications, the device also provides a function for comparing multiple scan results, which is mainly used to show how the storage occupied by files changes between different points in time.
The disk scanning device of the file system comprises a processor and a memory. The construction unit, the starting unit, the obtaining unit, the control unit, the processing unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more than one kernel can be set, and the problem that the time consumption of the disk scanning method in the prior art is too long is solved by adjusting kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the above-mentioned disk scanning method for a file system.
The embodiment of the invention provides a processor, which is used for running a program, wherein the program executes a disk scanning method of the file system when running.
An embodiment of the invention provides a device comprising a processor, a memory, and a program that is stored in the memory and can run on the processor, wherein the processor, when executing the program, implements at least the following steps:
step S101, a construction step, in which a shared message queue is constructed, the shared message queue being used to store to-be-scanned disk directories;
step S102, a starting step, in which a plurality of threads are started;
step S103, an obtaining step, in which a plurality of to-be-scanned disk directories are obtained, the to-be-scanned disk directories corresponding to the threads one to one;
step S104, a control step, in which each thread is controlled to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory;
step S105, a repeating step, in which the obtaining step and the control step are repeated in sequence at least once until the disk scanning is finished.
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes at least the following method steps:
step S101, a construction step, in which a shared message queue is constructed, the shared message queue being used to store to-be-scanned disk directories;
step S102, a starting step, in which a plurality of threads are started;
step S103, an obtaining step, in which a plurality of to-be-scanned disk directories are obtained, the to-be-scanned disk directories corresponding to the threads one to one;
step S104, a control step, in which each thread is controlled to acquire file information of the files under the uppermost directory of its to-be-scanned disk directory, or to return an updated to-be-scanned disk directory to the shared message queue, wherein the file information includes the last modification date, the file size, the owner and the group, and the updated to-be-scanned disk directory is a directory under the uppermost directory;
step S105, a repeating step, in which the obtaining step and the control step are repeated in sequence at least once until the disk scanning is finished.
In another exemplary embodiment of the present application, there is also provided a file management system including: a file system and one or more processors, memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
In the file management system, each thread scans the uppermost level of a to-be-scanned disk directory, obtains the file information of the files at that level, and returns the next-level directories, until all directory levels have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning. Compared with the prior art, in which the find tool scans with a single thread and the fd tool must be combined with another tool to view file information, this greatly improves disk scanning efficiency, shortens disk scanning time, and solves the problem that prior-art disk scanning methods take too long.
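The file information named in this application (last modification date, file size, owner, and group) can be read with a single stat call per file. The following Python sketch shows one way to do so; the function name `file_info`, the dictionary keys, and the date format are assumptions for illustration, and owner and group fall back to numeric IDs where name lookup is unavailable.

```python
import os
import time

try:
    import pwd   # Unix-only modules for resolving IDs to names
    import grp
except ImportError:
    pwd = grp = None


def _owner_name(uid):
    """Resolve a user ID to a name, falling back to the number as text."""
    if pwd is None:
        return str(uid)
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group_name(gid):
    """Resolve a group ID to a name, falling back to the number as text."""
    if grp is None:
        return str(gid)
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def file_info(path):
    """Return the four pieces of file information for `path` (hypothetical helper)."""
    st = os.lstat(path)   # lstat: do not follow symbolic links
    return {
        "modified": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)),
        "size": st.st_size,
        "owner": _owner_name(st.st_uid),
        "group": _group_name(st.st_gid),
    }
```

Calling `file_info` on a path returns a dictionary with the keys `modified`, `size`, `owner`, and `group`.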
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned computer-readable storage media comprise: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
In order to make the technical solutions of the present application more clearly understood by those skilled in the art, the technical solutions of the present application will be described in detail below with reference to specific examples and comparative examples.
Examples
This embodiment provides a disk scanning method for a file system, named mutiscan, comprising: a construction step of constructing a shared message queue, wherein the shared message queue is used for storing a to-be-scanned disk directory; a starting step of starting a plurality of threads; an obtaining step of obtaining a plurality of to-be-scanned disk directories, wherein the to-be-scanned disk directories correspond to the threads one to one; a control step of controlling the thread to acquire file information of files under the uppermost directory of the to-be-scanned disk directory or to return the updated to-be-scanned disk directory to the shared message queue, wherein the file information includes a last modification date, a file size, an owner, and a group, and the updated to-be-scanned disk directory is a directory under the uppermost directory; and a repeating step of repeating the obtaining step and the control step at least once in sequence until the disk scan is complete.
The time taken to scan a directory containing about 610,000 files using the above disk scanning method and using the find and fd commands available on the Linux system is as follows:
Scanning method | Command | Time taken
mutiscan | mutiscan scan -s scan_path -n 60 | 10min47s
mutiscan | mutiscan scan -s scan_path -n 3000 | 7min19s
find command | find scan_path -type f > find_files_personal_dir | 1hour21min
fd command | fd . scan_path > fd_scan_files.txt | 7min41s
fd command | fd . scan_path -l > fd_scan_files.txt | 37min
In the above table, scan_path represents the directory to be scanned, and -n 60 and -n 3000 indicate that 60 threads and 3000 threads are started, respectively. As the table shows, for the same directory the find command takes the longest; mutiscan takes less time with 3000 threads (7min19s) than with 60 threads (10min47s); and the fd command takes about as long as mutiscan with 3000 threads, but the fd command cannot directly acquire file information, so the -l parameter or an lss command must be added, which raises its scan time to 37 min. Therefore, the disk scanning method for a file system of the present application can solve the problem that prior-art disk scanning methods take too long, improve disk scanning efficiency, and make the scan results more timely.
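The relative differences can be worked out directly from the reported times. The short Python sketch below converts the times in the table to seconds and expresses each as a multiple of the mutiscan time with 3000 threads; the helper name `to_seconds` and the dictionary labels are illustrative assumptions, and the figures themselves are taken from the table.

```python
import re


def to_seconds(text):
    """Convert a duration such as '1hour21min' or '7min19s' to seconds."""
    units = {"hour": 3600, "min": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)(hour|min|s)", text))


# Times as reported in the table above
timings = {
    "mutiscan -n 60": "10min47s",
    "mutiscan -n 3000": "7min19s",
    "find": "1hour21min",
    "fd": "7min41s",
    "fd -l": "37min",
}

seconds = {name: to_seconds(t) for name, t in timings.items()}
baseline = seconds["mutiscan -n 3000"]

for name, secs in seconds.items():
    # Ratio of each run to the fastest mutiscan run
    print(f"{name:18s} {secs:5d} s  {secs / baseline:5.2f}x")
```

On these figures, find takes roughly 11 times as long as mutiscan with 3000 threads, and fd with the -l parameter takes roughly 5 times as long.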
From the above description, it can be seen that the above-described embodiments of the present application achieve the following technical effects:
1) In the disk scanning method for a file system of the present application, a construction step first constructs a shared message queue for storing to-be-scanned disk directories; a starting step then starts a plurality of threads; an obtaining step then obtains a plurality of to-be-scanned disk directories, which correspond to the threads one to one; a control step then controls the thread to acquire file information of files under the uppermost directory of the to-be-scanned disk directory or to return the updated to-be-scanned disk directory to the shared message queue, wherein the file information includes a last modification date, a file size, an owner, and a group, and the updated to-be-scanned disk directory is a directory under the uppermost directory; finally, a repeating step repeats the obtaining step and the control step until the disk scan is complete. In this method, each thread scans the uppermost level of a to-be-scanned disk directory, obtains the file information of the files at that level, and returns the next-level directories, until all directory levels have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning.
2) In the disk scanning device for a file system of the present application, the construction unit 10 executes the construction step to construct a shared message queue for storing to-be-scanned disk directories; the starting unit 20 then executes the starting step to start a plurality of threads; the obtaining unit 30 then executes the obtaining step to obtain a plurality of to-be-scanned disk directories, which correspond to the threads one to one; the control unit 40 then executes the control step to control the thread to acquire file information of files under the uppermost directory of the to-be-scanned disk directory or to return the updated to-be-scanned disk directory to the shared message queue, wherein the file information includes a last modification date, a file size, an owner, and a group, and the updated to-be-scanned disk directory is a directory under the uppermost directory; finally, the processing unit 50 executes the repeating step to repeat the obtaining step and the control step until the disk scan is complete. In this device, each thread scans the uppermost level of a to-be-scanned disk directory, obtains the file information of the files at that level, and returns the next-level directories, until all directory levels have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning.
3) In the file management system of the present application, each thread scans the uppermost level of a to-be-scanned disk directory, obtains the file information of the files at that level, and returns the next-level directories, until all directory levels have been scanned, thereby achieving multi-threaded, parallel, layer-by-layer disk scanning. Compared with the prior art, in which the find tool scans with a single thread and the fd tool must be combined with another tool to view file information, this greatly improves disk scanning efficiency, shortens disk scanning time, and solves the problem that prior-art disk scanning methods take too long.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A disk scanning method for a file system, comprising:
a construction step of constructing a shared message queue, wherein the shared message queue is used for storing a to-be-scanned disk directory;
a starting step of starting a plurality of threads;
an obtaining step of obtaining a plurality of to-be-scanned disk directories, wherein the to-be-scanned disk directories correspond to the threads one to one;
a control step of controlling the thread to acquire file information of files under the uppermost directory of the to-be-scanned disk directory or to return the updated to-be-scanned disk directory to the shared message queue, wherein the file information comprises a last modification date, a file size, an owner, and a group, and the updated to-be-scanned disk directory is a directory under the uppermost directory;
and a repeating step of repeating the obtaining step and the control step at least once in sequence until the disk scan is complete.
2. The method of claim 1, wherein the obtaining step comprises:
controlling the thread to send a disk scanning request to the shared message queue, wherein the disk scanning requests correspond to the threads one to one;
and, in response to the disk scanning request, controlling the shared message queue to return the to-be-scanned disk directory to the thread.
3. The method of claim 2, wherein the obtaining step further comprises:
closing the thread corresponding to the disk scanning request in the case that the shared message queue does not respond to the disk scanning request within a preset time.
4. The method of claim 3, wherein repeating the obtaining step and the control step until the disk scan is complete comprises:
repeating the obtaining step and the control step until all the threads are closed.
5. The method of claim 1, wherein the control step comprises:
in the case that the content under the uppermost directory of the to-be-scanned disk directory is a next-level directory, controlling the thread to return the updated to-be-scanned disk directory to the shared message queue;
and, in the case that the content under the uppermost directory of the to-be-scanned disk directory is a file, controlling the thread to acquire the file information of the file.
6. The method of claim 1, wherein prior to the starting step, the method further comprises:
sending the directory of the file system to the shared message queue to obtain the to-be-scanned disk directory.
7. A disk scanning device for a file system, comprising:
a construction unit, configured to execute a construction step to construct a shared message queue, wherein the shared message queue is used for storing a to-be-scanned disk directory;
a starting unit, configured to execute a starting step to start a plurality of threads;
an obtaining unit, configured to execute an obtaining step to obtain a plurality of to-be-scanned disk directories, wherein the to-be-scanned disk directories correspond to the threads one to one;
a control unit, configured to execute a control step to control the thread to acquire file information of files under the uppermost directory of the to-be-scanned disk directory or to return the updated to-be-scanned disk directory to the shared message queue, wherein the file information comprises a last modification date, a file size, an owner, and a group, and the updated to-be-scanned disk directory is a directory under the uppermost directory;
and a processing unit, configured to execute a repeating step to repeat the obtaining step and the control step at least once in sequence until the disk scan is complete.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program performs the method of any one of claims 1 to 6.
9. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 6.
10. A file management system, comprising: a file system and one or more processors, memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-6.
CN202111649493.8A 2021-12-29 2021-12-29 File system disk scanning method and device and file management system Pending CN114327950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111649493.8A CN114327950A (en) 2021-12-29 2021-12-29 File system disk scanning method and device and file management system

Publications (1)

Publication Number Publication Date
CN114327950A true CN114327950A (en) 2022-04-12

Family

ID=81019829

Country Status (1)

Country Link
CN (1) CN114327950A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526948A (en) * 2009-04-23 2009-09-09 山东中创软件商用中间件股份有限公司 Multithreading file traversal technology
CN111769933A (en) * 2020-06-29 2020-10-13 北京天融信网络安全技术有限公司 Method and device for monitoring file change, electronic equipment and storage medium
CN113835613A (en) * 2020-06-24 2021-12-24 浙江宇视科技有限公司 File reading method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination