CN111159109A - Method and system for detecting file occupied by disk space - Google Patents


Info

Publication number
CN111159109A
Authority
CN
China
Prior art keywords
file
disk
folder
data
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911176281.5A
Other languages
Chinese (zh)
Inventor
陶壮壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201911176281.5A
Publication of CN111159109A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to a method and a system for detecting a file occupied by a disk space, wherein the method comprises the following steps: collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot, and/or the file information is obtained by analyzing the characteristics of a file system, and/or the file information is obtained by analyzing the characteristics of an operating system; if the file information is encrypted, after the encrypted file information is decrypted into a plaintext, analyzing to obtain the file change condition in the current equipment within a specified time period; identifying the nesting structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nesting structure, and displaying the target folder to a user. The technical scheme provided by the application can improve the efficiency and accuracy of file detection.

Description

Method and system for detecting file occupied by disk space
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and a system for detecting a file occupied by a disk space.
Background
As computing devices continue to develop, data storage has become an increasingly serious problem. Many existing software tools can report how data is stored on a disk and can sort folders by data volume, access frequency, or access count, showing the user each folder's storage status as a basis for data cleaning. Such tools can flag files or folders that are large and infrequently accessed for the user to screen. In most cases, however, the large or low-frequency files listed cannot easily be deleted, and the files that can be deleted often rank lower in the list. In addition, a folder listed by a prior-art tool may contain several subfolders, and the growth in the folder's data volume may be caused by one or more of them; the prior art can only report on the top-level folder, so the user cannot tell in which folder the data growth actually occurred.
Disclosure of Invention
The application aims to provide a method and a system for detecting a file occupied by a disk space, which can improve the efficiency and accuracy of file detection.
In order to achieve the above object, the present application provides a method for detecting a file occupied by a disk space, where the method includes:
collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot, and/or the file information is obtained by analyzing the characteristics of a file system, and/or the file information is obtained by analyzing the characteristics of an operating system;
if the file information is encrypted, after the encrypted file information is decrypted into a plaintext, analyzing to obtain the file change condition in the current equipment within a specified time period;
identifying the nesting structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nesting structure, and displaying the target folder to a user.
Further, the file information obtained through the analysis of the disk state snapshot includes:
recording the sizes of all files and folders in the magnetic disk of the current equipment according to a specified time period, and obtaining the files or folders with data growth in a specified time period by comparing the sizes of the files and folders in the current magnetic disk with the sizes of the recorded files and folders.
Further, the file information is obtained based on the characteristic analysis of the file system, and includes:
reading a record file from a log file system of the current equipment, and determining data change of the file in the disk in a specified time period by analyzing the record file;
or
Counting files created or modified in a specified time period, and analyzing the counted files with data change.
Further, the file information is obtained based on the characteristic analysis of the operating system, and the method comprises the following steps:
starting a daemon process/background service/background process in the current equipment, recording the current file read-write operation through the daemon process/background service/background process when the file read-write operation occurs in a disk, and analyzing the file with data change in a specified time period through a recorded result;
or
Setting a hook function for a file operation API or a disk operation API in an operating system, and when the file operation API or the disk operation API is called, acquiring the calling time, the changed file/disk address and the changed contents through the hook function, so as to analyze the files with data change in a specified time period according to the acquired information;
or
If a file change notification function is provided in an operating system, when a file change occurs in the operating system, receiving a system notification transmitted by the file change notification function, and analyzing the file with data change in a specified time period according to the system notification.
Further, the file information is encrypted in the following manner:
randomly generating a random number, and writing the random number into a designated storage space; when file information is generated in the system, reading the random number from the designated storage space, and encrypting the generated file information by using the random number;
accordingly, decrypting the encrypted file information into plaintext includes:
and reading the random number from the designated storage space, and decrypting the encrypted file information by using the random number.
Further, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nested structure includes:
selecting a time period to be analyzed, and putting a folder at the topmost layer in the disk at the tail of a candidate list to be analyzed;
selecting a target folder from the beginning of the candidate list, and calculating the total data volume of the target folder increased in the time period; calculating the data increment of each subfolder under the target folder in the time period;
if the proportion of the data increment of a subfolder in the increased total data volume reaches a specified proportion threshold, the subfolder is placed at the tail of the candidate list;
if no single subfolder's data increment reaches the specified proportion threshold of the increased total data volume, sorting the subfolders by the absolute value of their data increments;
if the sum of the data increments of several top-ranked subfolders in the sorted result reaches the specified proportion threshold of the increased total data volume, placing those subfolders at the tail of the candidate list;
if the sum of the data increments of several top-ranked subfolders in the sorted result still does not reach the specified proportion threshold of the increased total data volume, adding the name, the path and the changed data volume of the target folder into a disk space report;
and judging whether the candidate list is empty or not, if so, displaying the disk space report to a user, and if not, continuously analyzing the next folder in the candidate list.
Further, the method further comprises:
calculating the data increment of each folder, and displaying the data increment of each folder to a user;
or
Displaying each folder in a file manager, wherein the file size of each folder is the data increment of each folder in a specified period.
In order to achieve the above object, the present application further provides a system for detecting a file occupied by a disk space, where the system includes:
the file information collection unit is used for collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot and/or is obtained by analyzing the characteristics of a file system and/or is obtained by analyzing the characteristics of an operating system;
the file encryption and decryption unit is used for decrypting the encrypted file information into a plaintext if the file information is encrypted, and analyzing to obtain the file change condition in the current equipment within a specified time period;
and the nested structure analysis unit is used for identifying the nested structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nested structure, and displaying the target folder to a user.
Further, the file encryption and decryption unit includes:
the encryption module is used for randomly generating a random number and writing the random number into a designated storage space; when file information is generated in the system, reading the random number from the designated storage space, and encrypting the generated file information by using the random number;
and the decryption module is used for reading the random number from the designated storage space and decrypting the encrypted file information by using the random number.
Further, the file information collecting unit includes:
the background monitoring module is used for starting a daemon process/background service/background process in the current equipment, recording the current file read-write operation through the daemon process/background service/background process when the file read-write operation occurs in a disk, and analyzing the file with data change in a specified time period through the recorded result;
a hook module, configured to set a hook function for a file operation API or a disk operation API in an operating system, and when the file operation API or the disk operation API is called, obtain, through the hook function, the call time, the changed file/disk address, and the changed content, so as to analyze the files with data change in a specified time period according to the obtained information;
and the change notification module is used for receiving the system notification transmitted by the file change notification function when the file change occurs in the operating system if a file change notification function is provided in the operating system, so as to analyze the file with data change in a specified time period according to the system notification.
Therefore, file information in the current equipment can be collected efficiently in several ways. As a general method, the data change state of each disk can be analyzed by taking disk state snapshots. From the perspective of file system characteristics, the log file system can be analyzed or the last modification time of each file can be read, so that the changed data on the disk is obtained. From the perspective of operating system characteristics, data change information can be obtained promptly by running a background process, setting a hook function, and the like. To prevent the file information from being leaked, it can be encrypted in the current equipment; the collected encrypted file information can later be decrypted after authority authentication to obtain the plaintext. To determine the folder that really causes the data change, the nesting structure of the folders in the current equipment can be identified, and the folder in which data growth really occurs is determined according to the file change condition of each folder in the nesting structure, which improves the accuracy of file detection.
Drawings
FIG. 1 is a diagram illustrating steps of a method for detecting a file occupied by disk space according to an embodiment of the present application;
FIG. 2 is a tree diagram illustrating a method for detecting a file occupied by disk space according to an embodiment of the present invention;
FIG. 3 is a schematic functional block diagram of a system for detecting a file occupied by a disk space according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application shall fall within the scope of protection of the present application.
The present application provides a method for detecting a file occupied by a disk space. Referring to FIG. 1 and FIG. 2, the method includes:
S1: collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot, and/or the file information is obtained by analyzing the characteristics of a file system, and/or the file information is obtained by analyzing the characteristics of an operating system;
S2: if the file information is encrypted, after the encrypted file information is decrypted into a plaintext, analyzing to obtain the file change condition in the current equipment within a specified time period;
S3: identifying the nesting structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nesting structure, and displaying the target folder to a user.
Specifically, the obtaining of the file information through the disk state snapshot analysis includes:
recording the sizes of all files and folders in the magnetic disk of the current equipment according to a specified time period, and obtaining the files or folders with data growth in a specified time period by comparing the sizes of the files and folders in the current magnetic disk with the sizes of the recorded files and folders.
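The snapshot comparison described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `take_snapshot` and `diff_snapshots` are hypothetical, sizes are plain byte counts of file contents, and a real tool would persist each snapshot to disk at the end of every period.

```python
import os

def take_snapshot(root):
    """Map every folder under root to the total size in bytes of the files beneath it."""
    sizes = {}
    # Walk bottom-up so each subfolder's total is known before its parent is summed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or is unreadable; skip it
        for name in dirnames:
            # Symlinked folders are not walked, so they contribute 0 here.
            total += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
    return sizes

def diff_snapshots(old, new):
    """Return {folder: size change} for folders whose size differs between snapshots."""
    changes = {}
    for path in set(old) | set(new):
        delta = new.get(path, 0) - old.get(path, 0)
        if delta != 0:
            changes[path] = delta
    return changes
```

Calling `take_snapshot` once per period and passing the previous and current results to `diff_snapshots` yields the per-folder increments that the later nested-folder analysis consumes.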
In one embodiment, the file information is obtained based on a characteristic analysis of the file system, and the file information includes:
reading a record file from a log file system of the current equipment, and determining data change of the file in the disk in a specified time period by analyzing the record file;
or
Counting files created or modified in a specified time period, and analyzing the counted files with data change.
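Counting the files created or modified in a period can be approximated by reading each file's last-modification time. The sketch below assumes that the modification timestamp is a good enough proxy for "created or modified"; the function name `files_changed_since` is hypothetical, and reading a journaling file system's log, the other option mentioned above, is not shown.

```python
import os
import time

def files_changed_since(root, since_epoch):
    """Yield (path, size, mtime) for files under root modified at or after since_epoch."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime >= since_epoch:
                yield path, st.st_size, st.st_mtime

# Example: files touched in the last 30 days under the current directory.
one_month_ago = time.time() - 30 * 86400
recent = list(files_changed_since(".", one_month_ago))
```

This approach only shows that a file changed and how large it is now, not how much it grew, which is one reason the description also offers snapshots and operation records.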
In one embodiment, the obtaining of the file information based on the feature analysis of the operating system includes:
starting a daemon process/background service/background process in the current equipment, recording the current file read-write operation through the daemon process/background service/background process when the file read-write operation occurs in a disk, and analyzing the file with data change in a specified time period through a recorded result;
or
Setting a hook function for a file operation API or a disk operation API in an operating system, and when the file operation API or the disk operation API is called, acquiring the calling time, the changed file/disk address and the changed contents through the hook function, so as to analyze the files with data change in a specified time period according to the acquired information;
or
If a file change notification function is provided in an operating system, when a file change occurs in the operating system, receiving a system notification transmitted by the file change notification function, and analyzing the file with data change in a specified time period according to the system notification.
In one embodiment, the file information is encrypted in the following manner:
randomly generating a random number, and writing the random number into a designated storage space; when file information is generated in the system, reading the random number from the designated storage space, and encrypting the generated file information by using the random number;
accordingly, decrypting the encrypted file information into plaintext includes:
and reading the random number from the designated storage space, and decrypting the encrypted file information by using the random number.
In one embodiment, determining the target folder in which data growth really occurs according to the file change condition of each folder in the nested structure comprises:
selecting a time period to be analyzed, and putting a folder at the topmost layer in the disk at the tail of a candidate list to be analyzed;
selecting a target folder from the beginning of the candidate list, and calculating the total data volume of the target folder increased in the time period; calculating the data increment of each subfolder under the target folder in the time period;
if the proportion of the data increment of a subfolder in the increased total data volume reaches a specified proportion threshold, the subfolder is placed at the tail of the candidate list;
if no single subfolder's data increment reaches the specified proportion threshold of the increased total data volume, sorting the subfolders by the absolute value of their data increments;
if the sum of the data increments of several top-ranked subfolders in the sorted result reaches the specified proportion threshold of the increased total data volume, placing those subfolders at the tail of the candidate list;
if the sum of the data increments of several top-ranked subfolders in the sorted result still does not reach the specified proportion threshold of the increased total data volume, adding the name, the path and the changed data volume of the target folder into a disk space report;
and judging whether the candidate list is empty or not, if so, displaying the disk space report to a user, and if not, continuously analyzing the next folder in the candidate list.
In one embodiment, the method further comprises:
calculating the data increment of each folder, and displaying the data increment of each folder to a user;
or
Displaying each folder in a file manager, wherein the file size of each folder is the data increment of each folder in a specified period.
In a practical application scenario, the files a user wants to clean up are mainly large files or temporary files that were downloaded recently; an old large file that still has not been deleted is usually one the user cannot easily delete. Therefore, when using the present technology, the software can present a report to the user, including but not limited to:
the fastest growing files/folders in the last 1 month;
newly created/downloaded large files within the last 1 month;
the disk space has decreased by 12% in the last 1 month, with 5% due to a certain television series, 9% due to a certain game, and -2% due to recently uninstalled software. If the user has in fact finished watching the series, the user is reminded that this space can be released. In conventional cleaning, the series might not rank near the top of the list.
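The percentages in such a report are each item's increment divided by the disk capacity, and they add up to the overall change (5% + 9% - 2% = 12%). A small sketch of that arithmetic, with the hypothetical function name `space_change_report` and made-up labels and sizes:

```python
def space_change_report(capacity_bytes, increments):
    """Build report lines from {label: bytes added (negative if freed)}."""
    total = sum(increments.values())
    # Free space moves opposite to the total increment.
    lines = [f"free space changed by {-100 * total / capacity_bytes:+.0f}% of capacity"]
    # List the largest contributors first.
    for label, delta in sorted(increments.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{label}: {100 * delta / capacity_bytes:+.0f}%")
    return lines

report = space_change_report(
    100,  # illustrative capacity, arbitrary units
    {"TV series": 5, "game": 9, "uninstalled software": -2},
)
```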
Referring to FIG. 2, when the technical solution of the present application is implemented, the incremental information of each file on the disk over a certain time can also be obtained by using other characteristics, which may include screening and counting the files modified/added in a time range, description records of disk changes, and the like. This information can likewise be used to generate the disk space usage change report, and is often more efficient and faster than the snapshot approach of the "general method" (so the general method is a fallback, used only when nothing better is available). An example is as follows: Windows is itself an "operating system supporting other characteristics", in that it provides the FindFirstChangeNotification function, which lets Windows notify a designated program when a file changes. By calling this function, the program is informed directly when a file in the system changes, and there is no need to actively hook the underlying APIs of the operating system.
The hook technology, also called a hook function, means that before the system runs the code it was originally going to execute, a purpose-built hook program is run first, by replacing a function, its entry code, or the like, so that the hook program captures the message first and gains control first. Because the hook program can perform operations before the original code executes, it can change the effect of that code: the hook function may process (change) the execution behavior of the function, or may forcibly end the message transfer.
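The function-replacement form of hooking can be illustrated in a few lines. This sketch works at the level of a Python object rather than an operating system API; `install_hook` and `file_api.write_file` are hypothetical names invented for the illustration, and a real implementation would patch native functions instead.

```python
import functools
import time
import types

def install_hook(owner, name, on_call):
    """Replace owner.name with a wrapper that runs on_call first, then the original."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def hooked(*args, **kwargs):
        on_call(name, args, kwargs)        # the hook gains control first
        return original(*args, **kwargs)   # then the original code runs unchanged

    setattr(owner, name, hooked)
    return lambda: setattr(owner, name, original)  # call this to remove the hook

# Hypothetical stand-in for a file operation API; not a real OS interface.
file_api = types.SimpleNamespace(write_file=lambda path, data: len(data))

records = []
unhook = install_hook(
    file_api,
    "write_file",
    lambda name, args, kwargs: records.append((time.time(), name, args[0], len(args[1]))),
)
file_api.write_file("D:/demo.txt", b"hello")     # recorded by the hook
unhook()
file_api.write_file("D:/other.txt", b"ignored")  # hook removed, not recorded
```

Each record holds the call time, the function name, the path, and the number of bytes written, which corresponds to the calling time, changed file, and changed content that the method collects.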
Hook technology is widely used in many areas of security. For example, the active defense function of antivirus software monitors sensitive APIs by hooking them: when a program calls a sensitive API, the antivirus software checks whether the program is suspicious and, depending on the situation, can directly cut off the suspicious access to protect the computer. As another example, a Trojan horse program can hook keyboard input to record what the user types and save that input. As a further example, the Windows system and some application programs also use hook technology when patching. Therefore, if all file modification operations of programs are hooked, all of those operations can be monitored and recorded.
There are many ways to hook file operations, and they vary with the operating system and with the depth of the API to be hooked. Different operating systems (such as Windows/macOS/Linux/Android/iOS) provide different hook methods or hook-like functions. File operations and disk operations are APIs at two different depths, but most reads and writes of disk files call the file operation API and then the disk operation API in sequence, so either one can be hooked, although the implementation naturally differs. Different choices of operating system and API to hook yield a variety of combinations:
(1) Take the Windows operating system + hooking the file operation API as an example. The underlying APIs mainly used for file operations in Windows are NtOpenFile, NtCreateFile, NtSetInformationFile, NtWriteFile, NtReadFile, and so on, so these are the APIs to hook; NtWriteFile and NtCreateFile in particular are the functions related to file changes. One practical way to implement the hook (the method is not unique) is to use Detours to locate the corresponding function by name in ntdll and replace it with a custom function.
(2) Next, take the Windows operating system + hooking the disk operation API as an example. The underlying APIs mainly used for disk operations in Windows are ReadFile, WriteFile, and so on, so these are the APIs to hook; WriteFile in particular is the function related to disk writes. One possible way to implement the hook is the same as above.
When the hook function runs, the operating system has changed a file or the disk. At that point the changed file/disk address and the changed content (mainly the change in file size) need to be saved; file changes are collected through the registered hooks. (For the disk read/write API, what the hook program obtains is a disk address rather than the corresponding file.)
Both the file operation API and the disk operation API are hooked because some special programs, when modifying a file, do not call the file operation API and let the operating system read and write the disk as conventional programs do, but call the disk read/write API directly (some file shredder tools, for example). If only the file operation API were hooked, the file modifications made by such programs could be missed. It is therefore desirable to hook both, with the file operation API taking precedence: if a file modification has already been captured through the file operation API, the disk operations it causes are not recorded; if a modification is found not to have passed through the file operation API, its disk operation is recorded and the corresponding file must be looked up. For example, if a user shreds a file with a shredder tool, the operation may bypass the file operation API and call the disk operation API directly. In that case, the file corresponding to the disk address must be found and recorded. In this way no file read or write is missed, and the accuracy of the report is improved.
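The precedence rule between the two hooks can be sketched as a merge of two record streams. Everything here is an assumption made for illustration: the tuple layouts, the hypothetical `resolve_address` lookup from disk address to file path, and the choice to treat a disk operation as "already captured" when a file-API record for the same file exists within a short time window. The description does not specify how the two kinds of records are correlated.

```python
def merge_records(file_ops, disk_ops, resolve_address, window=1.0):
    """Combine hook records, letting file-API records take precedence.

    file_ops: list of (time, path, size_delta) from the file operation hook.
    disk_ops: list of (time, disk_address, size_delta) from the disk operation hook.
    resolve_address: callable mapping a disk address to a file path (hypothetical).
    """
    merged = list(file_ops)
    for t, address, delta in disk_ops:
        path = resolve_address(address)  # reverse lookup: disk address -> file
        # Assumed correlation rule: same file, timestamps within `window` seconds.
        covered = any(p == path and abs(ft - t) <= window for ft, p, _ in file_ops)
        if not covered:
            merged.append((t, path, delta))  # modification bypassed the file API
    return sorted(merged)

# Illustrative data: the second disk write has no matching file-API record.
address_table = {0x1000: "a.txt", 0x2000: "b.txt"}
merged = merge_records(
    [(10.0, "a.txt", 100)],
    [(10.2, 0x1000, 100), (20.0, 0x2000, -50)],
    address_table.get,
)
```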
In the present application, a record is generated each time a file on the disk changes. The record includes the name of the changed file, its path, the modification time, the content of the change, and so on, and is itself saved to disk as a file. If that file can be accessed by others, information can be recovered by analyzing the log content (for example, that the user saved 'XXXX' at 01:52 on November 7, 2019, and the file size increased by 23 KB), which poses a privacy risk. The file storing these records therefore must not be kept in plaintext, so that even if someone obtains the file, its content cannot be recovered by simple means. The main solution is to encrypt the records. Encryption involves choosing an encryption algorithm and choosing a key; different combinations of algorithm and key yield different solutions.
Take a symmetric encryption algorithm + fixed key as an example. When the space change detection program runs for the first time, it generates a complex random number and stores it in the DPAPI (the DPAPI is a special secure space provided by Windows, in which data written by a specific program can only be accessed by that program; besides the DPAPI, a secure server provided by the company operating the program can also serve as the special secure space). Each time a new record is about to be saved to the file, the random number is read from the DPAPI and used as the key to encrypt the record with the AES algorithm before it is written to disk. When the program needs to display the records, it retrieves the key in the same way and displays them after decryption. Thus, even if the record file is obtained by another user (for example, a visitor or a hacker), that person still cannot get the true content of the file, and an attempt to read the random number from the DPAPI is rejected by the system. The user's privacy is thereby protected.
In practical applications, folders are nested, which causes problems when generating reports. For example, after the game CSGO updates its game files, 5 GB of files are added to the Resources folder and 2 GB to the Audio folder, which increases the size of several folders (such as D:/, D:/MyFiles/Games/CSGO/Resources, D:/MyFiles/Games/CSGO/Audio). In this example the available space on the D drive is reduced by 7 GB. This is directly due to the growth of Resources and Audio, and more generally due to the growth of the CSGO folder. It can equally be described as growth of the Games folder or of MyFiles, but that is not meaningful. In this example, the user should learn that the reduction in available space on the D drive is mainly due to the growth of the CSGO folder and, more specifically, of the Resources and Audio folders under CSGO.
It is therefore desirable to find the truly growing folder among nested folders, rather than simply reporting to the user that "MyFiles" grew by 7 GB. One algorithm that achieves this is as follows:
1. The user selects the time interval to be analyzed (the last month, etc.).
2. Place the topmost folder of the disk at the end of the candidate list of target folders to be analyzed.
3. Take a folder from the head of the candidate list.
4. Calculate the volume S0 by which the folder increased during that time (which may be positive or negative).
5. Calculate the volume Sx by which each subfolder under the folder increased during that time (which may be positive or negative).
6. If there is a subfolder k whose increment Sk satisfies S0 × 90% < Sk < S0 × 100%:
7. The growth of the folder is considered to be caused approximately entirely by the increment Sk of subfolder k. This growing subfolder needs to be analyzed further, so it is placed at the end of the candidate list.
8. Otherwise: sort the subfolders by |Sx| (the absolute value of Sx).
a) If the sum ∑Sx of the first 5 Sx exceeds the specified proportion of S0 (the proportion threshold, as in step 6):
b) The growth of the folder is considered to be caused indirectly by the growth of some of the subfolders within it, so the folder itself is not considered to be truly growing (like "MyFiles" above), and those growing subfolders need to be analyzed further. The first 5 subfolders are therefore placed at the end of the candidate list.
c) Otherwise: the growth of the folder is not considered to be caused by a few subfolders; it may be caused by many subfolders each growing separately, or perhaps by the files in it becoming larger. In this case a "truly growing folder" has been found, and the name, path, and increased volume of the folder are added to the disk space report.
9. If the candidate list is not empty, go to step 3; otherwise, display the disk space report on the screen.
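The steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the function name `find_growing_folders`, the `delta` and `children` dictionaries, and the parameters `threshold` and `top_n` are hypothetical. It also assumes that the same 90% threshold applies in steps 6 and 8, that a subfolder accounting for exactly 100% of the change counts as dominant, and that folders with no net change are skipped.

```python
from collections import deque

def find_growing_folders(root, delta, children, threshold=0.9, top_n=5):
    """Return (folder, increment) pairs for folders judged to be truly growing.

    delta maps each folder path to its size change over the analysed period;
    children maps a folder path to the list of its direct subfolders.
    """
    report = []
    candidates = deque([root])           # step 2: start from the topmost folder
    while candidates:                    # step 9: repeat until the list is empty
        folder = candidates.popleft()    # step 3: take a folder from the head
        s0 = delta.get(folder, 0)        # step 4: change of the folder itself
        if s0 == 0:
            continue                     # assumption: nothing to explain
        subs = children.get(folder, [])
        # Steps 6-7: one subfolder accounts for (almost) all of the change.
        dominant = next(
            (k for k in subs if threshold < delta.get(k, 0) / s0 <= 1), None
        )
        if dominant is not None:
            candidates.append(dominant)
            continue
        # Step 8: rank subfolders by the absolute value of their change.
        top = sorted(subs, key=lambda k: abs(delta.get(k, 0)), reverse=True)[:top_n]
        if sum(delta.get(k, 0) for k in top) / s0 > threshold:
            candidates.extend(top)       # 8a/8b: analyse the largest subfolders
        else:
            report.append((folder, s0))  # 8c: this folder is truly growing
    return report

# The CSGO example from the description, with sizes in bytes.
GB = 1024 ** 3
csgo = "D:/MyFiles/Games/CSGO"
delta = {
    "D:/": 7 * GB, "D:/MyFiles": 7 * GB, "D:/MyFiles/Games": 7 * GB,
    csgo: 7 * GB, csgo + "/Resources": 5 * GB, csgo + "/Audio": 2 * GB,
}
children = {
    "D:/": ["D:/MyFiles"],
    "D:/MyFiles": ["D:/MyFiles/Games"],
    "D:/MyFiles/Games": [csgo],
    csgo: [csgo + "/Resources", csgo + "/Audio"],
}
report = find_growing_folders("D:/", delta, children)
```

With these numbers the sketch passes through D:/, MyFiles, Games and CSGO without reporting them, and the report contains only the Resources folder (5 GB) and the Audio folder (2 GB).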
Another possible solution is to calculate the increments of all folders and display them directly to the user using data visualization techniques (e.g., a tree diagram or a histogram). One reference display method is that of the Android app "DiskUsage", which displays the absolute space occupied by the files (folders) on a disk; the method provided by the invention can instead display the increment of the space occupied by the files (folders). The significance of the increment compared with the absolute value has been discussed above and is not repeated here.
Another possible solution is to display the folders as in a conventional file manager, except that the "file size" column no longer shows the actual size of each folder/file but its increment over a period of time.
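As a rough illustration of the file-manager style display, the following Python sketch renders a listing whose size column holds signed increments instead of absolute sizes. The helper names `format_delta` and `delta_listing`, the column widths, and the sample figures are hypothetical and are not taken from the application.

```python
def format_delta(num_bytes):
    """Render a signed byte count as a readable string such as '+5.0 GB'."""
    sign = "-" if num_bytes < 0 else "+"
    size = float(abs(num_bytes))
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{sign}{size:.1f} {unit}"
        size /= 1024
    return f"{sign}{size:.1f} TB"

def delta_listing(deltas):
    """Return file-manager style rows, largest absolute change first."""
    rows = sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)
    return [f"{name:<12}{format_delta(change):>12}" for name, change in rows]

# Hypothetical increments, in bytes, for the entries of one folder.
rows = delta_listing({
    "Audio": 2 * 1024 ** 3,
    "Resources": 5 * 1024 ** 3,
    "Logs": -2048,
})
for row in rows:
    print(row)
```

Sorting by absolute value keeps large shrinkages visible next to large growth, which matches the description's point that an increment may be positive or negative.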
Referring to fig. 3, the present application further provides a system for detecting a file occupied by a disk space, where the system includes:
the file information collection unit is used for collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot and/or is obtained by analyzing the characteristics of a file system and/or is obtained by analyzing the characteristics of an operating system;
the file encryption and decryption unit is used for decrypting the encrypted file information into a plaintext if the file information is encrypted, and analyzing to obtain the file change condition in the current equipment within a specified time period;
and the nested structure analysis unit is used for identifying the nested structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nested structure, and displaying the target folder to a user.
In one embodiment, the file encryption and decryption unit includes:
the encryption module is used for randomly generating a random number and writing the random number into a designated storage space; when file information is generated in the system, reading the random number from the designated storage space, and encrypting the generated file information by using the random number;
and the decryption module is used for reading the random number from the designated storage space and decrypting the encrypted file information by using the random number.
In one embodiment, the file information collecting unit includes:
the background monitoring module is used for starting a daemon process/background service/background process in the current equipment, recording the current file read-write operation through the daemon process/background service/background process when the file read-write operation occurs in a disk, and analyzing the file with data change in a specified time period through the recorded result;
a hook module, configured to set a hook function for a file operation API or a disk operation API in an operating system, and when the disk operation API is called in the file operation API, obtain, through the hook function, a call time, a changed file/disk address, and changed content, so as to analyze a file with data change in a specified time period according to obtained information;
and the change notification module is used for receiving the system notification transmitted by the file change notification function when the file change occurs in the operating system if a file change notification function is provided in the operating system, so as to analyze the file with data change in a specified time period according to the system notification.
Therefore, the file information in the current device can be collected efficiently in several ways. Specifically, a general method is to analyze the data changes on each disk by taking disk state snapshots. From the perspective of file system characteristics, the records of a journaling file system can be analyzed, or the last modification time of files can be read, so that the changed data on the disk can be obtained. From the perspective of operating system characteristics, information about data changes can be obtained promptly by running a background process, setting a hook function, and so on. To prevent the file information from being leaked, it can be encrypted on the current device; subsequently, the collected encrypted file information can be decrypted after permission authentication to obtain the plaintext. To determine the folder that really causes the data change, the nested structure of the folders in the current device can be identified, and the folder in which data growth really occurs is determined according to the file change condition of each folder in the nested structure, which improves the accuracy of file detection.
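The snapshot-based collection mentioned above might look like the following Python sketch. It is an assumption-laden illustration rather than the method of the application: `take_snapshot` and `diff_snapshots` are hypothetical names, a snapshot is simply a dictionary of per-folder totals, and files that cannot be read are silently skipped.

```python
import os

def take_snapshot(root):
    """Map every folder under root to the total size in bytes of the files beneath it."""
    sizes = {}
    # Walk bottom-up so each subfolder's total is known before its parent is visited.
    for folder, subfolders, files in os.walk(root, topdown=False):
        total = 0
        for name in files:
            try:
                total += os.path.getsize(os.path.join(folder, name))
            except OSError:
                pass  # the file vanished or is unreadable; ignore it
        for sub in subfolders:
            # Subfolders that were not walked (e.g. symlinks) contribute nothing.
            total += sizes.get(os.path.join(folder, sub), 0)
        sizes[folder] = total
    return sizes

def diff_snapshots(old, new):
    """Return the signed size change of every folder whose total differs."""
    changes = {}
    for path in set(old) | set(new):
        change = new.get(path, 0) - old.get(path, 0)
        if change != 0:
            changes[path] = change
    return changes
```

A caller would store the result of `take_snapshot` periodically and pass an older and a newer snapshot to `diff_snapshots`; the resulting per-folder increments are the kind of input the nested-folder analysis needs.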
The foregoing description of various embodiments of the present application is provided for the purpose of illustration to those skilled in the art. It is not intended to be exhaustive or to limit the invention to a single disclosed embodiment. As described above, various alternatives and modifications of the present application will be apparent to those skilled in the art to which the above-described technology pertains. Thus, while some alternative embodiments have been discussed in detail, other embodiments will be apparent or relatively easy to derive by those of ordinary skill in the art. This application is intended to cover all alternatives, modifications, and variations of the invention that have been discussed herein, as well as other embodiments that fall within the spirit and scope of the above-described application.

Claims (10)

1. A method for detecting a file occupied by a disk space is characterized by comprising the following steps:
collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot, and/or the file information is obtained by analyzing the characteristics of a file system, and/or the file information is obtained by analyzing the characteristics of an operating system;
if the file information is encrypted, after the encrypted file information is decrypted into a plaintext, analyzing to obtain the file change condition in the current equipment within a specified time period;
identifying the nesting structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nesting structure, and displaying the target folder to a user.
2. The method of claim 1, wherein the file information obtained by the disk state snapshot analysis comprises:
recording the sizes of all files and folders on the disk of the current equipment at a specified time interval, and obtaining the files or folders with data growth in a specified time period by comparing the sizes of the files and folders currently on the disk with the recorded sizes.
3. The method of claim 1, wherein the file information is derived based on a characteristic analysis of a file system comprising:
reading a record file from a journaling file system of the current equipment, and determining data change of the file in the disk in a specified time period by analyzing the record file;
or
Counting files created or modified in a specified time period, and analyzing the counted files with data change.
4. The method of claim 1, wherein the obtaining of the file information based on a feature analysis of an operating system comprises:
starting a daemon process/background service/background process in the current equipment, recording the current file read-write operation through the daemon process/background service/background process when the file read-write operation occurs in a disk, and analyzing the file with data change in a specified time period through a recorded result; or
Setting a hook function for a file operation API or a disk operation API in an operating system, and when the disk operation API is called in the file operation API, acquiring calling time, a changed file/disk address and changed contents through the hook function so as to analyze a file with data change in a specified time period according to acquired information;
or
If a file change notification function is provided in an operating system, when a file change occurs in the operating system, receiving a system notification transmitted by the file change notification function, and analyzing the file with data change in a specified time period according to the system notification.
5. The method according to claim 1, wherein the file information is encrypted in the following manner:
randomly generating a random number, and writing the random number into a designated storage space; when file information is generated in the system, reading the random number from the designated storage space, and encrypting the generated file information by using the random number;
accordingly, decrypting the encrypted file information into plaintext includes:
and reading the random number from the designated storage space, and decrypting the encrypted file information by using the random number.
6. The method of claim 1, wherein determining the target folder in which data growth really occurs according to the file change condition of each folder in the nested structure comprises:
selecting a time period to be analyzed, and putting a folder at the topmost layer in the disk at the tail of a candidate list to be analyzed;
selecting a target folder from the beginning of the candidate list, and calculating the total data volume of the target folder increased in the time period; calculating the data increment of each subfolder under the target folder in the time period;
if the proportion of the data increment of a subfolder in the increased total data volume reaches a specified proportion threshold, the subfolder is placed at the tail of the candidate list;
if the proportion of the data increment of any subfolder in the increased total data volume does not reach the specified proportion threshold, sorting the absolute values of the data increments of the subfolders;
if the proportion of the sum of the data increments of the sorted subfolders in the increased total data volume reaches the specified proportion threshold value, placing the subfolders at the tail of the candidate list;
if the ratio of the sum of the data increments of a plurality of sub-folders in the sorted result to the increased total data volume still does not reach the specified ratio threshold, adding the name, the path and the changed data volume of the target folder into a disk space report;
and judging whether the candidate list is empty or not, if so, displaying the disk space report to a user, and if not, continuously analyzing the next folder in the candidate list.
7. The method of claim 1, further comprising:
calculating the data increment of each folder, and displaying the data increment of each folder to a user;
or
Displaying each folder in a file manager, wherein the file size of each folder is the data increment of each folder in a specified period.
8. A system for detecting a file occupied by disk space, the system comprising:
the file information collection unit is used for collecting file information in current equipment, wherein the file information is obtained by analyzing a disk state snapshot and/or is obtained by analyzing the characteristics of a file system and/or is obtained by analyzing the characteristics of an operating system;
the file encryption and decryption unit is used for decrypting the encrypted file information into a plaintext if the file information is encrypted, and analyzing to obtain the file change condition in the current equipment within a specified time period;
and the nested structure analysis unit is used for identifying the nested structure of the folders in the current equipment, determining a target folder in which data growth really occurs according to the file change condition of each folder in the nested structure, and displaying the target folder to a user.
9. The system according to claim 8, wherein the file encryption/decryption unit comprises:
the encryption module is used for randomly generating a random number and writing the random number into a designated storage space; when file information is generated in the system, reading the random number from the designated storage space, and encrypting the generated file information by using the random number;
and the decryption module is used for reading the random number from the designated storage space and decrypting the encrypted file information by using the random number.
10. The system according to claim 8, wherein the file information collecting unit includes:
the background monitoring module is used for starting a daemon process/background service/background process in the current equipment, recording the current file read-write operation through the daemon process/background service/background process when the file read-write operation occurs in a disk, and analyzing the file with data change in a specified time period through the recorded result;
a hook module, configured to set a hook function for a file operation API or a disk operation API in an operating system, and when the disk operation API is called in the file operation API, obtain, through the hook function, a call time, a changed file/disk address, and changed content, so as to analyze a file with data change in a specified time period according to obtained information;
and the change notification module is used for receiving the system notification transmitted by the file change notification function when the file change occurs in the operating system if a file change notification function is provided in the operating system, so as to analyze the file with data change in a specified time period according to the system notification.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911176281.5A CN111159109A (en) 2019-11-26 2019-11-26 Method and system for detecting file occupied by disk space

Publications (1)

Publication Number Publication Date
CN111159109A true CN111159109A (en) 2020-05-15

Family

ID=70556160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911176281.5A Pending CN111159109A (en) 2019-11-26 2019-11-26 Method and system for detecting file occupied by disk space

Country Status (1)

Country Link
CN (1) CN111159109A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2130147A2 (en) * 2007-02-13 2009-12-09 STG Interactive File management method
CN104714864A (en) * 2015-03-20 2015-06-17 成都云祺科技有限公司 Intelligent computer data backup method
CN106815126A (en) * 2015-11-30 2017-06-09 南京壹进制信息技术股份有限公司 A kind of universal document system log recording method and device
CN110018989A (en) * 2017-11-13 2019-07-16 华为技术有限公司 A kind of method and apparatus that snapshot compares

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688000A (en) * 2021-07-22 2021-11-23 成都鲁易科技有限公司 Method and device for displaying use information of magnetic disk, storage medium and computer equipment
CN115309702A (en) * 2022-10-09 2022-11-08 中孚信息股份有限公司 File retrieval management method, device, system and equipment based on USN log
CN115309702B (en) * 2022-10-09 2023-03-24 中孚信息股份有限公司 File retrieval management method, device, system and equipment based on USN log

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination