CN113704176B - File scanning method, device, electronic equipment and storage medium - Google Patents

File scanning method, device, electronic equipment and storage medium

Info

Publication number
CN113704176B
Authority
CN
China
Prior art keywords
file
scanning
file block
information
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110778728.7A
Other languages
Chinese (zh)
Other versions
CN113704176A (en)
Inventor
刘锦锋
师庆志
周飘龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Application filed by Qianxin Technology Group Co Ltd, Secworld Information Technology Beijing Co Ltd
Priority to CN202110778728.7A
Publication of CN113704176A
Application granted
Publication of CN113704176B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a file scanning method and apparatus, an electronic device, and a storage medium. The method comprises: after receiving a trigger message for file scanning, performing a first scan on the file blocks corresponding to a file to be scanned, and determining the file blocks that have changed; and performing a second scan on the changed file blocks to determine the files that have changed. The method reduces the number of scans, accurately identifies changed files or directories, and improves the efficiency of file detection.

Description

File scanning method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of information technology, and in particular to a file scanning method and apparatus, an electronic device, and a storage medium.
Background
During service transmission based on a gatekeeper platform, to ensure the accuracy of the transmitted files, changes to a file directory or a file, such as creation, addition, deletion, or modification, must be monitored in real time, and operations such as reading, writing, deleting, and renaming are performed in response to those changes.
In the prior art, changes to file directories and files are monitored by timed polling. The file directory and file information is recorded in a list; after each polling scan, the scan result is compared with the list entry by entry to identify attribute changes to directories or files, which are then processed accordingly. This approach consumes a large amount of memory and time, so detection efficiency is low.
Disclosure of Invention
The invention provides a file scanning method and apparatus, an electronic device, and a storage medium, to solve the technical problem in the prior art that timed polling scans combined with entry-by-entry comparison of scan results against list information lead to low detection efficiency, and thereby to improve detection efficiency while ensuring the accuracy of the transmitted files.
In a first aspect, the present invention provides a file scanning method, comprising:
after receiving a trigger message for file scanning, performing a first scan on the file blocks corresponding to a file to be scanned, and determining the file blocks that have changed; and
performing a second scan on the changed file blocks to determine the files that have changed.
According to the file scanning method provided by the invention, performing the first scan on the file blocks corresponding to the file to be scanned and determining the file blocks that have changed comprises:
acquiring directory information of a first file block and/or information of each file it contains, wherein the first file block is any one of the file blocks corresponding to the file to be scanned;
calculating first digest data of the first file block from the directory information of the first file block and/or the information of each file it contains; and
determining whether the first file block has changed according to the first digest data and second digest data of the first file block, wherein the second digest data is either obtained by scanning the first file block while it is unchanged or pre-stored.
According to the file scanning method provided by the invention, calculating the first digest data of the first file block from the directory information of the first file block and/or the information of each file it contains comprises:
calculating the first digest data of the first file block from the directory information of the first file block and/or the name, size, and modification time of each file it contains, using a fuzzy hash algorithm or a cyclic redundancy check.
According to the file scanning method provided by the invention, performing the second scan on the changed file blocks to determine the files that have changed comprises:
receiving identification information of a second file block, wherein the second file block is a file block determined by the first scan to have changed;
acquiring, according to the identification information of the second file block, directory information of the second file block and/or information of each file it contains; and
determining the changed files in the second file block by comparing the directory information of the second file block and/or the information of each file it contains with first directory information of the second file block and/or first information of each file it contains, wherein the first directory information and/or the first information is information stored before the current file scanning operation.
According to the file scanning method provided by the invention, performing the first scan on the file blocks corresponding to the file to be scanned comprises: performing the first scan on a plurality of file blocks corresponding to the file to be scanned in parallel;
correspondingly, receiving the identification information of the second file block comprises:
receiving, through a message queue, the identification information of the second file block from the results of each of the parallel first scans.
According to the file scanning method provided by the invention, the first directory information of the second file block and/or the first information of each file it contains is stored in a red-black tree.
According to the file scanning method provided by the invention, the trigger message for file scanning is generated when the file to be scanned is added, modified, or deleted, or when the directory information of the file block corresponding to the file to be scanned is added, modified, or deleted.
In a second aspect, the present invention further provides a file scanning apparatus, comprising:
a first scanning module, configured to perform, after receiving a trigger message for file scanning, a first scan on the file blocks corresponding to a file to be scanned, and to determine the file blocks that have changed; and
a second scanning module, configured to perform a second scan on the changed file blocks and to determine the files that have changed.
In a third aspect, the present invention provides an electronic device, comprising:
a processor, a memory, and a bus, wherein
the processor and the memory communicate with each other through the bus; and
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method described in any of the above.
In a fourth aspect, the present invention further provides a computer program product comprising computer-executable instructions which, when executed, implement the steps of the file scanning method described in any of the above.
In a fifth aspect, the present invention further provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method described in any of the above.
The invention provides a file scanning method and apparatus, an electronic device, and a storage medium. The method comprises: after receiving a trigger message for file scanning, performing a first scan on the file blocks corresponding to a file to be scanned, and determining the file blocks that have changed; and performing a second scan on the changed file blocks to determine the files that have changed. Because the first and second scans are driven by a trigger message, the method can accurately identify changed directories and changed files, improving file detection efficiency and the user experience.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flowchart of a file scanning method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a file directory scanning method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the second scanning process according to an embodiment of the present invention;
FIG. 4 is a flowchart of data communication during multi-directory scanning according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the application deployment architecture provided by the present invention;
FIG. 6 is a schematic structural diagram of the file scanning apparatus provided by the present invention;
FIG. 7 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
FIG. 1 is a schematic flowchart of the file scanning method provided by the invention. As shown in FIG. 1, the file scanning method provided by the invention comprises the following steps:
Step 101: after receiving a trigger message for file scanning, perform a first scan on the file blocks corresponding to a file to be scanned, and determine the file blocks that have changed;
Step 102: perform a second scan on the changed file blocks to determine the files that have changed.
Specifically, the trigger message is a signal message sent in response to a trigger event that has occurred, and it notifies the scanning component to perform the scanning operation.
In this embodiment of the invention, the scanning component is triggered by an event. It first scans the file blocks corresponding to the file to be scanned to determine which file blocks have changed, and then performs a second scan on those file blocks to identify the changed files accurately. Because an event-triggered approach is used, the file to be scanned is scanned only when file information or file block information changes; if no file or file block has changed, the scanning component is not triggered.
In step 101, when a trigger message is received, a first scan is performed on the file blocks corresponding to the file to be scanned to determine the file blocks that have changed. A file block may be a folder or any other unit that stores a plurality of files. A change to a file block may be the addition, modification, or deletion of a directory of the file block, or the addition, modification, or deletion of a file within the file block.
In step 102, the second scan needs to cover only the file blocks determined to have changed in the first scan, and it determines the files that have changed. A change to a file may be its addition, modification, deletion, or the like. The specific changes to detect may be set according to user requirements and are not particularly limited herein.
In this embodiment of the invention, an event triggers the scanning component to perform a first scan on the file blocks corresponding to the file to be scanned and determine the changed file blocks, and then to perform a second scan on the changed file blocks and determine the changed files. The method reduces the number of scans, identifies changed file blocks or files more quickly and accurately, and improves the efficiency of file detection.
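The flow of steps 101 and 102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `Block`, `digest`, and `on_trigger` are hypothetical, a file block is modeled as an in-memory mapping from file name to a `(size, mtime)` tuple, and SHA-256 stands in for the unspecified digest function.

```python
import hashlib

# Hypothetical model: a file block maps file name -> (size, modification time).
Block = dict[str, tuple[int, float]]


def digest(block: Block) -> str:
    """Digest of a block's file metadata (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    for name in sorted(block):
        size, mtime = block[name]
        h.update(f"{name}|{size}|{mtime}\n".encode())
    return h.hexdigest()


def on_trigger(current: dict[str, Block], stored: dict[str, Block]) -> dict[str, set[str]]:
    """Run both scans after a trigger message; return changed files per block."""
    # First scan: compare one digest per block to find the changed blocks.
    changed_blocks = [
        block_id
        for block_id in current.keys() | stored.keys()
        if digest(current.get(block_id, {})) != digest(stored.get(block_id, {}))
    ]
    # Second scan: compare file details only inside the changed blocks.
    result: dict[str, set[str]] = {}
    for block_id in changed_blocks:
        new, old = current.get(block_id, {}), stored.get(block_id, {})
        result[block_id] = {
            name for name in new.keys() | old.keys() if new.get(name) != old.get(name)
        }
    return result
```

In this sketch an unchanged block costs one digest comparison, and per-file comparison is confined to the blocks whose digests differ.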
In another embodiment of the present invention, performing the first scan on the file blocks corresponding to the file to be scanned and determining the file blocks that have changed comprises:
acquiring directory information of a first file block and/or information of each file it contains, wherein the first file block is any one of the file blocks corresponding to the file to be scanned;
calculating first digest data of the first file block from the directory information of the first file block and/or the information of each file it contains; and
determining whether the first file block has changed according to the first digest data and second digest data of the first file block, wherein the second digest data is either obtained by scanning the first file block while it is unchanged or pre-stored.
Specifically, the file to be scanned may comprise a plurality of file blocks, and the first file block may be any one of them. The information of each file includes its file name, size, modification time, and the like.
In this embodiment of the invention, the first digest data of the first file block is calculated from the acquired directory information of the first file block and/or the information of each file it contains, and is compared with the second digest data. If the two are not equal, the first file block is determined to be a changed file block. The first digest data and the second digest data may be calculated hash values.
It should be noted that the second digest data is obtained by scanning the first file block while it is unchanged. For example, it may be obtained during the most recent scan of the first file block or during any earlier scan; it may also be obtained during the first scan after the file block is created and then used as the reference data against which the first digest data is checked. In other embodiments, the second digest data may be pre-stored reference data, for example calculated in advance with a hash algorithm.
For example, as shown in FIG. 2, during service operation, if the designated folder A is the first file block, the scanning component xscaner starts the first scan, traverses the directory information of folder A, stores the directory information, and calculates the hash value of folder A. If folder A contains no files, the hash value is the initial value.
Directories A/B, A/B/C, and A/D are then created, so that the files are partitioned into file blocks in the form of folders. Folder A is scanned once, its hash value is calculated from its directory information and the information of all files under the directory, and the hash value is stored in a list as the second digest data, which serves as the baseline for comparison in each subsequent scan.
When the directory information of folder A or a file inside it changes, a trigger message is generated. In response, folder A is scanned for the first time to obtain its directory information and/or the information of the files under the directory, and its hash value is recalculated to obtain the first digest data. The first digest data is compared with the second digest data; if they are not equal, folder A is determined to be a changed file block, a corresponding event is generated for the change, and the event is pushed to other processing modules through a callback function.
In this embodiment of the invention, the first digest data of the first file block is calculated from the directory information of the first file block and/or the information of each file under the directory, and is compared with the second digest data to determine whether the first file block has changed. This reduces memory overhead, improves scanning efficiency, and narrows the range that must be examined.
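The digest comparison of the first scan can be sketched for a real folder as follows. This is an illustrative sketch only: the function names `block_digest` and `is_changed` are hypothetical, SHA-256 is an assumed stand-in for the hash function (the text does not name one at this point), and only the direct entries of the folder are covered.

```python
import hashlib
import os


def block_digest(path: str) -> str:
    """Compute digest data for one file block (here: one folder).

    The digest covers the type, name, size, and modification time of every
    direct entry, so adding, deleting, or modifying an entry changes it.
    """
    h = hashlib.sha256()
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            info = entry.stat()
            kind = "d" if entry.is_dir() else "f"
            h.update(f"{kind}|{entry.name}|{info.st_size}|{info.st_mtime_ns}\n".encode())
    return h.hexdigest()


def is_changed(path: str, stored_digest: str) -> bool:
    """First scan: the block has changed if the two digests differ."""
    return block_digest(path) != stored_digest
```

For an empty folder the digest is the hash of empty input, which plays the role of the "initial value" in the folder A example. The stored digest would be refreshed after each scan so that it can serve as the second digest data next time.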
In another embodiment of the present invention, calculating the first digest data of the first file block from the directory information of the first file block and/or the information of each file it contains comprises:
calculating the first digest data of the first file block from the directory information of the first file block and/or the name, size, and modification time of each file it contains, using a fuzzy hash algorithm or a cyclic redundancy check.
Specifically, the fuzzy hash algorithm, also known as context triggered piecewise hashing (CTPH), is a piecewise hashing algorithm based on content segmentation. The similarity of two fuzzy hash values is computed with a string similarity comparison algorithm to judge how similar two files are.
Cyclic redundancy check (CRC) is a channel coding technique that generates a short, fixed-length check code from data such as a network packet or a computer file. It is mainly used to detect errors that may occur after data transmission or storage.
In this embodiment of the invention, the first digest data of the first file block is calculated with a fuzzy hash algorithm or a cyclic redundancy check from the directory information of the first file block and/or the name, size, and modification time of each file it contains. The first digest data may be a hash value or a check code obtained by cyclic redundancy check.
If the first file block is folder A, folder A is scanned for the first time to obtain its directory information and the information of the files it contains. The file name, size, modification time, and similar information of each file in folder A are stored in a buffer, and the hash value of folder A, that is, its first digest data, is calculated from the stored file information.
In this embodiment of the invention, calculating the first digest data of the first file block with a fuzzy hash algorithm or a cyclic redundancy check allows the file to be scanned to be processed block by block, frees a large amount of memory, locates changed folders more quickly and accurately, and improves detection efficiency.
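The cyclic redundancy check variant can be sketched with the CRC-32 function in Python's standard `zlib` module. The record layout and the function name `crc_digest` are assumptions made for illustration; the text specifies only that names, sizes, and modification times are the inputs.

```python
import zlib


def crc_digest(files: list[tuple[str, int, int]]) -> int:
    """CRC-32 check code over (name, size, mtime) records of one file block."""
    crc = 0
    # Sort the records so the result does not depend on traversal order.
    for name, size, mtime in sorted(files):
        crc = zlib.crc32(f"{name}|{size}|{mtime}\n".encode(), crc)
    return crc


before = [("a.txt", 10, 1000), ("b.txt", 20, 2000)]
after = [("a.txt", 10, 1000), ("b.txt", 21, 2500)]
print(crc_digest(before) != crc_digest(after))
```

A CRC is cheap to compute and fixed in size, but two different blocks can in principle produce the same check code, so it is best suited to detecting changes rather than to identifying content.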
In another embodiment of the present invention, performing the second scan on the changed file blocks to determine the files that have changed comprises:
receiving identification information of a second file block, wherein the second file block is a file block determined by the first scan to have changed;
acquiring, according to the identification information of the second file block, directory information of the second file block and/or information of each file it contains; and
determining the changed files in the second file block by comparing the directory information of the second file block and/or the information of each file it contains with first directory information of the second file block and/or first information of each file it contains, wherein the first directory information and/or the first information is information stored before the current file scanning operation.
Specifically, the identification information is information, such as the name, of a file block found to have changed by the first scan. The file blocks that need a second scan can be determined accurately from the identification information.
In this embodiment of the invention, the second file block that needs a second scan is determined from the identification information. The second file block is scanned a second time to obtain its directory information and/or the information of all files it contains, and this is compared with the first directory information of the second file block and/or the first information of each file it contains, as obtained by the previous scan, to identify the changed files. The file information here is the detailed information of all files under the directory, such as file name, file size, and modification time.
For example, as shown in FIG. 3, after the first scan of folder A is completed, folder A is determined to be a changed file block and is sent to the scanning component that performs the second scan. That component receives the instruction and the corresponding identification information and performs the second scan on folder A to obtain its directory information and the information of each file it contains. It compares the current scan result with the previous scan result for folder A, that is, it filters by directory information and per-file details to determine the changed files, and then generates corresponding events for the changes and provides them to other processing modules.
In this embodiment of the invention, the directory information of the second file block and/or the information of each file it contains is obtained by scanning the second file block, and the result of the current scan is compared with the record stored by the previous scan to identify the changed files. Because only the changed file blocks are scanned a second time, the changed files are determined accurately and scanning efficiency is improved.
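The comparison performed by the second scan can be sketched as a difference between two snapshots of one file block. The `FileInfo` record and the `second_scan` function are hypothetical names; the stored snapshot plays the role of the "first information" saved before the current scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Per-file details kept between scans (an assumed record layout)."""
    size: int
    mtime: float


def second_scan(previous: dict[str, FileInfo], current: dict[str, FileInfo]) -> dict[str, list[str]]:
    """Compare two snapshots of one changed block and classify each change."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return {"added": added, "modified": modified, "deleted": deleted}
```

Each entry in the result could then be turned into an event for the downstream processing modules, for example through a registered callback.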
In another embodiment of the present invention, performing the first scan on the file blocks corresponding to the file to be scanned comprises: performing the first scan on a plurality of file blocks corresponding to the file to be scanned in parallel;
correspondingly, receiving the identification information of the second file block comprises:
receiving, through a message queue, the identification information of the second file block from the results of each of the parallel first scans.
Specifically, a message queue is a container that holds messages while they are in transit.
In this embodiment of the invention, the scanning component performs the first scan on a plurality of file blocks corresponding to the file to be scanned, and the results of the parallel scans are pushed through a message queue to the scanning component that performs the second scan. The scanning component uses XFPI for file monitoring and supports the File Transfer Protocol (FTP), software implementing the SMB protocol, the Network File System (NFS), the VSFTP service, and local directories; monitored directories are registered by uniform resource locator (URL). A specific example is described below.
For example, as shown in FIG. 4, when a plurality of file blocks undergo the first scan at the same time, the identification information of the file blocks determined to have changed is pushed through the message queue, which acts mainly as a hub, to the scanning component that performs the second scan. The changed file block information is determined first; the directory and file information obtained by the second scan is then looked up and compared against the file information stored by the previous second scan to determine the changed files. The file block information carried by the message queue for the second scan is the file block information output by the first scan, which comprises the directory information and directory hash values obtained by monitoring the plurality of file blocks to be scanned in real time. The monitored directory of each file block has its own storage list, identified by a directory number (dirid); at most 63 directories can be monitored simultaneously and scanned concurrently.
In this embodiment, the identification information of the changed file blocks determined by the first scan is received through the message queue, so both multithreaded and multiprocess operation are fully supported and transmission efficiency is higher.
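The hand-off between parallel first scans and the second scan can be sketched with a thread-safe queue. This is an assumption-laden illustration: the function names are hypothetical, threads stand in for whatever workers the scanning component uses, and each block is reduced to a directory number with precomputed old and new digests.

```python
import queue
import threading


def first_scan_worker(dir_id: int, old_digest: str, new_digest: str,
                      changed: "queue.Queue[int]") -> None:
    """First scan of one block; publish its identifier only if it changed."""
    if old_digest != new_digest:
        changed.put(dir_id)


def run_parallel_first_scans(blocks: dict[int, tuple[str, str]]) -> list[int]:
    """Scan all blocks in parallel and collect changed identifiers from the queue."""
    changed: "queue.Queue[int]" = queue.Queue()
    workers = [
        threading.Thread(target=first_scan_worker, args=(dir_id, old, new, changed))
        for dir_id, (old, new) in blocks.items()
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # All producers have finished, so the queue can be drained safely.
    ids = []
    while not changed.empty():
        ids.append(changed.get())
    return sorted(ids)
```

In a long-running service the second-scan component would more likely block on `changed.get()` in its own thread instead of draining the queue after the workers finish; the batch form is used here only to keep the sketch short.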
In another embodiment of the present invention, the first directory information of the second file block and/or the first information of each file it contains is stored in a red-black tree.
Specifically, a red-black tree is a self-balancing binary search tree, a data structure used in computer science.
In this embodiment of the invention, the first directory information and the first information of each file obtained by each scan are stored in a red-black tree. The detailed information of all files under the monitored directory of each file block, such as file name, file size, and modification time, is stored to provide data for the next file scan.
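Python's standard library has no red-black tree, so the sketch below uses a sorted key list maintained with `bisect` as a plainly different stand-in. It offers the same ordered traversal and prefix lookup that an ordered tree would, but insertion and removal are O(n) rather than the O(log n) of a red-black tree. The class and method names are hypothetical.

```python
import bisect
from typing import Optional


class OrderedFileStore:
    """Ordered map from file path to (size, mtime).

    Stand-in for a red-black tree: keys are kept sorted and prefix queries
    start from a binary search, but inserts and removals shift list elements.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: dict[str, tuple[int, float]] = {}

    def put(self, path: str, size: int, mtime: float) -> None:
        if path not in self._values:
            bisect.insort(self._keys, path)
        self._values[path] = (size, mtime)

    def get(self, path: str) -> Optional[tuple[int, float]]:
        return self._values.get(path)

    def remove(self, path: str) -> None:
        if path in self._values:
            del self._values[path]
            self._keys.pop(bisect.bisect_left(self._keys, path))

    def under(self, prefix: str) -> list[str]:
        """All stored paths that start with `prefix`, in sorted order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result
```

Keeping paths ordered means all entries under one monitored directory are contiguous, which is convenient when the second scan needs the stored details for a single file block.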
In another embodiment of the present invention, the trigger message for file scanning is generated when the file to be scanned is added, modified, or deleted, or when the directory information of the file block corresponding to the file to be scanned is added, modified, or deleted.
In this embodiment of the invention, the trigger message that starts the file scanning operation is generated when the file to be scanned, or the directory information of its corresponding file block, changes. The change may be the addition, deletion, or modification of directory information, or the addition, deletion, or modification of a file; for example, the scanning component is triggered to perform the first scan after a file or directory is added. The change is not limited to addition, modification, or deletion; other changes are also possible and are not particularly limited herein.
In this embodiment of the invention, the first scan is triggered by a trigger message generated when the file or directory information to be scanned is added, modified, or deleted, which shortens the scanning cycle and improves working efficiency.
In another embodiment of the present invention, an application deployment architecture is provided. As shown in Fig. 5, it includes a business logic layer, a scanning component layer, a distribution component layer, a transmission component layer, and a file service layer, wherein:
Business logic layer: serves the embedded file transmission service, the client file transmission service, the server file transmission service, and the like. It performs service configuration, callback function registration, and overall data processing for the scanning component, the distribution component, and the transmission component. It pushes the file processing events acquired by the scanning component to the distribution component, which distributes them to the transmission component, thereby implementing the file synchronization logic.
Distribution component layer: may be composed of message queues. It provides a multiplexing solution that makes full use of network bandwidth and improves file transmission efficiency.
Transmission component layer: implements the file transfer function.
Scanning component layer: performs the first scan on the file block corresponding to the file to be scanned and the second scan on the changed file block. It monitors and captures events such as the addition and deletion of files and directories on a local directory or a remote NAS server, and supplies file changes to the transmission component in a timely, efficient, and accurate manner for business processing.
File service layer: provides a set of function interfaces for operations such as reading, writing, and deleting local files and remote NAS service files.
In this embodiment of the invention, providing the application deployment architecture allows the file scanning and detection work to be better implemented.
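The flow between the layers above (scanning component to distribution component to transmission component, wired together by the business logic layer) might be sketched as below. All function names are hypothetical, the distribution layer is modelled as a single in-process `queue.Queue`, and the "transfer" is a placeholder that only records the event.

```python
import queue
import threading


def scanning_component(changes):
    """Scanning layer: yield one file-processing event per detected change."""
    for path, kind in changes:
        yield {"path": path, "kind": kind}


def transmission_worker(tasks, sent, lock):
    """Transmission layer: consume events until a None sentinel arrives."""
    while True:
        event = tasks.get()
        if event is None:
            break
        with lock:
            # Placeholder for a real file transfer.
            sent.append((event["path"], event["kind"]))


def run_pipeline(changes, workers=2):
    """Business logic layer: wire scanning -> distribution -> transmission."""
    tasks = queue.Queue()  # distribution layer modelled as a message queue
    sent, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=transmission_worker, args=(tasks, sent, lock))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for event in scanning_component(changes):
        tasks.put(event)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker so every thread exits
    for t in threads:
        t.join()
    return sorted(sent)  # sorted because worker completion order is not fixed
```

Several workers draining one queue is one simple way to keep multiple transfers in flight at once, which is the stated purpose of the distribution layer.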
In another embodiment of the present invention, the dual-thread operating mode is divided into a first scan and a second scan, wherein:
First scan: mainly calculates the hash values of the files to be scanned, namely the hash values of the root directory and the subdirectories, stores them in a hash table, and monitors changes to the directory and subdirectories.
Second scan: uses a red-black tree data model to store the detailed information of all files under the monitored directory, such as file name, file size, and modification time, and accurately locates the changed files for subsequent scanning, comparison, and analysis.
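The two scans can be illustrated with in-memory data. In this sketch the first scan computes a CRC32 digest per directory from its name and each file's name, size, and modification time (cyclic redundancy check is one of the options named in claim 2) and compares it with a stored table. The second scan then diffs the per-file details only for directories whose digest changed. The function names and the `{directory: {file name: (size, mtime)}}` layout are assumptions for illustration, and plain dictionaries stand in for the hash table and the red-black tree.

```python
import zlib


def block_digest(directory, files):
    """First scan helper: CRC32 over a block's directory name and file metadata.

    `files` maps file name -> (size, mtime).
    """
    parts = [directory]
    for name in sorted(files):
        size, mtime = files[name]
        parts.append(f"{name}|{size}|{mtime}")
    return zlib.crc32("\n".join(parts).encode("utf-8"))


def first_scan(current, digest_table):
    """Return the directories whose digest differs from the stored one."""
    changed = []
    for directory in sorted(set(current) | set(digest_table)):
        files = current.get(directory)
        new = None if files is None else block_digest(directory, files)
        if new != digest_table.get(directory):
            changed.append(directory)
    return changed


def second_scan(directory, current, previous):
    """Compare per-file details of one changed block with the last snapshot."""
    now = current.get(directory, {})
    before = previous.get(directory, {})
    return {
        "added": sorted(set(now) - set(before)),
        "deleted": sorted(set(before) - set(now)),
        "modified": sorted(n for n in set(now) & set(before) if now[n] != before[n]),
    }
```

Only blocks flagged by `first_scan` are passed to `second_scan`, so unchanged directories cost one digest comparison each. A CRC32 collision could in principle hide a change, so a stronger digest may be preferable where that risk matters.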
Fig. 6 is a schematic diagram of the file scanning device provided by the present invention. As shown in Fig. 6, the file scanning device provided by the present invention includes:
a first scanning module 601, configured to perform, after receiving a trigger message for file scanning, a first scan on the file block corresponding to the file to be scanned and determine the changed file block;
and a second scanning module 602, configured to perform a second scan on the changed file block and determine the changed file.
In the file scanning device provided by this embodiment of the invention, the first scanning module performs a first scan on the file block corresponding to the file to be scanned after receiving the trigger message for file scanning, and determines the changed file block; the second scanning module performs a second scan on the changed file block and determines the changed file. The device can reduce the number of scans, accurately identify changed files or directories, and improve the efficiency of file detection.
Since the device of this embodiment of the present invention corresponds to the method of the above embodiments, the details are not repeated here.
Fig. 7 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the present invention provides an electronic device including: a processor 701, a memory 702, and a bus 703;
wherein the processor 701 and the memory 702 communicate with each other through the bus 703;
the processor 701 is configured to invoke program instructions in the memory 702 to perform the methods provided by the above method embodiments, for example including: after receiving a trigger message for file scanning, performing a first scan on the file block corresponding to the file to be scanned and determining the changed file block; and performing a second scan on the changed file block to determine the changed file.
This embodiment provides a computer program product comprising computer-executable instructions which, when executed, implement the steps of the file scanning method according to any of the embodiments described above, for example including: after receiving a trigger message for file scanning, performing a first scan on the file block corresponding to the file to be scanned and determining the changed file block; and performing a second scan on the changed file block to determine the changed file.
This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example including: after receiving a trigger message for file scanning, performing a first scan on the file block corresponding to the file to be scanned and determining the changed file block; and performing a second scan on the changed file block to determine the changed file.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A file scanning method, comprising:
after receiving a trigger message for file scanning, performing a first scan on a file block corresponding to a file to be scanned, and determining the changed file block; wherein performing the first scan on the file block corresponding to the file to be scanned and determining the changed file block includes: calculating first summary data of a first file block according to directory information of the first file block and/or information of each file contained in the first file block; determining whether the first file block is a changed file block according to the first summary data and second summary data of the first file block; wherein the second summary data is obtained by scanning, or stored in advance, while the first file block is unchanged;
performing a second scan on the changed file block to determine the changed file; wherein performing the second scan on the changed file block and determining the changed file includes:
acquiring directory information of a second file block and/or information of each file contained in the second file block; wherein the second file block is a file block, among the first file blocks, that the first scan determined to have changed;
determining the changed file in the second file block according to the directory information of the second file block and/or the information of each file contained in the second file block, together with the first directory information of the second file block and/or the first information of each file contained in the second file block obtained by the previous scan; wherein the first directory information of the second file block and/or the first information of each file contained in the second file block is information stored before the current file scanning operation.
2. The method according to claim 1, wherein calculating the first summary data of the first file block according to the directory information of the first file block and/or the information of each contained file includes:
calculating the first summary data of the first file block, by a fuzzy hash algorithm or a cyclic redundancy check, according to the directory information of the first file block and/or the name, size, and modification time of each contained file.
3. The method according to claim 1, wherein performing the first scan on the file block corresponding to the file to be scanned includes: performing the first scan on a plurality of file blocks corresponding to the file to be scanned in parallel;
correspondingly, receiving the identification information of the second file block includes:
receiving the identification information of the second file block from the results of the plurality of parallel first scans by means of a message queue.
4. The method according to claim 1, wherein the first directory information of the second file block and/or the first information of each contained file is stored in a red-black tree.
5. The method according to any one of claims 1 to 4, wherein the trigger message for file scanning is generated when the file to be scanned is added, modified, or deleted, or when directory information of the file block corresponding to the file to be scanned is added, modified, or deleted.
6. A file scanning device, comprising:
a first scanning module, configured to perform, after receiving a trigger message for file scanning, a first scan on a file block corresponding to a file to be scanned, and determine the changed file block; wherein the first scanning module is specifically configured to calculate first summary data of a first file block according to directory information of the first file block and/or information of each file contained in the first file block, and to determine whether the first file block is a changed file block according to the first summary data and second summary data of the first file block; wherein the second summary data is obtained by scanning, or stored in advance, while the first file block is unchanged;
a second scanning module, configured to perform a second scan on the changed file block and determine the changed file; wherein the second scanning module is specifically configured to acquire directory information of a second file block and/or information of each file contained in the second file block, the second file block being a file block, among the first file blocks, that the first scan determined to have changed; and to determine the changed file in the second file block according to the directory information of the second file block and/or the information of each file contained in the second file block, together with the first directory information of the second file block and/or the first information of each file contained in the second file block obtained by the previous scan; wherein the first directory information of the second file block and/or the first information of each file contained in the second file block is information stored before the current file scanning operation.
7. An electronic device, comprising:
a processor, a memory, and a bus, wherein,
the processor and the memory complete communication with each other through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the steps of the file scanning method of any of claims 1-5.
8. A non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the steps of the file scanning method of any one of claims 1 to 5.
CN202110778728.7A 2021-07-09 2021-07-09 File scanning method, device, electronic equipment and storage medium Active CN113704176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110778728.7A CN113704176B (en) 2021-07-09 2021-07-09 File scanning method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110778728.7A CN113704176B (en) 2021-07-09 2021-07-09 File scanning method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113704176A CN113704176A (en) 2021-11-26
CN113704176B true CN113704176B (en) 2023-10-31

Family

ID=78648382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110778728.7A Active CN113704176B (en) 2021-07-09 2021-07-09 File scanning method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113704176B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765740A (en) * 2014-01-03 2015-07-08 腾讯科技(深圳)有限公司 File scanning control method and device
CN107247722A (en) * 2017-04-25 2017-10-13 北京金山安全软件有限公司 File scanning method and device and intelligent terminal
CN108446407A (en) * 2018-04-12 2018-08-24 北京百度网讯科技有限公司 Database audit method based on block chain and device
CN108932236A (en) * 2017-05-22 2018-12-04 北京金山云网络技术有限公司 A kind of file management method, scratch file delet method and device
CN111382123A (en) * 2018-12-28 2020-07-07 广州市百果园信息技术有限公司 File storage method, device, equipment and storage medium
CN111769933A (en) * 2020-06-29 2020-10-13 北京天融信网络安全技术有限公司 Method and device for monitoring file change, electronic equipment and storage medium
CN112416787A (en) * 2020-11-27 2021-02-26 平安普惠企业管理有限公司 JAVA-based project source code scanning analysis method, system and storage medium
CN112905539A (en) * 2021-03-25 2021-06-04 芝麻链(北京)科技有限公司 Automatic data storage method and device based on message digest

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052575A (en) * 2017-12-08 2018-05-18 深圳市创维软件有限公司 File scanning method, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Blockchain-Based Reconciliation System (基于区块链的对账系统设计与实现); 周硙; 软件工程 (02); full text *

Also Published As

Publication number Publication date
CN113704176A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN109034993B (en) Account checking method, account checking equipment, account checking system and computer readable storage medium
US9983941B2 (en) Method and apparatus for recovering data
CN106936441B (en) Data compression method and device
RU2536664C2 (en) System and method for automatic modification of antivirus database
JP6301256B2 (en) Processing method, computer program, and metadata support server
US9614866B2 (en) System, method and computer program product for sending information extracted from a potentially unwanted data sample to generate a signature
CN107977473B (en) Logback-based distributed system log retrieval method and system
US10795860B1 (en) WAN optimized micro-service based deduplication
CN110888838A (en) Object storage based request processing method, device, equipment and storage medium
CN110019873B (en) Face data processing method, device and equipment
US9058330B2 (en) Verification of complex multi-application and multi-node deployments
WO2018113210A1 (en) Repeated medical documentation deletion system and method in medical informationization
US10540325B2 (en) Method and device for identifying junk picture files
CN110188103A (en) Data account checking method, device, equipment and storage medium
CN105868056B (en) Obtain the method, apparatus and secure virtual machine of deleted document in Windows virtual machine
US20170199903A1 (en) System for backing out data
CN109669795A (en) Crash info processing method and processing device
CN110618974A (en) Data storage method, device, equipment and storage medium
CN112685612A (en) Feature code searching and matching method, device and storage medium
CN108573172B (en) Data checking and storing method and device
CN113704176B (en) File scanning method, device, electronic equipment and storage medium
CN111460436B (en) Unstructured data operation method and system based on blockchain
CN112688905B (en) Data transmission method, device, client, server and storage medium
CN110517010B (en) Data processing method, system and storage medium
CN111309689A (en) File duplicate checking method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 332, 3 / F, Building 102, 28 xinjiekouwei street, Xicheng District, Beijing 100088

Applicant after: Qianxin Technology Group Co.,Ltd.

Applicant after: Qianxin Wangshen information technology (Beijing) Co.,Ltd.

Address before: Room 332, 3 / F, Building 102, 28 xinjiekouwei street, Xicheng District, Beijing 100088

Applicant before: Qianxin Technology Group Co.,Ltd.

Applicant before: LEGENDSEC INFORMATION TECHNOLOGY (BEIJING) Inc.

GR01 Patent grant