CN102594809A - Method and system for rapidly scanning files - Google Patents

Publication number
CN102594809A
Authority
CN
China
Prior art keywords
file
characteristic information
scanning
information
packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100265855A
Other languages
Chinese (zh)
Other versions
CN102594809B (en)
Inventor
邹贵强
付旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qizhi Software Beijing Co Ltd filed Critical Qizhi Software Beijing Co Ltd
Priority to CN201210026585.5A priority Critical patent/CN102594809B/en
Publication of CN102594809A publication Critical patent/CN102594809A/en
Priority to US14/377,014 priority patent/US9355250B2/en
Priority to PCT/CN2013/071383 priority patent/WO2013117151A1/en
Application granted granted Critical
Publication of CN102594809B publication Critical patent/CN102594809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a method and a system for rapidly scanning files and relates to the field of network technology. The method comprises: obtaining a data package containing secure file characteristic information used to determine whether files in the system are secure files; scanning the file characteristic information of the files in the system one by one and matching the currently scanned file characteristic information against the secure file characteristic information in the data package that marks files as secure; if they match, skipping the antivirus scan of the current file and continuing with the next file; and if they do not match, performing an antivirus scan of the current file and, when the scan time is greater than (or greater than or equal to) a time threshold and the file is safe, adding the file's secure file characteristic information to the data package. The method and system reduce the time of the first scan and allow the data package to be updated promptly for use by the local client or other clients.

Description

Method and system for rapidly scanning files
Technical field
The present application relates to the field of network technology, and in particular to a method and system for rapidly scanning files.
Background
With the spread of computers, nearly every client needs antivirus software installed to scan the files on the computer. An antivirus scan requires a large amount of CPU computation and disk operations, which makes the scan slow and degrades system speed. Yet many files are identical from one computer to the next, such as Windows files and the installation packages, help files and compressed files of many software products.
In the prior art, the first scan by a client's antivirus software covers every file on every disk of the computer and inspects all of each file's contents. Files with a lot of content take a long time to scan, so the first scan takes very long. For a compressed package, for example, the prior art follows the normal scanning flow and decompresses and scans the contents one by one to ensure that the file is safe. A single compressed package therefore takes a long time to scan, which lengthens the whole scan. Because the long scan occupies the CPU and disk for a long time, it also slows the system and interferes with the user's use of the computer.
Summary of the invention
The technical problem addressed by the present application is to provide a method and system for rapidly scanning files that solve the problem of a new client's first scan taking too long.
To solve the above problem, the present application discloses a method for rapidly scanning files, comprising:
obtaining a data package, the data package containing secure file characteristic information used to determine whether a file in the system is a secure file;
scanning the file characteristic information of the files in the system one by one, and matching the currently scanned file characteristic information against the secure file characteristic information in the data package that marks files as secure;
if they match, skipping the antivirus scan of the current file and continuing with the next file;
if they do not match, performing an antivirus scan of the current file and, when the scan time is greater than (or greater than or equal to) a time threshold and the file is safe, adding the file's secure file characteristic information to the data package.
Preferably, the characteristic information is obtained by aggregating the secure file characteristic information that each client sends to a processing center server.
Preferably, the secure file characteristic information sent by each client to the processing center server is aggregated by the following steps to obtain the characteristic information:
receiving the secure file characteristic information sent by each client, the characteristic information comprising the characteristic information of safe files whose scan time, when the client performs a complete scan, is greater than (or greater than or equal to) the time threshold;
for identical characteristic information, counting its number of repetitions;
extracting the characteristic information whose number of repetitions is greater than (or greater than or equal to) a quantity threshold.
Preferably, when the scan time of the file is greater than (or greater than or equal to) the time threshold:
the characteristic information of the file is obtained, comprising the file name, file size, file modification time and file content description information;
a CRC is computed over the file name to obtain a file name CRC value;
a message digest algorithm is applied to the file content description information to obtain content matching information;
the file size, file modification time, file name CRC value and content matching information together serve as the secure file characteristic information.
Preferably, counting the number of repetitions of identical characteristic information comprises:
sorting the received characteristic information by total repetition count;
for each group of identical characteristic information, deduplicating the identical characteristic information sent by the same client;
for each group of identical characteristic information after deduplication, counting the number of repetitions of each piece of characteristic information.
Preferably, after extracting the characteristic information whose number of repetitions is greater than (or greater than or equal to) the quantity threshold, the method further comprises:
storing the extracted characteristic information in a data file, and generating the data package from the data file.
Preferably, before the data package is generated, the method further comprises:
receiving the characteristic information of an unsafe file sent by a client, and either not storing that characteristic information in the data file or deleting the identical characteristic information from the data package.
Preferably, after the previous data package has been generated, characteristic information of newly counted secure files is added to the previous data package.
Preferably, the characteristic information of the file is matched against the characteristic information in the data package as follows:
in the cache, matching starts from the characteristic information with the highest matching efficiency.
Preferably, when the file size and file modification time match a piece of characteristic information in the database, the CRC is computed over the file name to obtain a file name CRC value, which is matched against the file name CRC value of that characteristic information;
when the file name CRC value matches that of the characteristic information, the message digest algorithm is applied to the file content description information to obtain content matching information, which is matched against the content matching information of that characteristic information.
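The staged comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the `Feature` tuple, the `build_index` and `matches` helpers, the dictionary keyed by size and modification time, and the choice of `zlib.crc32` and MD5 are all hypothetical choices made for the example.

```python
import hashlib
import zlib
from typing import NamedTuple

class Feature(NamedTuple):
    """One secure-file entry from the data package (hypothetical layout)."""
    size: int
    mtime: int
    name_crc: int
    content_md5: str

def build_index(features):
    """Group entries by the cheapest fields so they can be checked first."""
    index = {}
    for f in features:
        index.setdefault((f.size, f.mtime), []).append(f)
    return index

def matches(index, name, size, mtime, read_descriptor):
    """Return True if the file matches a secure-file entry.

    read_descriptor is a callable returning the content description
    bytes; it is only called once the cheaper checks have passed.
    """
    candidates = index.get((size, mtime))
    if not candidates:
        return False  # stage 1 failed: size and modification time
    name_crc = zlib.crc32(name.encode("utf-8"))
    candidates = [f for f in candidates if f.name_crc == name_crc]
    if not candidates:
        return False  # stage 2 failed: file name CRC
    digest = hashlib.md5(read_descriptor()).hexdigest()
    return any(f.content_md5 == digest for f in candidates)  # stage 3
```

Ordering the checks from cheapest to most expensive means the content description information is read and hashed only for files that already agree on size, modification time and file name CRC.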
Preferably, when the first client performs its first scan, the first client is prompted to choose whether to use rapid scanning; if rapid scanning is chosen, the characteristic information of the files is scanned and the data package is called for matching.
Preferably, when the first client scans, the characteristic information of the files found safe in the current scan result is stored in the secure file information list of the data package; on its next scan, the first client scans according to the record from the previous scan.
Correspondingly, the present application also discloses a system for rapidly scanning files, comprising:
a first client, the first client comprising:
an acquisition module, configured to obtain a data package, the data package containing secure file characteristic information used to determine whether a file in the system is a secure file;
a scan processing module, configured to scan the file characteristic information of the files in the system one by one and match the currently scanned file characteristic information against the secure file characteristic information in the data package that marks files as secure; if they match, to skip the antivirus scan of the current file and continue with the next file; if they do not match, to perform an antivirus scan of the current file and, when the scan time is greater than (or greater than or equal to) a time threshold and the file is safe, to add the file's secure file characteristic information to the data package.
Preferably, the system further comprises:
a second client group and a processing center server;
the processing center server is configured to aggregate the secure file characteristic information that each client sends to it in order to obtain the characteristic information in the data package;
the second client group is configured to send the characteristic information of secure files.
Preferably, the processing center server comprises:
a characteristic information module, configured to receive the characteristic information of the secure files sent by each client, the characteristic information comprising the characteristic information of safe files whose scan time, when the client performs a complete scan, is greater than (or greater than or equal to) the time threshold;
a statistics module, configured to count the number of repetitions of identical characteristic information;
an extraction module, configured to extract the characteristic information whose number of repetitions is greater than (or greater than or equal to) a quantity threshold.
Preferably, each client of the second client group comprises:
a characteristic acquisition unit, configured to obtain the characteristic information of a file when the scan time of the file is greater than (or greater than or equal to) the time threshold, the characteristic information comprising the file name, file size, file modification time and file content description information;
a file name calculation unit, configured to compute a CRC over the file name to obtain a file name CRC value;
a content description information calculation unit, configured to apply a message digest algorithm to the file content description information to obtain content matching information;
a characteristic information sending unit, configured to send the characteristic information comprising the file size, file modification time, file name CRC value and content matching information.
Preferably, the statistics module comprises: a sorting unit, configured to sort the received characteristic information by total repetition count;
a deduplication unit, configured to deduplicate, for each group of identical characteristic information, the identical characteristic information sent by the same client;
a counting unit, configured to count, for each group of identical characteristic information after deduplication, the number of repetitions of each piece of characteristic information.
Preferably, after the extraction module, the system further comprises:
a generation unit, configured to store the extracted characteristic information in a data file and to generate the data package from the data file.
Preferably, before the data package is generated, the system further comprises:
a removal unit, configured to receive the characteristic information of an unsafe file sent by a client, and either not to store that characteristic information in the data file or to delete the identical characteristic information from the data package.
Preferably, the system further comprises:
an update module, configured to add, after the previous data package has been generated, characteristic information of newly counted secure files to the previous data package.
Preferably, the characteristic information of the file is matched against the characteristic information in the data package as follows:
in the cache, matching starts from the characteristic information with the highest matching efficiency.
Preferably, when the file size and file modification time match a piece of characteristic information in the database, the CRC is computed over the file name to obtain a file name CRC value, which is matched against the file name CRC value of that characteristic information;
when the file name CRC value matches that of the characteristic information, the message digest algorithm is applied to the file content description information to obtain content matching information, which is matched against the content matching information of that characteristic information.
Preferably, the system further comprises:
a prompting module, configured to prompt the first client, at its first scan, to choose whether to use rapid scanning and, if rapid scanning is chosen, to scan the characteristic information of the files and call the data package for matching.
Preferably, the system further comprises:
an update recording module, configured to store, when the first client scans, the characteristic information of the files found safe in the current scan result in the secure file information list of the data package; on its next scan, the first client scans according to the safety record from the previous scan.
Compared with the prior art, the present application has the following advantages:
The present application uses a data package containing secure file characteristic information for determining whether a file in the system is a secure file. When a new user scans for the first time, any file whose characteristic information is identical to an entry in the data package, that is, a safe file that takes a long time to scan, can be skipped. This reduces the time of the first scan, and the data package can be updated promptly for use by the local client or by other clients.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for rapidly scanning files according to the present application;
Fig. 2 is a schematic flowchart of a preferred data package generation method according to the present application;
Fig. 3 is an example of the header data of a compressed package;
Fig. 4 is a schematic structural diagram of a system for rapidly scanning files according to the present application;
Fig. 5 is a schematic structural diagram of a preferred system for rapidly scanning files according to the present application.
Detailed description
To make the above objects, features and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and embodiments.
In practice, when clients numbering above a certain order of magnitude (for example 10^5) have all scanned a file with the same characteristics (for example file name, file size, file modification time, file content description information, content information and so on) and the file is safe, a file with the same characteristics is in essence also safe for other users. Based on this property, the present application aggregates, across a very large number of users, the characteristic information of files that are safe but take a long time to scan completely, and then generates a data package to be matched (an upgrade patch or the like) from that characteristic information. Once a user has the data package, the characteristic information of scanned files can be matched against the characteristic information in the data package. On a match, files that normally take a long time to scan can be skipped, which saves scan time.
Referring to Fig. 1, which shows a schematic flowchart of a method for rapidly scanning files according to the present application, the method comprises:
Step 110: obtain a data package; the data package contains secure file characteristic information used to determine whether a file in the system is a secure file.
A new client first obtains the data package containing the secure file characteristic information and can then scan according to it, saving time on the first scan. In this application the new client is referred to as the first client. The first client mainly includes clients on which no antivirus software is installed and clients on which antivirus software is installed but no global file scan has been performed. It may also include clients on which antivirus software is installed and a global file scan has been performed, but on which new, unscanned files have appeared since the last global file scan.
The first client can obtain the data package by installing an installation package that contains it, or by upgrading antivirus software that is already installed. In practice, the first client may never have run a complete scan, or may have acquired many new files in use that have not been completely scanned; when such a client wants to scan rapidly, it can do so using the data package of the present application.
The characteristic information in the data package that identifies a system file as secure can be obtained by aggregating the complete-scan results of the clients. For example, given the complete-scan results of multiple users, if a file with the same characteristic information (such as file size, file modification time, file name CRC value and content matching information) was found safe by every client, that characteristic information can be stored in the data package and used to determine that a file in the system with this characteristic information is a secure file.
Preferably, the characteristic information is obtained by aggregating the secure file characteristic information that each client sends to the processing center server. That is, the processing center server automatically collects and analyzes the characteristic information of files confirmed safe by each client's complete scan, and generates a corresponding data package from the characteristic information whose count exceeds a certain threshold. In this application the data package must first be generated. Referring to Fig. 2, which shows a schematic flowchart of a preferred data package generation method according to the present application, the method comprises:
Step 210: receive the characteristic information of the secure files sent by each client; the characteristic information comprises the characteristic information of safe files whose scan time, when the client performs a complete scan, is greater than (or greater than or equal to) the time threshold.
In practice there is a second client group comprising a large number of clients, each of which sends to the processing center server the qualifying characteristic information obtained from a complete scan of the files on its own system. That is, when a client of the second client group chooses to completely scan the files on its computer, it sends to the processing center server the characteristic information of secure files whose scan time was greater than (or greater than or equal to) a certain time threshold. The present application thus aggregates, over a very large number of clients, the characteristic information of files that take a long time to scan normally, and generates the data package to be matched from the characteristic information whose count is greater than (or greater than or equal to) the threshold. When a new user then scans for the first time, any safe, slow-to-scan file whose characteristic information is identical to an entry in the data package can be skipped, which reduces the time of the first scan.
Preferably, when a client completely scans a file and the scan time is greater than (or greater than or equal to) the time threshold and the file is safe, the following steps are performed.
Here, a complete scan is an antivirus scan of the full content of every file in the client system.
Step S11: obtain the characteristic information of the file, comprising the file name, file size, file modification time and file content description information.
When a client chooses a complete file scan, the scan engine scans each file on the client computer one by one. Many files are slow to scan. For a compressed package, the scan engine must first decompress the package as its format requires and then scan each file it contains, which generally takes a considerable time. For a software installation package, the scan engine likewise has to extract the contents before scanning them, which again takes a considerable time. So when the time a client spends scanning a whole file is greater than (or greater than or equal to) the threshold, the characteristic information of the current file can be obtained, comprising the file name, file size, file modification time and file content description information.
The scan time of a file on a client can be calculated from the time points at which scanning of the file starts and ends, by calling the system API function GetTickCount. For example, if scanning starts at 21:50:30.000 on Tuesday, January 17, 2012 and ends at 21:51:30.000 on the same day, the client's scan time for the file is 1 minute. If the time threshold is set to 30 seconds, the client obtains the characteristic information of this file, comprising the file name, file size, file modification time and file content description information.
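The timing check can be sketched as below. The text names the Windows API function GetTickCount; this sketch substitutes Python's `time.monotonic` as a portable stand-in, and the `timed_scan` and `should_report` helpers are hypothetical. The comparison uses greater than or equal to, which is one of the two variants the text allows.

```python
import time

TIME_THRESHOLD_SECONDS = 30.0  # example threshold taken from the text

def timed_scan(scan, path, clock=time.monotonic):
    """Run scan(path) and return (is_safe, elapsed_seconds).

    scan is any callable that returns True when the file is safe.
    """
    start = clock()
    is_safe = scan(path)
    elapsed = clock() - start
    return is_safe, elapsed

def should_report(is_safe, elapsed, threshold=TIME_THRESHOLD_SECONDS):
    """A file is reported only if it is safe and was slow to scan."""
    return is_safe and elapsed >= threshold
```

With the figures from the example (a 60-second scan against a 30-second threshold), a safe file would qualify for reporting.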
The file content description information accounts for only a very small part of the whole file, so the time a client spends scanning it is far less than the time needed to scan the whole file. For a compressed file, for example, the file content description information is in the header data of the compressed package. During scanning, the header data states how many bytes the content description information occupies, so the client only needs to scan the byte address range where the header data of the current file resides. Fig. 3 shows an example of the header data of a compressed package. The header data describes the files in the package (file name, size, data check value and so on); as long as these key data are unchanged, the content of the compressed package is unchanged. For a software installation package, as another example, the content description information is in the tail data of the installation package, and it is enough to scan the corresponding byte address range according to the byte count that the file states for the tail data.
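As one possible illustration, assume the compressed package is a ZIP archive (the text does not name a format). Python's standard `zipfile` module can list each member's name, size and CRC from the archive's directory without decompressing any member. Note that in ZIP this directory sits at the end of the file rather than in a leading header. The `archive_descriptor` and `descriptor_md5` names are hypothetical.

```python
import hashlib
import io
import zipfile

def archive_descriptor(source) -> bytes:
    """Build a descriptor from ZIP directory metadata only.

    source is a path or a seekable binary file object. No member
    data is decompressed; only names, sizes and stored CRCs are read.
    """
    with zipfile.ZipFile(source) as zf:
        parts = [
            f"{info.filename}|{info.file_size}|{info.CRC:08x}"
            for info in zf.infolist()
        ]
    return "\n".join(parts).encode("utf-8")

def descriptor_md5(source) -> str:
    """Digest of the descriptor, usable as content matching information."""
    return hashlib.md5(archive_descriptor(source)).hexdigest()
```

Two archives whose members have the same names, sizes and check values yield the same digest, which mirrors the statement that unchanged key data implies unchanged package content.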
Step S12: compute a CRC over the file name to obtain a file name CRC value.
Because a file name can involve the user's privacy, the present application computes a CRC (Cyclic Redundancy Check) over the file name, which yields a file name CRC value that is not human-readable.
Step S13: apply a message digest algorithm to the file content description information to obtain content matching information.
Although the file content description information is very small relative to the whole file, matching on the description information itself could still take relatively long if it is large. The present application therefore applies a message digest algorithm to the file content description information to obtain content matching information. Typically MD5 (Message-Digest Algorithm 5) is used, so the content matching information is an MD5 value. Matching on the MD5 value greatly reduces matching time, which keeps the comparison fast while still ensuring the safety of the files.
Step S14: send the characteristic information comprising the file size, file modification time, file name CRC value and content matching information.
Once the characteristic information comprising the file size, file modification time, file name CRC value and content matching information has been obtained, it can be sent to the processing center server to await processing.
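Steps S11 to S14 can be sketched as a single function that turns the raw attributes of a file into the four-field characteristic information. This is a minimal sketch under assumptions: `build_feature` is a hypothetical name, `zlib.crc32` stands in for the unspecified CRC variant, and the modification time is represented as an integer timestamp.

```python
import hashlib
import zlib

def build_feature(name: str, size: int, mtime: int, descriptor: bytes):
    """Return (size, mtime, name_crc, content_md5) for one file.

    The readable file name is not part of the result; only its CRC
    is kept, in line with the privacy remark in step S12.
    """
    name_crc = zlib.crc32(name.encode("utf-8"))
    content_md5 = hashlib.md5(descriptor).hexdigest()
    return (size, mtime, name_crc, content_md5)
```

The resulting tuple is hashable, so it can be sent as a record, stored in a set, or used as a dictionary key when counting reports on the server side.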
The application environment of the present application includes a processing center server that provides data such as the antivirus software installation package and upgrade packages, and a large number of clients connected to the processing center server through a network. The above steps can therefore be performed for any complete file scan run by an online user.
Step 220: for identical characteristic information, count its number of repetitions.
In this step the processing center server can deduplicate the data it receives: multiple entries with identical characteristic information sent by the same client are deduplicated so that they count as 1.
Preferably, counting the number of repetitions of identical characteristic information comprises:
S21: sort the received characteristic information by total repetition count.
First, the received characteristic information is sorted by total repetition count. For example, the characteristic information of a file on some client may be (m, 100kb, 2012/1/11/21:50:30:10, n), where m is the file name CRC value and n is the content matching information of the file content description information, that is, the MD5 value. Each piece of characteristic information is associated with the client that sent it; for client A, for example, the information it sends to the processing center server can be written as A-(m, 100kb, 2012/1/11/21:50:30:10, n). For all received entries with the same (m, 100kb, 2012/1/11/21:50:30:10, n), the processing center server counts the total number of repetitions and then sorts by that total. During this counting and sorting, entries with identical characteristic information sent by the same client are easy to identify.
S22: for each group of identical characteristic information, deduplicate the identical characteristic information sent by the same client.
Suppose the same client sends multiple entries with identical characteristic information, for example client A sends 10 entries of (m, 100kb, 2012/1/11/21:50:30:10, n). These are deduplicated so that the processing center records the (m, 100kb, 2012/1/11/21:50:30:10, n) characteristic information repeatedly sent by this client as 1 occurrence. This keeps the number of users counted for a given piece of characteristic information accurate, which ensures the validity of the present application.
S23: for each group of identical characteristic information after deduplication, count the number of repetitions of each piece of characteristic information.
After deduplication, the number of repetitions of each piece of characteristic information can be counted; this number equals the number of clients whose scans produced this characteristic information.
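Steps S21 to S23 amount to counting distinct clients per piece of characteristic information. A minimal sketch, assuming each report arrives as a hypothetical `(client_id, feature)` pair with a hashable feature tuple:

```python
from collections import Counter

def count_clients(reports):
    """Count how many distinct clients reported each feature.

    reports is an iterable of (client_id, feature) pairs. Repeated
    pairs from the same client collapse to one before counting.
    """
    unique_pairs = set(reports)  # deduplicate per client (S22)
    return Counter(feature for _client, feature in unique_pairs)  # S23
```

Calling `most_common()` on the returned counter gives the features ordered by repetition count, which corresponds to the sorting described in S21.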
The present application can also use other deduplication methods to count the number of repetitions of each piece of file characteristic information; the application places no limitation on this.
Step 230: extract the characteristic information whose number of repetitions is greater than (or greater than or equal to) the quantity threshold.
Once the number of repetitions of a piece of characteristic information has been obtained, it is compared with the quantity threshold; if the number of repetitions is greater than the threshold, the characteristic information is stored in the data file. For example, if the number of repetitions of the aforementioned (m, 100kb, 2012/1/11/21:50:30:10, n) characteristic information is 150,000 and the threshold is set to 100,000, this characteristic information can be extracted.
Step 240: store the extracted characteristic information in a data file, and generate the data package from the data file.
In this application the data file can exist in the form of a data list. Each piece of characteristic information comprises several dimensions. For (m, 100kb, 2012/1/11/21:50:30:10, n), for example, the entry in the list comprises four dimensions: file size, file modification time, file name CRC value and content matching information. Subsequent scans then match on these four dimensions.
The generated data file can be packaged into the installation package of the antivirus software, or issued in the form of an upgrade patch.
In addition, before said data packet is generated, the method further comprises:
Step S210: receive the characteristic information of unsafe files sent by user terminals, and either do not store this characteristic information in the data file or delete the identical characteristic information from the data packet.
When a user terminal finds a virus in a file whose full scan time exceeds the threshold, it can also mark the characteristic information of that unsafe file as dangerous and send it to the processing center server. The processing center server then processes none of the data entries carrying this characteristic information and does not store it in the data file. For example, suppose 50,000 user terminals send data with the aforementioned characteristic information (m, 100kb, 2012/1/11/21:50:30:10, n), but the data sent by one of them is marked dangerous, that is, a virus was found. The processing center server then does not process the data carrying (m, 100kb, 2012/1/11/21:50:30:10, n), and this characteristic information is not stored in the data packet.
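The exclusion rule of step S210 might be sketched as below: a single "dangerous" report is enough to drop a piece of characteristic information, however many terminals reported it as safe. The function name and the (terminal_id, feature, is_safe) report shape are hypothetical.

```python
def safe_candidates(reports):
    """Return features for which no terminal sent a 'dangerous' report.

    `reports` is an iterable of (terminal_id, feature, is_safe) triples
    (hypothetical representation).
    """
    reports = list(reports)
    # Any feature flagged unsafe by at least one terminal is excluded.
    flagged = {feature for _t, feature, is_safe in reports if not is_safe}
    return {feature for _t, feature, _safe in reports
            if feature not in flagged}
```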
In addition, in practice, the processing center server generally gathers statistics on the characteristic information sent by the user terminals in cycles of a fixed length, and generates the data packet of secure-file characteristic information from the statistics of each cycle. For example, the processing center server counts the characteristic information sent by the user terminals on a daily basis and then generates a data packet, which can be provided to the first user terminal for use in the next cycle.
Step 120: scan the file characteristic information of the system files one by one, and match the characteristic information of the currently scanned file against the secure file characteristic information in said data packet, which marks files as secure. If they match, skip the antivirus scan of the current file and continue with the next file. If they do not match, perform an antivirus scan of the current file; when the scan time is greater than, or greater than or equal to, a time threshold and the file is safe, update the secure file characteristic information of this file into the data packet.
For example, the antivirus scan checks the file content for malicious code, viruses, and the like.
After said data packet has been obtained, a quick scan can be performed:
Step S121: the system of the first user terminal scans the characteristic information of the system files one by one;
Step S122: match the characteristic information of the currently scanned file against the secure file characteristic information in said data packet, which marks files as secure. If they match, go to step S123: skip the current file and continue with the next file. If they do not match, go to step S124: perform a full antivirus scan of the current file; when the scan time is greater than, or greater than or equal to, the time threshold and the file is safe, go to step S125: update the secure file characteristic information of this file into the data packet.
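Steps S121 to S125 can be sketched as a simple loop. The sketch below is illustrative only: `feature_of` and `full_scan` are hypothetical callbacks standing in for the characteristic-information extraction and the antivirus engine, the data packet is modelled as an in-memory set, and `time.monotonic` is used as a portable stand-in for the GetTickCount system call mentioned elsewhere in this description.

```python
import time


def quick_scan(files, safe_features, feature_of, full_scan, time_threshold):
    """Scan `files`, skipping those whose feature is in `safe_features`.

    feature_of(path) -> hashable feature; full_scan(path) -> True if safe.
    Returns the list of unsafe files; `safe_features` is updated in place.
    """
    unsafe = []
    for path in files:
        feature = feature_of(path)            # S121: obtain the feature
        if feature in safe_features:          # S122: match against packet
            continue                          # S123: skip the current file
        start = time.monotonic()
        is_safe = full_scan(path)             # S124: full antivirus scan
        elapsed = time.monotonic() - start
        if not is_safe:
            unsafe.append(path)
        elif elapsed >= time_threshold:
            safe_features.add(feature)        # S125: update the packet
    return unsafe
```

Only files that are both safe and slow to scan are added, which matches the stated aim of skipping the files that cost the most scan time.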
In practice, the characteristic information in step S125 can be obtained through the following steps:
Step M1: obtain the characteristic information of the file, said characteristic information comprising the filename, file size, file modification time, and file content description information;
Step M2: perform a CRC calculation on the filename to obtain the filename CRC value;
Step M3: perform a message-digest algorithm calculation on the file content description information to obtain the content match information;
Step M4: take the file size, file modification time, filename CRC value, and content match information together as the secure file characteristic information. The concrete methods of steps M1 to M4 are similar to the aforementioned steps S11 to S14.
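Steps M1 to M4 might look like the following sketch. Several points are assumptions rather than details taken from the application: CRC-32 is used as the CRC variant, MD5 as the message-digest algorithm, and the "file content description information" is approximated by the leading bytes of the file, since this passage does not define it.

```python
import hashlib
import os
import zlib


def build_feature(path, descriptor_bytes=4096):
    """Return (size, mtime_ns, filename CRC, content match) for `path`."""
    st = os.stat(path)                                    # M1: size, mtime
    name = os.path.basename(path).encode("utf-8")
    name_crc = zlib.crc32(name)                           # M2: filename CRC
    with open(path, "rb") as fh:
        # Assumption: the leading bytes stand in for the descriptor.
        descriptor = fh.read(descriptor_bytes)
    content_match = hashlib.md5(descriptor).hexdigest()   # M3: digest
    return (st.st_size, st.st_mtime_ns, name_crc, content_match)  # M4
```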
During the update of the data packet, the characteristic information of said secure file can be written directly into the data packet in the local system; alternatively, it can be sent to the processing center server, which updates it into the data packets of the corresponding user terminals according to its own rules.
In practice, before the first user terminal scans, the method further comprises: determining whether quick scan is selected; if so, scanning the characteristic information of the files and calling said data packet for matching.
That is, the first user terminal can choose between a quick scan and a full scan of the files. If quick scan is selected, the characteristic information of the files is scanned and said data packet is called to match the characteristic information. When the characteristic information of a file matches characteristic information in said data packet, the current file is skipped and scanning continues with the next file.
In this application, at the first scan the first user terminal can be prompted to choose whether to use quick scan; if it is chosen, the characteristic information of the files is scanned and said data packet is called for matching.
If the first user terminal selects quick scan, then during scanning it first obtains the characteristic information of its files for matching, without scanning the full content of the files.
In this application, if any one dimension of the characteristic information fails to match during matching, the file is regarded as not matched and can be given an antivirus scan. Take the four dimensions of the aforementioned characteristic information: file size, file modification time, filename CRC value, and content match information. They are matched in this order: 1, file size; 2, file modification time; 3, filename CRC value; 4, content match information. If the file size of file A does not match the first dimension of the characteristic information in the data packet, dimensions 2, 3, and 4 need not be compared, and file A requires an antivirus scan. If the file size of file A matches, the file modification time of file A is then compared with the second dimension of that characteristic information; if it does not match, dimensions 3 and 4 need not be compared, and file A can be given an antivirus scan. The remaining cases follow by analogy.
When the first user terminal matches the characteristic information of said file against the characteristic information in said data packet:
In the cache, matching starts from the characteristic information with the highest matching efficiency.
Because the characteristic information in the data packet is multidimensional, the comparison can start from the data that is cheapest to check, namely the dimensions that require no extra computation, such as file size and file modification time. File size and modification time are system data that become available while traversing the files and need no additional computation, so this ordering improves the efficiency of matching against the preset cache.
Preferably, during matching:
Step S31: when the file size and file modification time match characteristic information in the database, perform said CRC calculation on the filename to obtain the filename CRC value, and match said filename CRC value against the filename CRC value of that characteristic information. The CRC operation on the filename is an in-memory operation on a very small amount of data.
Matching starts from the dimensions with higher matching efficiency. Among the dimensions of the characteristic information in the data packet, file size and file modification time require no extra computation and can be matched directly. So when the user terminal scans, it first compares the file size and then the file modification time of the characteristic information it has obtained. If the file size and file modification time of the currently scanned file are both identical, it compares a dimension with relatively little computation: it calculates the filename CRC value through the CRC operation and matches it against the filename CRC value of that characteristic information. If they do not match, the file goes to a full scan; if they match, matching proceeds to the dimension with relatively more computation, that is, to step S32.
Step S32: when said filename CRC value matches the filename CRC value of that characteristic information, perform said message-digest algorithm calculation on the file content description information to obtain the content match information, and match it against the content match information of that characteristic information.
When the file size, file modification time, and filename CRC value all match, a message-digest algorithm calculation, generally MD5, is performed on the file content description information to obtain the file content match information, which is then matched against the content match information of that characteristic information. If it matches, the current file is skipped and scanning moves on to the next file.
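The cheapest-first matching of steps S31 and S32 can be sketched as follows. The index layout (a dictionary keyed by file size and modification time) and the callback parameters are assumptions chosen for the example; the sketch also checks size and modification time together in one lookup, whereas the description compares them one after the other. The point illustrated is that the CRC and the digest are computed only when every cheaper dimension has already matched.

```python
def matches_safe_entry(size, mtime, index, compute_name_crc, compute_content_md5):
    """Return True if the file matches a secure entry in `index`.

    `index` maps (size, mtime) -> set of (name_crc, content_md5) pairs.
    The two callbacks are invoked lazily, only when needed.
    """
    # Dimensions 1 and 2: available from directory traversal, no computation.
    candidates = index.get((size, mtime))
    if not candidates:
        return False                      # mismatch: full scan required
    # Dimension 3 (S31): filename CRC, a small in-memory computation.
    name_crc = compute_name_crc()
    digests = {md5 for crc, md5 in candidates if crc == name_crc}
    if not digests:
        return False
    # Dimension 4 (S32): content digest, the most expensive comparison.
    return compute_content_md5() in digests
```

A file for which this function returns False would then go through the ordinary antivirus scan.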
In addition, preferably, each time the first user terminal scans, the characteristic information of the files found secure in the current scan is stored in the secure file information list of said data packet; at its next scan, the first user terminal scans according to the record from the previous scan.
When the first user terminal scans with the current data packet, a file whose characteristic information is not in the data packet but which is found safe in the current scan can have its characteristic information stored in the secure file information list of the data packet, so that the next scan of the first user terminal can run faster based on the previous result. In addition, when the user terminal performs a full scan of each file and detects that a file is unsafe although its characteristic information is in the data packet, that characteristic information can be deleted from the secure file information list of the data packet.
Correspondingly, with reference to Fig. 4, the application also discloses a schematic structural diagram of a rapid file scanning system, comprising:
A first user terminal 410, said first user terminal comprising:
An acquisition module S411, used to obtain a data packet; said data packet comprises secure file characteristic information used to determine whether a file in the system is a secure file;
A scan processing module S412, used to scan the file characteristic information of the system files one by one and to match the characteristic information of the currently scanned file against the secure file characteristic information in said data packet, which marks files as secure. If they match, the antivirus scan of the current file is skipped and scanning continues with the next file. If they do not match, an antivirus scan of the current file is performed; when the scan time is greater than, or greater than or equal to, a time threshold and the file is safe, the secure file characteristic information of this file is updated into the data packet.
With reference to Fig. 5, which shows a schematic structural diagram of a preferred rapid file scanning system of the application, the system comprises:
A first user terminal 510, a second user terminal group 520, and a processing center server 530;
Said first user terminal 510 comprises:
An acquisition module S511, used to obtain a data packet; said data packet comprises secure file characteristic information used to determine whether a file in the system is a secure file;
A scan processing module S512, used to scan the file characteristic information of the system files one by one; if the characteristic information of the currently scanned file matches the secure file characteristic information in said data packet, which marks files as secure, the antivirus scan of the current file is skipped and scanning continues with the next file;
Said processing center server 530 is used to obtain the characteristic information in said data packet by gathering statistics on the secure file characteristic information that the user terminals send to the processing center server;
Said second user terminal group 520 is used to send the characteristic information of secure files.
Preferably, said processing center server comprises:
A characteristic information module, used to receive the characteristic information of secure files sent by the user terminals; said characteristic information comprises the characteristic information of safe files whose scan time, when a user terminal performed a full scan, was greater than, or greater than or equal to, the time threshold;
A statistical module, used to count the number of repetitions of identical characteristic information;
An extraction module, used to extract the characteristic information whose number of repetitions is greater than, or greater than or equal to, the quantity threshold.
Preferably, each user terminal of said second user terminal group comprises:
A characteristic acquisition unit, used to obtain the characteristic information of a file when said scan time is greater than, or greater than or equal to, the time threshold and the file is safe; said characteristic information comprises the filename, file size, file modification time, and file content description information;
A filename computing unit, used to perform a CRC calculation on the filename to obtain the filename CRC value;
A content description information computing unit, used to perform a message-digest algorithm calculation on the file content description information to obtain the content match information;
A characteristic information sending unit, used to send the characteristic information comprising the file size, file modification time, filename CRC value, and content match information.
Preferably, said statistical module comprises:
A sorting unit, used to sort the received characteristic information by total repetition count;
A deduplication unit, used, for each identical piece of characteristic information, to deduplicate the identical characteristic information sent by the same user terminal;
A statistics unit, used to count the number of repetitions of each piece of characteristic information after deduplication.
Preferably, when a user terminal performs a full scan of a file:
The system API function GetTickCount is called at the time point when scanning of the file begins and at the time point when it ends, and the scan time of the file is calculated from these two time points.
Preferably, after the extraction module, the system further comprises:
A generation unit, used to store the extracted characteristic information in a data file and to generate the data packet from said data file.
Preferably, before said data packet is generated, the system further comprises:
A removal unit, used to receive the characteristic information of unsafe files sent by user terminals, and either not store this characteristic information in the data file or delete the identical characteristic information from the data packet.
Preferably, the characteristic information of said file is matched against the characteristic information in said data packet as follows:
In the cache, matching starts from the characteristic information with the highest matching efficiency.
Preferably, when the file size and file modification time match characteristic information in the database, said CRC calculation is performed on the filename to obtain the filename CRC value, and said filename CRC value is matched against the filename CRC value of that characteristic information;
When said filename CRC value matches the filename CRC value of that characteristic information, said message-digest algorithm calculation is performed on the file content description information to obtain the content match information, and said content match information is matched against the content match information of that characteristic information.
Preferably, the system further comprises:
A prompting module, used at the first scan to prompt the first user terminal to choose whether to use quick scan; if it is chosen, the characteristic information of the files is scanned and said data packet is called for matching.
Preferably, the system further comprises:
An update recording module, used, when the first user terminal scans, to store the characteristic information of the files found secure in the current scan in the secure file information list of said data packet; at its next scan, the first user terminal scans according to the secure-file record from the previous scan.
Because the system embodiment is basically similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the corresponding explanation of the method embodiment.
The embodiments in this specification are described progressively. Each embodiment emphasizes its differences from the others, and the identical or similar parts of the embodiments can be cross-referenced.
The rapid file scanning method and system provided by the application have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the application, and the explanation of the above embodiments is intended only to help in understanding the method of the application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific implementation and the scope of application. In summary, the content of this description should not be construed as limiting the application.

Claims (24)

1. A rapid file scanning method, characterized in that it comprises:
Obtaining a data packet; said data packet comprises secure file characteristic information used to determine whether a file in the system is a secure file;
Scanning the file characteristic information of the system files one by one, and matching the characteristic information of the currently scanned file against the secure file characteristic information in said data packet, which marks files as secure;
If they match, skipping the antivirus scan of the current file and continuing with the next file;
If they do not match, performing an antivirus scan of the current file; when the scan time is greater than, or greater than or equal to, a time threshold and the file is safe, updating the secure file characteristic information of this file into the data packet.
2. The method according to claim 1, characterized in that:
Said characteristic information is obtained by gathering statistics on the secure file characteristic information that the user terminals send to a processing center server.
3. The method according to claim 2, characterized in that said characteristic information is obtained by gathering statistics on the secure file characteristic information sent by the user terminals to the processing center server through the following steps:
Receiving the secure file characteristic information sent by the user terminals; said characteristic information comprises the characteristic information of safe files whose scan time, when a user terminal performed a full scan, was greater than, or greater than or equal to, the time threshold;
Counting the number of repetitions of identical characteristic information;
Extracting the characteristic information whose number of repetitions is greater than, or greater than or equal to, a quantity threshold.
4. The method according to claim 1 or 3, characterized in that, when the scan time of said file is greater than, or greater than or equal to, the time threshold:
The characteristic information of the file is obtained, said characteristic information comprising the filename, file size, file modification time, and file content description information;
A CRC calculation is performed on the filename to obtain the filename CRC value;
A message-digest algorithm calculation is performed on the file content description information to obtain the content match information;
The file size, file modification time, filename CRC value, and content match information are taken together as the secure file characteristic information.
5. The method according to claim 2 or 4, characterized in that counting the number of repetitions of identical characteristic information comprises:
Sorting the received characteristic information by total repetition count;
For each identical piece of characteristic information, deduplicating the identical characteristic information sent by the same user terminal;
For each piece of characteristic information after deduplication, counting its number of repetitions.
6. The method according to claim 3, characterized in that, after the characteristic information whose number of repetitions is greater than, or greater than or equal to, the quantity threshold is extracted, the method comprises:
Storing the extracted characteristic information in a data file, and generating the data packet from said data file.
7. The method according to claim 2, characterized in that, before said data packet is generated, the method further comprises:
Receiving the characteristic information of unsafe files sent by user terminals, and either not storing this characteristic information in the data file or deleting the identical characteristic information from the data packet.
8. The method according to claim 2, characterized in that:
After the previous data packet has been generated, once characteristic information of new secure files has been counted, it is updated into the previous data packet.
9. The method according to claim 1, characterized in that the characteristic information of said file is matched against the characteristic information in said data packet as follows:
In the cache, matching starts from the characteristic information with the highest matching efficiency.
10. The method according to claim 9, characterized in that:
When the file size and file modification time match characteristic information in the database, said CRC calculation is performed on the filename to obtain the filename CRC value, and said filename CRC value is matched against the filename CRC value of that characteristic information;
When said filename CRC value matches the filename CRC value of that characteristic information, said message-digest algorithm calculation is performed on the file content description information to obtain the content match information, and said content match information is matched against the content match information of that characteristic information.
11. The method according to claim 1 or 10, characterized in that:
When the first user terminal performs its first scan, the first user terminal is prompted to choose whether to use quick scan; if it is chosen, the characteristic information of the files is scanned and said data packet is called for matching.
12. The method according to claim 1, characterized in that:
When the first user terminal scans, the characteristic information of the files found secure in the current scan is stored in the secure file information list of said data packet; at its next scan, the first user terminal scans according to the record from the previous scan.
13. A rapid file scanning system, characterized in that it comprises:
A first user terminal, said first user terminal comprising:
An acquisition module, used to obtain a data packet; said data packet comprises secure file characteristic information used to determine whether a file in the system is a secure file;
A scan processing module, used to scan the file characteristic information of the system files one by one and to match the characteristic information of the currently scanned file against the secure file characteristic information in said data packet, which marks files as secure. If they match, the antivirus scan of the current file is skipped and scanning continues with the next file. If they do not match, an antivirus scan of the current file is performed; when the scan time is greater than, or greater than or equal to, a time threshold and the file is safe, the secure file characteristic information of this file is updated into the data packet.
14. The system according to claim 13, characterized in that it further comprises:
A second user terminal group and a processing center server;
Said processing center server is used to obtain the characteristic information in said data packet by gathering statistics on the secure file characteristic information that the user terminals send to the processing center server;
Said second user terminal group is used to send the characteristic information of secure files.
15. The system according to claim 14, characterized in that said processing center server comprises:
A characteristic information module, used to receive the characteristic information of secure files sent by the user terminals; said characteristic information comprises the characteristic information of safe files whose scan time, when a user terminal performed a full scan, was greater than, or greater than or equal to, the time threshold;
A statistical module, used to count the number of repetitions of identical characteristic information;
An extraction module, used to extract the characteristic information whose number of repetitions is greater than, or greater than or equal to, a quantity threshold.
16. The system according to claim 14, characterized in that each user terminal of said second user terminal group comprises:
A characteristic acquisition unit, used to obtain the characteristic information of a file when the scan time of said file is greater than, or greater than or equal to, the time threshold; said characteristic information comprises the filename, file size, file modification time, and file content description information;
A filename computing unit, used to perform a CRC calculation on the filename to obtain the filename CRC value;
A content description information computing unit, used to perform a message-digest algorithm calculation on the file content description information to obtain the content match information;
A characteristic information sending unit, used to send the characteristic information comprising the file size, file modification time, filename CRC value, and content match information.
17. The system according to claim 15, characterized in that said statistical module comprises:
A sorting unit, used to sort the received characteristic information by total repetition count;
A deduplication unit, used, for each identical piece of characteristic information, to deduplicate the identical characteristic information sent by the same user terminal;
A statistics unit, used to count the number of repetitions of each piece of characteristic information after deduplication.
18. The system according to claim 15, characterized in that, after the extraction module, the system further comprises:
A generation unit, used to store the extracted characteristic information in a data file and to generate the data packet from said data file.
19. The system according to claim 15, characterized in that, before said data packet is generated, the system further comprises:
A removal unit, used to receive the characteristic information of unsafe files sent by user terminals, and either not store this characteristic information in the data file or delete the identical characteristic information from the data packet.
20. The system according to claim 15, characterized in that it further comprises:
An update module, used, after the previous data packet has been generated and once characteristic information of new secure files has been counted, to update that characteristic information into the previous data packet.
21. The system according to claim 16, characterized in that the characteristic information of said file is matched against the characteristic information in said data packet as follows:
In the cache, matching starts from the characteristic information with the highest matching efficiency.
22. The system according to claim 21, characterized in that:
When the file size and file modification time match characteristic information in the database, said CRC calculation is performed on the filename to obtain the filename CRC value, and said filename CRC value is matched against the filename CRC value of that characteristic information;
When said filename CRC value matches the filename CRC value of that characteristic information, said message-digest algorithm calculation is performed on the file content description information to obtain the content match information, and said content match information is matched against the content match information of that characteristic information.
23. The system according to claim 13, characterized in that it further comprises:
A prompting module, used at the first scan to prompt the first user terminal to choose whether to use quick scan; if it is chosen, the characteristic information of the files is scanned and said data packet is called for matching.
24. The system according to claim 13, characterized in that it further comprises:
An update recording module, used, when the first user terminal scans, to store the characteristic information of the files found secure in the current scan in the secure file information list of said data packet; at its next scan, the first user terminal scans according to the secure-file record from the previous scan.
CN201210026585.5A 2012-02-07 2012-02-07 Method and system for rapidly scanning files Active CN102594809B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201210026585.5A CN102594809B (en) 2012-02-07 2012-02-07 Method and system for rapidly scanning files
US14/377,014 US9355250B2 (en) 2012-02-07 2013-02-05 Method and system for rapidly scanning files
PCT/CN2013/071383 WO2013117151A1 (en) 2012-02-07 2013-02-05 Method and system for rapidly scanning files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210026585.5A CN102594809B (en) 2012-02-07 2012-02-07 Method and system for rapidly scanning files

Publications (2)

Publication Number Publication Date
CN102594809A true CN102594809A (en) 2012-07-18
CN102594809B CN102594809B (en) 2015-02-18

Family

ID=46483010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210026585.5A Active CN102594809B (en) 2012-02-07 2012-02-07 Method and system for rapidly scanning files

Country Status (1)

Country Link
CN (1) CN102594809B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136477A (en) * 2013-03-06 2013-06-05 北京奇虎科技有限公司 Scanning method and scanning system for file samples
WO2013117151A1 (en) * 2012-02-07 2013-08-15 北京奇虎科技有限公司 Method and system for rapidly scanning files
CN103514270A (en) * 2013-09-13 2014-01-15 广东欧珀移动通信有限公司 Multimedia database scanning method and device
CN103699837A (en) * 2012-09-27 2014-04-02 腾讯科技(深圳)有限公司 Method for scanning files and terminal equipment
CN104239793A (en) * 2014-09-10 2014-12-24 珠海市君天电子科技有限公司 Virus detection method and virus detection device
CN104573514A (en) * 2013-10-29 2015-04-29 腾讯科技(深圳)有限公司 Compressed file detecting method and device
CN105337960A (en) * 2015-09-25 2016-02-17 浪潮(北京)电子信息产业有限公司 PaaS platform security system
CN105718800A (en) * 2016-01-18 2016-06-29 北京金山安全管理系统技术有限公司 Rapid virus scanning and killing method and apparatus
CN106682505A (en) * 2016-05-04 2017-05-17 腾讯科技(深圳)有限公司 Virus detection method, terminal, server and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1964357A (en) * 2006-12-04 2007-05-16 北京金山软件有限公司 A method to process file and information processing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杜昕: "防毒软件该瘦身了", 《电脑爱好者》, no. 13, 1 July 2008 (2008-07-01) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013117151A1 (en) * 2012-02-07 2013-08-15 北京奇虎科技有限公司 Method and system for rapidly scanning files
CN103699837B (en) * 2012-09-27 2016-12-21 腾讯科技(深圳)有限公司 Method for scanning files and terminal equipment
CN103699837A (en) * 2012-09-27 2014-04-02 腾讯科技(深圳)有限公司 Method for scanning files and terminal equipment
CN103136477A (en) * 2013-03-06 2013-06-05 北京奇虎科技有限公司 Scanning method and scanning system for file samples
CN103136477B (en) * 2013-03-06 2015-09-02 北京奇虎科技有限公司 Scanning method and system for file samples
CN103514270A (en) * 2013-09-13 2014-01-15 广东欧珀移动通信有限公司 Multimedia database scanning method and device
CN103514270B (en) * 2013-09-13 2017-05-31 广东欧珀移动通信有限公司 Multimedia database scan method and device
CN104573514A (en) * 2013-10-29 2015-04-29 腾讯科技(深圳)有限公司 Compressed file detecting method and device
CN104573514B (en) * 2013-10-29 2018-09-04 腾讯科技(深圳)有限公司 Compressed file detection method and device
CN104239793A (en) * 2014-09-10 2014-12-24 珠海市君天电子科技有限公司 Virus detection method and virus detection device
CN104239793B (en) * 2014-09-10 2017-05-31 珠海市君天电子科技有限公司 Virus detection method and device
CN105337960A (en) * 2015-09-25 2016-02-17 浪潮(北京)电子信息产业有限公司 PaaS platform security system
CN105718800A (en) * 2016-01-18 2016-06-29 北京金山安全管理系统技术有限公司 Rapid virus scanning and killing method and apparatus
CN106682505A (en) * 2016-05-04 2017-05-17 腾讯科技(深圳)有限公司 Virus detection method, terminal, server and system
CN106682505B (en) * 2016-05-04 2020-06-12 腾讯科技(深圳)有限公司 Virus detection method, terminal, server and system
US10803171B2 (en) 2016-05-04 2020-10-13 Tencent Technology (Shenzhen) Company Limited Virus detection method, terminal and server

Also Published As

Publication number Publication date
CN102594809B (en) 2015-02-18

Similar Documents

Publication Publication Date Title
CN102609515B (en) Quick file scanning method and quick file scanning system
CN102594809A (en) Method and system for rapidly scanning files
CN102609653A (en) File quick-scanning method and file quick-scanning system
CN111930700A (en) Distributed log processing method, server, system and computing equipment
US8670614B2 (en) Method and system for providing tile map service using image fusion
US20140259142A1 (en) Systems and methods for detecting undesirable network traffic content
CN109981653B (en) Web vulnerability scanning method
CN104219316A (en) Method and device for processing call request in distributed system
CN104640092A (en) Spam short message identifying method, client end, cloud server and system
CN102819713A (en) Method and system for detecting security of popup window
CN106789849B (en) CC attack identification method, node and system
CN111885133A (en) Data processing method and device based on block chain and computer storage medium
CN101364237A (en) Multi-keyword matching method and device
CN102932391A (en) Method and device for processing data in peer to server/peer (P2SP) system, and P2SP system
CN103546449A (en) E-mail virus detection method and device based on attachment formats
CN103678372A (en) Method and equipment for obtaining application performance of page
CN111667141A (en) Pending task case processing method, device, equipment and storage medium
CN102882988A (en) Method, device and equipment for acquiring address information of resource information
CN102754488A (en) User access control method, apparatus and system
CN102299869A (en) Method, client and system for storing network link in instant messaging
CN103841006A (en) Method and device for intercepting junk mails in cloud computing system
US20140310699A1 (en) Method And Apparatus For Upgrading A Plug-In Based On An Instant Messaging Application
CN104715197A (en) Quick file scanning method and system
CN105224573B (en) Regular data configuration method, device, and system
CN109508548B (en) Threat behavior collecting system and method based on simulator technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: QIZHI SOFTWARE (BEIJING) CO., LTD.

Effective date: 20121107

Owner name: BEIJING QIHU TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: QIZHI SOFTWARE (BEIJING) CO., LTD.

Effective date: 20121107

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100016 CHAOYANG, BEIJING TO: 100088 XICHENG, BEIJING

TA01 Transfer of patent application right

Effective date of registration: 20121107

Address after: 100088 Room 112, Block D, 28 Xinjiekouwai Street, Xicheng District, Beijing (Desheng Park)

Applicant after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Applicant after: Qizhi software (Beijing) Co.,Ltd.

Address before: 100016 Unit on the 4th floor, Building C, No. 14 Jiuxianqiao Road, Chaoyang District, Beijing

Applicant before: Qizhi software (Beijing) Co.,Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220727

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 Room 112, Block D, 28 Xinjiekouwai Street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.