Summary of the invention
The present invention overcomes the deficiencies of the prior art. Its first object is to provide a file screening system that screens the software on the computers of the network over which the system is deployed, so that the screened files can be used for subsequent malicious program identification.
The second object of the present invention is to provide a file screening method that uses the above file screening system.
To achieve the first object, the present invention adopts the following technical solution:
A file screening system for detecting and removing malicious programs, characterized in that it comprises a server end and a plurality of clients;
The server end comprises:
A communication module, which works together with the communication module of each client to exchange information between the server end and the client;
A file collection repository, which stores the files that meet the collection condition;
A file logging table, which records the distributed quantity of each file, together with the start time and the last triggered time of each file. The distributed quantity of a file is the number of computers with a client installed that hold the file; the start time is the time at which a client first reports to the server end that its computer holds the file; the last triggered time is the time at which a client most recently reports to the server end that its computer holds the file;
A screening module, which calculates the distribution range and rate of propagation of each file and puts the files that meet the set condition into the file collection repository. The set condition is defined by the programmer and decides, from the distribution range and rate of propagation of a file, whether the file is screened into the file collection repository. The distribution range of a file is the ratio of the number of computers holding the file to the total number of computers with a client installed; the rate of propagation of a file is the distributed quantity of the file divided by the difference between its last triggered time and its start time;
Each client comprises:
A communication module, which works together with the communication module of the server end to exchange information between the server end and the client;
A file characteristic acquisition module, which continuously and cyclically collects the file feature information of all files on the computer where the client resides. Each file corresponds one-to-one with its file feature information, which is sent to the server end through the client's communication module.
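The two quantities computed by the screening module can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the class name FileRecord, its field names, the use of seconds as the time unit and the value returned when the time difference is zero are all assumptions, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """One row of the file logging table (field names are illustrative)."""
    distributed_quantity: int   # computers with a client that hold the file
    start_time: float           # time of the first report, in seconds (assumed unit)
    last_triggered_time: float  # time of the most recent report, in seconds


def distribution_range(record: FileRecord, total_clients: int) -> float:
    """Computers holding the file over all client computers, as a percentage."""
    return 100.0 * record.distributed_quantity / total_clients


def rate_of_propagation(record: FileRecord) -> float:
    """Distributed quantity divided by (last triggered time - start time)."""
    elapsed = record.last_triggered_time - record.start_time
    if elapsed <= 0:
        # Assumption: the text does not say what happens when only one
        # report exists and the difference is zero; 0.0 is used here.
        return 0.0
    return record.distributed_quantity / elapsed
```

For example, a file reported by 30 of 1000 client computers over 60 seconds has a distribution range of 3% and a rate of propagation of 0.5 computers per second under these assumptions.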
To achieve the second object, the present invention adopts the following technical solution:
A method of screening files using the above file screening system, comprising the following steps:
a. The clients are installed on the respective computers, and the server end is connected to the clients over the network;
b. The file characteristic acquisition module of each client continuously and cyclically collects the file feature information of all files on its computer and sends it to the server end through the client's communication module;
c. If the server end already holds this feature information, it adds 1 to the distributed quantity of the file corresponding to the feature and records the current time as the last triggered time of the file; if the server end does not hold this feature information, it sets the distributed quantity of the corresponding file to 1 and records the current time as both the start time and the last triggered time of the file;
d. When the distributed quantity and the last triggered time of a file change in the file logging table, the screening module calculates the distribution range and rate of propagation of the file. If the file meets the set condition, it is put into the file collection repository; otherwise it is not collected for the time being.
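Steps c and d can be sketched on the server side as below. This is a hedged illustration only: the class Server, the dictionary layout, and the meets_condition callback are hypothetical names, the sketch stores the feature string rather than fetching the actual file, and it treats a zero time difference as a rate of 0.0, which the text does not specify.

```python
import time
from typing import Callable, Dict, List, Optional


class Server:
    """Toy server end: a file logging table plus a collection list."""

    def __init__(self, total_clients: int,
                 meets_condition: Callable[[float, float], bool]):
        self.total_clients = total_clients
        self.meets_condition = meets_condition  # hypothetical set condition
        self.table: Dict[str, dict] = {}        # feature -> table row
        self.collected: List[str] = []          # stands in for the repository

    def report(self, feature: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        row = self.table.get(feature)
        if row is None:
            # Step c, unknown feature: quantity 1, start = last = now.
            row = {"count": 1, "start": now, "last": now}
            self.table[feature] = row
        else:
            # Step c, known feature: add 1 and refresh the last triggered time.
            row["count"] += 1
            row["last"] = now
        # Step d: the row changed, so recompute both quantities and screen.
        elapsed = row["last"] - row["start"]
        dist_pct = 100.0 * row["count"] / self.total_clients
        rate = row["count"] / elapsed if elapsed > 0 else 0.0  # assumption
        if feature not in self.collected and self.meets_condition(dist_pct, rate):
            self.collected.append(feature)
```

Passing the condition in as a callback keeps the table update separate from the thresholds, which the text says are chosen by the programmer.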
With the above file screening system and screening method, all files on the computers can be screened. Based on each file's distribution range and rate of propagation in the network, files that are essentially harmless (or can essentially be judged non-malicious) are excluded from collection, and the collected files can then undergo malicious program identification, which greatly shortens the time that identification takes.
Embodiment
The purpose of the file screening system of the present invention is to screen out a subset of files. The screened files become the objects of subsequent operations such as malicious program identification, while files that are not screened out are regarded as harmless and are not passed to subsequent identification. This requires the screening rules of the system to be reasonable; otherwise files that should be identified will be missed.
Therefore, the theoretical basis on which the system screens files is introduced first. Black and white samples generally differ in their distribution range across the whole network, and this difference can be used to collect them. Here, black samples are malicious files and white samples are harmless files. Research shows that files containing malicious code generally have the following characteristics:
1. In the period when malicious programs were written to show off technical skill, their total number was small and grew slowly. In the period when they are written for economic gain, their number grows quickly and the total rises rapidly, in line with economic principles.
2. The economic motive means that the author needs the malicious program to be distributed to as many victim machines as possible. However, creating malicious programs is illegal, which in turn limits the scale on which they can be spread.
3. Whatever anti-malware solution is in place, there is always at least one initial victim.
Referring to Fig. 1, the figure is a schematic diagram of the distribution of all files across all computers with a client installed. The X axis is the distribution range of a file and the Y axis is the number of files. As the figure shows, most files have a small distribution range; the total number of files whose distribution range is below m% is the area of the shaded region. These include specialised software such as engineering software, electronic design software and financial accounting software. As the following description makes clear, files with such a minimal distribution range are not collected and therefore do not undergo subsequent malicious program identification.
The file screening system for countering malicious programs comprises a server end and a plurality of clients;
The server end comprises:
A communication module, which works together with the communication module of each client to exchange information between the server end and the client;
A file collection repository, which stores the files that meet the collection condition and serves as the source file repository for subsequent malicious program detection and removal;
A file logging table, which records the distributed quantity of each file, together with the start time and the last triggered time of each file. The distributed quantity of a file is the number of computers with a client installed that hold the file; the start time is the time at which a client first reports to the server end that its computer holds the file; the last triggered time is the time at which a client most recently reports to the server end that its computer holds the file;
A screening module, which calculates the distribution range and rate of propagation of each file and puts the files that meet the set condition into the file collection repository. The set condition is defined by the programmer and decides, from the distribution range and rate of propagation of a file, whether the file is screened into the file collection repository. The distribution range of a file is the ratio of the number of computers holding the file to the total number of computers with a client installed; the rate of propagation of a file is the distributed quantity of the file divided by the difference between its last triggered time and its start time;
Each client comprises:
A communication module, which works together with the communication module of the server end to exchange information between the server end and the client;
A file characteristic acquisition module, which continuously and cyclically collects the file feature information of all files on the computer where the client resides. Each file corresponds one-to-one with its file feature information, which is sent to the server end through the client's communication module. Before sending file feature information, the client checks whether that feature information has been sent before; feature information already sent is not sent again;
The client does not send whole files to the server end; it sends only the feature information of each file, which reduces network traffic and the file storage the server needs. However, the feature information must be able to identify a file, that is, a file and its feature information must correspond one-to-one. In the present invention, file feature information is described in two ways. The first is the MD5 value of the file: if two MD5 values are identical, the files are the same file. If the MD5 values differ, the second way is applied: the file is divided into 10 regions and a hash value is calculated for each region; any two of these 10 hash values form one group, and each group is one feature. If two files with different MD5 values share any one feature, they are regarded as the same file; otherwise they are different files.
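The two-stage comparison described above can be sketched as follows. Several details are assumptions made only for illustration, because the text leaves them open: the regions are contiguous and of roughly equal size, each region is hashed with SHA-256, each feature records which two regions it came from, and files shorter than 10 bytes are compared by MD5 alone. The function names file_features and same_file are hypothetical.

```python
import hashlib
from itertools import combinations
from typing import FrozenSet, Tuple

REGIONS = 10  # number of regions named in the text

Features = Tuple[str, FrozenSet]


def file_features(data: bytes) -> Features:
    """Return (MD5 value, set of features built from pairs of region hashes)."""
    md5 = hashlib.md5(data).hexdigest()
    if len(data) < REGIONS:
        # Assumption: too short to split into 10 non-empty regions.
        return md5, frozenset()
    # Region boundaries for 10 contiguous, roughly equal slices (assumption).
    bounds = [len(data) * i // REGIONS for i in range(REGIONS + 1)]
    hashes = [hashlib.sha256(data[bounds[i]:bounds[i + 1]]).hexdigest()
              for i in range(REGIONS)]
    # Every pair of (region index, region hash) entries is one feature: 45 in all.
    return md5, frozenset(combinations(enumerate(hashes), 2))


def same_file(a: Features, b: Features) -> bool:
    """Same MD5, or different MD5 but at least one shared feature."""
    md5_a, feats_a = a
    md5_b, feats_b = b
    return md5_a == md5_b or bool(feats_a & feats_b)
```

Under these assumptions, two files that differ in only a few regions still share at least one pair of unchanged regions and are therefore treated as the same file.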
With reference to Fig. 1, the set condition is:
When the distribution range of a file is greater than m% and less than w%, the file is screened into the file collection repository if its rate of propagation is less than U1;
When the distribution range of a file is greater than b% and less than n%, the file is screened into the file collection repository if its rate of propagation is greater than U2;
When the distribution range of a file is greater than n%, the file is screened into the file collection repository;
Here m<b<w<n. The values of m, b, w, n, U1 and U2 are set by the programmer according to the number of clients. The more clients there are, the higher the screening accuracy of the file screening system. The values can also be tuned as needed. For example, a larger m may lower the screening accuracy (some harmful files may be missed), but it reduces the number of files awaiting malicious program identification and shortens the identification time. Conversely, a smaller m raises the screening accuracy but lengthens the subsequent virus scanning time.
Based on the distribution of its own software clients, the applicant (Kingsoft software) uses the following values: m=0.001, b=0.01, w=0.02, n=0.1, U1 is 1 per 2 minutes and U2 is 1 per 2 seconds. The file screening method is described below using these values as an example.
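The set condition with the example values can be sketched as a single predicate. This is an illustration under stated assumptions: the rate of propagation is measured in computers per second, U1 and U2 are read as one per 2 minutes and one per 2 seconds, and a distribution range exactly equal to a threshold is treated as outside the band because the text uses "greater than" and "less than". The names Thresholds and should_collect are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Example values from the text; m, b, w, n are percentages."""
    m: float = 0.001
    b: float = 0.01
    w: float = 0.02
    n: float = 0.1
    u1: float = 1 / 120.0  # one per 2 minutes, in computers per second (assumed unit)
    u2: float = 1 / 2.0    # one per 2 seconds, in computers per second (assumed unit)


def should_collect(dist_pct: float, rate: float,
                   t: Thresholds = Thresholds()) -> bool:
    """Apply the three rules of the set condition in turn."""
    if dist_pct > t.n:                           # widely distributed: always collect
        return True
    if t.b < dist_pct < t.n and rate > t.u2:     # mid range and spreading fast
        return True
    if t.m < dist_pct < t.w and rate < t.u1:     # low range and spreading slowly
        return True
    return False
```

For instance, should_collect(0.05, 1.0) returns True (second rule), while should_collect(0.0005, 0.0) returns False because the distribution range is below m%.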
The file screening method comprises the following steps:
a. The clients are installed on the respective computers, and the server end is connected to the clients over the network;
b. The file characteristic acquisition module of each client continuously and cyclically collects the file feature information of all files on its computer and sends it to the server end through the client's communication module. Before sending file feature information, the client checks whether that feature information has been sent before; feature information already sent is not sent again;
c. If the server end already holds this feature information, it adds 1 to the distributed quantity of the file corresponding to the feature and records the current time as the last triggered time of the file; if the server end does not hold this feature information, it sets the distributed quantity of the corresponding file to 1 and records the current time as both the start time and the last triggered time of the file;
d. When the distributed quantity and the last triggered time of a file change in the file logging table, the screening module calculates the distribution range and rate of propagation of the file. If the file meets the set condition, it is put into the file collection repository; otherwise it is not collected for the time being. Specifically, when the distribution range of the file is greater than 0.001% and less than 0.02%, the file is screened into the file collection repository if it spreads more slowly than one computer per 2 minutes (rate of propagation less than U1); when the distribution range is greater than 0.01% and less than 0.1%, the file is screened into the file collection repository if it spreads faster than one computer per 2 seconds (rate of propagation greater than U2); when the distribution range is greater than 0.1%, the file is screened into the file collection repository.
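The client-side check in step b, where feature information that has already been sent is not sent again, can be sketched as below. The class Client, the send callback and the in-memory set are hypothetical; a real client would presumably persist the set of sent features across restarts, which the text does not describe.

```python
from typing import Callable, Iterable, Set


class Client:
    """Toy client that reports each file feature to the server at most once."""

    def __init__(self, send: Callable[[str], None]):
        self.send = send            # hypothetical transport to the server end
        self.sent: Set[str] = set() # features already reported by this client

    def scan_once(self, features: Iterable[str]) -> int:
        """Send features not sent before; return how many were sent."""
        count = 0
        for feature in features:
            if feature in self.sent:
                continue            # already reported, do not repeat
            self.send(feature)
            self.sent.add(feature)
            count += 1
        return count
```

Because the server end adds 1 to the distributed quantity on every report, sending each feature only once per client is what keeps that quantity equal to the number of computers holding the file.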
As the earlier analysis of malicious programs indicates, a malicious file in its early stage has a distribution range below m% and is not screened out. As it spreads, its distribution range grows; once the distribution range lies between m% and w% and the rate of propagation is less than U1, the file is screened out. A malicious program that spreads at high speed (such as a worm that actively propagates through a vulnerability) does not meet this requirement and is not collected at that point; but as it continues to spread and its distribution range becomes greater than b% and less than n%, it is screened out if its rate of propagation is greater than U2. If it still does not meet the collection condition, then as it spreads further it is screened out as soon as its distribution range reaches n%, whatever its rate of propagation. The method of the present invention therefore screens out malicious files effectively on the basis of how malicious program files spread, and reduces the time needed for subsequent malicious program identification.
In addition, information is gathered from clients by machine rather than manually, and is cached. This ensures that, from the user's point of view, the information source is anonymous and the information is not associated with the user. Because the information is only held temporarily in a buffer, the server end does not permanently store user-related information. Furthermore, what the server end extracts are executable files, not the documents stored on the user's machine.
The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.