CN104598816B - A kind of file scanning method and device - Google Patents
A file scanning method and device
- Publication number: CN104598816B
- Application number: CN201410806302.8A
- Authority: CN (China)
- Prior art keywords: model, file, malicious file, detected, malicious
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Abstract
Embodiments of the present invention provide a file scanning method and device. In one aspect, an embodiment judges the type of a file to be detected using each of M first models, to obtain M judgment results, where M is an integer greater than or equal to 2; obtains, according to the M judgment results, the number of first models that judge the file to be detected to be a malicious file; and then obtains the type of the file to be detected according to that number. The technical solution provided by the embodiments can therefore improve the detection performance of the judgment models for malicious files during file scanning.
Description
【Technical field】
The present invention relates to the field of computer technology, and in particular to a file scanning method and device.
【Background】
The basic idea of a file scanning method based on machine learning is as follows: feature vectors are computed for files of known type, machine training is performed on those feature vectors to obtain a judgment model, and the judgment model is then used to judge the type of files of unknown type, so as to detect the malicious files among them.
However, new malicious files keep emerging over time, and the judgment models obtained by existing machine-learning training methods are all single models. As a result, the judgment models used for file scanning in the prior art have relatively low detection performance when faced with newly emerging malicious files.
【Summary of the invention】
In view of this, embodiments of the present invention provide a file scanning method and device that can improve the detection performance of the judgment models for malicious files during file scanning.
In one aspect, an embodiment of the present invention provides a file scanning method, including:
Judging the type of a file to be detected using each of M first models, to obtain M judgment results, where M is an integer greater than or equal to 2;
Obtaining, according to the M judgment results, the number of first models that judge the file to be detected to be a malicious file;
Obtaining the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file.
With reference to the above aspect and any possible implementation, an implementation is further provided in which obtaining the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file includes:
Comparing the number of first models that judge the file to be detected to be a malicious file with a preset first threshold;
If the number of first models that judge the file to be detected to be a malicious file is less than the first threshold, determining that the file to be detected is a normal file;
If the number of first models that judge the file to be detected to be a malicious file is greater than or equal to the first threshold, determining that the file to be detected is a malicious file.
With reference to the above aspect and any possible implementation, an implementation is further provided in which the method also includes:
Obtaining newly emerging malicious files as training samples;
Performing machine training using the training samples, to generate a second model;
Adjusting the M first models using the second model.
With reference to the above aspect and any possible implementation, an implementation is further provided in which the M first models form a first set, and adjusting the M first models using the second model includes:
Adding the second model to a preset second set, where the second set includes K second models and K is an integer greater than 0;
Generating P model groups, each from one second model in the second set and one first model in the first set, where P is greater than 0 and less than or equal to the product of M and K;
For each model group, using the second model in the model group to replace the first model belonging to that model group in the first set, to obtain P third sets;
Obtaining the malicious file recall rate and malicious file error rate of each third set;
Selecting one third set according to the malicious file recall rate and malicious file error rate of each third set;
Adjusting the first set using the second model in the model group corresponding to the selected third set.
With reference to the above aspect and any possible implementation, an implementation is further provided in which adjusting the first set using the second model in the model group corresponding to the selected third set includes:
Comparing the malicious file recall rate of the selected third set with the malicious file recall rate of the first set, and comparing the malicious file error rate of the selected third set with the malicious file error rate of the first set;
If the malicious file recall rate of the selected third set is greater than that of the first set, and the malicious file error rate of the selected third set is less than that of the first set, using the second model in the model group corresponding to the third set to replace the first model belonging to that model group in the first set, or adding the second model in the model group corresponding to the third set to the first set.
In one aspect, an embodiment of the present invention provides a file scanning device, including:
A type judging unit, configured to judge the type of a file to be detected using each of M first models, to obtain M judgment results, where M is an integer greater than or equal to 2;
A result statistics unit, configured to obtain, according to the M judgment results, the number of first models that judge the file to be detected to be a malicious file;
A type determining unit, configured to obtain the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file.
With reference to the above aspect and any possible implementation, an implementation is further provided in which the type determining unit is specifically configured to:
Compare the number of first models that judge the file to be detected to be a malicious file with a preset first threshold;
If the number of first models that judge the file to be detected to be a malicious file is less than the first threshold, determine that the file to be detected is a normal file;
If the number of first models that judge the file to be detected to be a malicious file is greater than or equal to the first threshold, determine that the file to be detected is a malicious file.
With reference to the above aspect and any possible implementation, an implementation is further provided in which the device also includes:
A file obtaining unit, configured to obtain newly emerging malicious files as training samples;
A model generation unit, configured to perform machine training using the training samples, to generate a second model;
A model adjustment unit, configured to adjust the M first models using the second model.
With reference to the above aspect and any possible implementation, an implementation is further provided in which the M first models form a first set, and the model adjustment unit is specifically configured to:
Add the second model to a preset second set, where the second set includes K second models and K is an integer greater than 0;
Generate P model groups, each from one second model in the second set and one first model in the first set, where P is greater than 0 and less than or equal to the product of M and K;
For each model group, use the second model in the model group to replace the first model belonging to that model group in the first set, to obtain P third sets;
Obtain the malicious file recall rate and malicious file error rate of each third set;
Select one third set according to the malicious file recall rate and malicious file error rate of each third set;
Adjust the first set using the second model in the model group corresponding to the selected third set.
With reference to the above aspect and any possible implementation, an implementation is further provided in which the model adjustment unit, when adjusting the first set using the second model in the model group corresponding to the selected third set, is specifically configured to:
Compare the malicious file recall rate of the selected third set with the malicious file recall rate of the first set, and compare the malicious file error rate of the selected third set with the malicious file error rate of the first set;
If the malicious file recall rate of the selected third set is greater than that of the first set, and the malicious file error rate of the selected third set is less than that of the first set, use the second model in the model group corresponding to the third set to replace the first model belonging to that model group in the first set, or add the second model in the model group corresponding to the third set to the first set.
As can be seen from the above technical solutions, the embodiments of the present invention have the following beneficial effects:
In the technical solution provided by the embodiments of the present invention, the type of the file to be detected is judged using multiple models, and a comprehensive judgment of the type is made from the results of those models. This can improve the detection performance of the judgment models for malicious files during file scanning and improve the accuracy with which the judgment models detect malicious files.
【Brief description of the drawings】
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of embodiment one of the file scanning method provided by the embodiments of the present invention;
Fig. 2 is an example diagram of the judgment models provided by the embodiments of the present invention judging a file to be detected;
Fig. 3 is a schematic flowchart of embodiment two of the file scanning method provided by the embodiments of the present invention;
Fig. 4 is a functional block diagram of the file scanning device provided by the embodiments of the present invention.
【Detailed description】
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms used in the embodiments of the present invention serve only to describe specific embodiments and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second and so on may be used in the embodiments of the present invention to describe sets or models, the described sets and models should not be limited by these terms. The terms are only used to distinguish them from one another. For example, without departing from the scope of the embodiments of the present invention, the first set could also be called the second set, and similarly the second set could be called the first set.
Depending on the context, the word "if" as used herein may be construed as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be construed as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
Embodiment one
An embodiment of the present invention provides a file scanning method. Refer to Fig. 1, which is a schematic flowchart of embodiment one of the file scanning method provided by the embodiments of the present invention. As shown in the figure, the method includes the following steps:
S101, judge the type of a file to be detected using each of M first models, to obtain M judgment results, where M is an integer greater than or equal to 2.
Specifically, refer to Fig. 2, which is an example diagram of the judgment models provided by the embodiments of the present invention judging a file to be detected. As shown in the figure, each of the M first models judges the type of the file to be detected, yielding M judgment results.
Here M is an integer greater than or equal to 2.
It should be noted that the M first models are generated by machine training on training samples that are not entirely identical, which increases the diversity of the training samples and helps cover a wider variety of malicious files.
Preferably, each first model may be a model obtained by machine training on the training samples with the k-nearest-neighbor algorithm (k-Nearest Neighbor, kNN), which may be called a kNN model; or a model obtained by machine training on the training samples with the naive Bayes algorithm; or a model obtained by machine training on the training samples with a support vector machine (Support Vector Machine, SVM), which may be called an SVM model.
It should be noted that the M first models may all be obtained with the same algorithm, for example all SVM models. Alternatively, the M first models need not all be obtained with the same algorithm; for example, some may be SVM models and the others kNN models.
Taking an SVM model as the first model, the way a first model judges a file to be detected is illustrated as follows. The first model obtained by training is a matrix whose number of columns equals the number of file types. In the embodiments of the present invention the file types may include normal file and malicious file, so the number of columns is 2. When the first model judges a file to be detected, the feature vector of the file of unknown type is obtained first and is then multiplied by the matrix. In the resulting vector, the value of each element is the score for judging the file to be detected as the corresponding type, and the type with the higher score is the type determined for the file.
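The matrix-multiplication step described above can be sketched as follows. This is only an illustration under stated assumptions: the weight values, the column order (normal first, malicious second), and the names `TYPES`, `judge` and `W` are hypothetical and do not come from the patent.

```python
# Hypothetical column order; the source only says the matrix has 2 columns,
# one per file type (normal file, malicious file).
TYPES = ["normal", "malicious"]

def judge(feature_vector, weight_matrix):
    """Multiply a 1 x n feature vector by an n x 2 matrix and return the
    type whose score is highest, together with both scores."""
    scores = [
        sum(f * row[col] for f, row in zip(feature_vector, weight_matrix))
        for col in range(len(TYPES))
    ]
    best = max(range(len(TYPES)), key=lambda col: scores[col])
    return TYPES[best], scores

# Illustrative weights only; a trained model would supply the real matrix.
W = [
    [0.2, 0.9],  # weights for feature 0
    [0.5, 0.1],  # weights for feature 1
    [0.1, 0.8],  # weights for feature 2
]
label, scores = judge([1, 0, 1], W)
```

With these illustrative weights the malicious column scores higher, so `label` is `"malicious"`. The source does not say how a tie is resolved; in this sketch `max` returns the first column on a tie.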
Methods of obtaining the feature vector of the file of unknown type may include but are not limited to the following. A feature of a file to be detected is a binary string 4 bytes long; for example, 0x01234567 and 0x89ABCDEF are features. To extract the feature vector of a file to be detected, a feature set must be configured in advance; for example, [0x01234567, 0x89ABCDEF, 0xAAAABBBB] is a feature set. The file to be detected is then scanned sequentially to check whether each feature in the feature set is present in the file. If a feature is present, the corresponding element of the feature vector is set to 1; otherwise it is set to 0. For example, if 0x01234567 and 0xAAAABBBB are present in a file to be detected and 0x89ABCDEF is not, the feature vector of that file is [1,0,1].
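A minimal sketch of this extraction, assuming each 4-byte feature is matched as a big-endian byte sequence anywhere in the file contents (the source does not specify byte order or alignment). The names `FEATURE_SET`, `extract_feature_vector` and `sample` are illustrative.

```python
# Feature set from the example in the text, written as raw byte strings.
# Assumption: 0x01234567 corresponds to the bytes 01 23 45 67 in that order.
FEATURE_SET = [
    bytes.fromhex("01234567"),
    bytes.fromhex("89ABCDEF"),
    bytes.fromhex("AAAABBBB"),
]

def extract_feature_vector(data, feature_set=FEATURE_SET):
    """Return a 0/1 vector: element i is 1 if feature i occurs in data."""
    return [1 if feature in data else 0 for feature in feature_set]

# A made-up file body containing the first and third features only.
sample = (
    b"\x00" + bytes.fromhex("01234567") + b"\xff\xff" + bytes.fromhex("AAAABBBB")
)
vector = extract_feature_vector(sample)
```

For this made-up input the result is `[1, 0, 1]`, matching the example in the text.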
S102, according to the M judgment results, obtain the number of first models that judge the file to be detected to be a malicious file.
Specifically, as shown in Fig. 2, after the M judgment results are obtained, the number k of first models that judge the file to be detected to be a malicious file is counted; that is, the M judgment results are examined to count how many first models judge the file to be malicious.
S103, according to the number of first models that judge the file to be detected to be a malicious file, obtain the type of the file to be detected.
Specifically, as shown in Fig. 2, after the number k of first models that judge the file to be detected to be a malicious file is obtained, k is compared with a preset first threshold t.
If the number k of first models that judge the file to be detected to be a malicious file is less than the preset first threshold t, the file to be detected is determined to be a normal file.
Conversely, if the number k of first models that judge the file to be detected to be a malicious file is greater than or equal to the preset first threshold t, the file to be detected is determined to be a malicious file.
In this way the type of the file to be detected can be determined, so as to detect whether the file to be detected is a malicious file.
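Steps S101 to S103 can be sketched as a simple vote count against the threshold t. In this sketch each first model is assumed to be a callable that returns the string `"malicious"` or `"normal"`; that interface, the toy models and the name `scan` are illustrative assumptions, not part of the patent.

```python
def scan(feature_vector, first_models, t):
    """Return (type, k) for a file, given M first models and threshold t."""
    if len(first_models) < 2:
        raise ValueError("M must be an integer greater than or equal to 2")
    # S101: each of the M first models judges the file.
    results = [model(feature_vector) for model in first_models]
    # S102: count the models that judged the file to be malicious.
    k = sum(1 for result in results if result == "malicious")
    # S103: compare k with the preset first threshold t.
    return ("malicious" if k >= t else "normal"), k

# Three toy models, each keyed on a single feature (purely illustrative).
toy_models = [
    lambda v: "malicious" if v[0] else "normal",
    lambda v: "malicious" if v[1] else "normal",
    lambda v: "malicious" if v[2] else "normal",
]
verdict, k = scan([1, 0, 1], toy_models, t=2)
```

Here two of the three toy models vote malicious, so with t = 2 the file is judged malicious; with t = 3 the same votes would yield a normal file.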
It should be noted that the prior art always uses a single model to judge the type of a file to be detected. Whenever the training samples are updated, for example when new malicious files appear, machine training has to be redone on all training samples to generate a new model, which is then used for type judgment. This incurs a large computational cost and makes model updates inefficient. In the embodiments of the present invention, multiple models are used to judge the type of the file to be detected. One benefit is that models can conveniently be added or replaced, which eases maintenance of the file scanning device and keeps improving its ability to detect malicious files. When the training samples need to be updated, machine training is performed only on the newly emerged training samples to generate one new model, which is added to the M judgment models; there is no need to redo machine training on all training samples. This reduces computational cost, improves the efficiency of model updates, improves the models' detection performance for malicious files, raises the malicious file recall rate, and lowers the malicious file error rate.
Embodiment two
Refer to Fig. 3, which is a schematic flowchart of embodiment two of the file scanning method provided by the embodiments of the present invention. As shown in Fig. 3, building on the file scanning method above, the method may further include:
Obtaining newly emerging malicious files as training samples;
Performing machine training using the training samples, to generate a second model;
Adjusting the M first models using the second model.
In the embodiments of the present invention, the M first models are adjusted continually so as to keep improving the file scanning device's ability to detect malicious files.
For example, newly emerging malicious files obtained recently may be used as training samples. When the malicious files accumulate to a certain number, for example 10,000, machine training can be performed on the training samples to generate the second model. Malicious files determined by the method above may be verified manually. If manual verification confirms that a file is a malicious file, it can be used as a newly emerged training sample for machine training. Conversely, if manual verification finds that the file is a normal file, it is not used as a training sample.
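The accumulate-then-train flow described above could look roughly like the following sketch. The class name `SampleBuffer`, the `train` callable and the boolean verification flag are hypothetical; the patent specifies neither the training routine nor how manual verification results are recorded.

```python
BATCH_SIZE = 10_000  # the example count given in the text

class SampleBuffer:
    """Collects manually verified malicious files and trains a second model
    once enough of them have accumulated."""

    def __init__(self, train, batch_size=BATCH_SIZE):
        self.train = train            # hypothetical training function
        self.batch_size = batch_size
        self.samples = []

    def report(self, sample, verified_malicious):
        """Record one detected file; return a new second model when the
        batch is full, otherwise None."""
        if not verified_malicious:
            return None               # verified normal: not a training sample
        self.samples.append(sample)
        if len(self.samples) < self.batch_size:
            return None
        batch, self.samples = self.samples, []
        return self.train(batch)

# Toy usage with a tiny batch size and a stand-in "training" function.
buffer = SampleBuffer(train=frozenset, batch_size=2)
first = buffer.report("file_a", verified_malicious=True)
second = buffer.report("file_b", verified_malicious=False)
third = buffer.report("file_c", verified_malicious=True)
```

In the toy run only `file_a` and `file_c` pass verification, so the stand-in model is produced on the third call and the buffer is emptied for the next batch.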
Preferably, in the embodiments of the present invention, the second model may be a kNN model; or the second model may be a judgment model obtained by machine training on the training samples with the naive Bayes algorithm; or the second model may be an SVM model.
Preferably, the operation of adjusting the M first models using the second model may be performed at intervals, for example once a month, so that the M first models are updated and adjusted to detect newly emerging malicious files.
The method of adjusting the M first models using the second model is illustrated below.
First, the M first models form a first set. The first set is the set used online to judge the type of files to be detected, so it may be called Online. The generated second model is added to a preset second set, which includes K second models, where K is an integer greater than 0. The second set serves as a standby set for the first set Online, so it may be called Backup.
Then, the first set Online and the second set Backup are traversed. A first model Online[i] is taken from the first set Online, where i is an integer from 1 to M, and a second model Backup[j] is taken from the second set Backup, where j is an integer from 1 to K. P model groups are generated, each from one second model Backup[j] in the second set and one first model Online[i] in the first set, where P is greater than 0 and less than or equal to the product of M and K. Each model group is denoted {Online[i], Backup[j]}.
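The traversal can be sketched as enumerating index pairs. This sketch generates every pair, which gives the upper bound P = M × K; the source allows P to be smaller but does not say how a subset would be chosen. Indices here are 0-based, whereas the text counts from 1, and the function name is illustrative.

```python
from itertools import product

def make_model_groups(online, backup):
    """Return every (i, j) pair identifying a group {Online[i], Backup[j]}."""
    return list(product(range(len(online)), range(len(backup))))

online = ["svm_a", "svm_b", "knn_c"]   # M = 3 placeholder first models
backup = ["new_x", "new_y"]            # K = 2 placeholder second models
groups = make_model_groups(online, backup)
```

With M = 3 and K = 2 this yields six groups, from `(0, 0)` through `(2, 1)`.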
Then, for each model group, the second model Backup[j] in the group replaces the first model Online[i] belonging to that group in the first set Online, to obtain P third sets. Each third set may be denoted New[i, j]. A third set therefore contains all of the M first models except Online[i], together with the newly added second model Backup[j].
It should be noted that this processing is performed for each model group to generate a corresponding third set, so P model groups yield P third sets.
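Building one third set New[i, j] amounts to copying the first set and swapping a single element. A minimal sketch, using 0-based indices and placeholder model names that are not from the patent:

```python
def build_third_set(online, backup, i, j):
    """Return New[i, j]: a copy of Online with Online[i] replaced by Backup[j].
    The original Online list is left untouched."""
    third = list(online)
    third[i] = backup[j]
    return third

online = ["svm_a", "svm_b", "knn_c"]
backup = ["new_x", "new_y"]
new_1_0 = build_third_set(online, backup, 1, 0)
```

Copying rather than mutating matters here because the first set stays in use online while the candidate third sets are being evaluated.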
Finally, each third set is used to judge the types of a number of files of unknown type, to obtain the malicious file recall rate and malicious file error rate of each third set. One third set is then selected according to the malicious file recall rate and malicious file error rate of each third set, and the first set is adjusted using the second model in the model group corresponding to the selected third set.
The malicious file recall rate is the ratio of the number of malicious files correctly detected, when the third set judges the types of the files of unknown type, to the total number of malicious files among those files. The higher the malicious file recall rate, the more malicious files the third set is able to detect.
The malicious file error rate is the ratio of the number of normal files judged to be malicious files, when the third set judges the types of the files of unknown type, to the total number of files of unknown type. The lower the malicious file error rate, the more accurately the third set detects malicious files.
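The two rates defined above can be computed as in the sketch below. It assumes the true type of each evaluation file is available (for example from later verification), which the source implies but does not spell out; the function name and label strings are illustrative.

```python
def evaluate(predicted, actual):
    """Return (recall_rate, error_rate) as defined in the text.

    recall_rate = correctly detected malicious files / all malicious files
    error_rate  = normal files judged malicious / all evaluated files
    """
    pairs = list(zip(predicted, actual))
    total_malicious = sum(1 for _, a in pairs if a == "malicious")
    detected = sum(1 for p, a in pairs if p == "malicious" and a == "malicious")
    false_alarms = sum(1 for p, a in pairs if p == "malicious" and a == "normal")
    recall_rate = detected / total_malicious if total_malicious else 0.0
    error_rate = false_alarms / len(pairs) if pairs else 0.0
    return recall_rate, error_rate

actual = ["malicious", "malicious", "normal", "normal", "normal"]
predicted = ["malicious", "normal", "malicious", "normal", "normal"]
recall_rate, error_rate = evaluate(predicted, actual)
```

In this made-up evaluation one of two malicious files is caught (recall 0.5) and one of five files is a false alarm (error rate 0.2). Note that the error rate divides by all evaluated files, as the text states, not by the number of normal files.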
Preferably, methods of selecting one third set according to the malicious file recall rate and malicious file error rate of each third set may include but are not limited to the following. The efficiency ratio of each third set is obtained as the ratio of its malicious file recall rate to its malicious file error rate. The P third sets are then sorted in descending order of efficiency ratio, and the third set ranked first is selected; that is, the third set New[i, j] with the largest efficiency ratio among the P third sets is found.
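A sketch of this selection, taking the recall and error rates of each third set as given. Treating a zero error rate as an infinite efficiency ratio is an assumption made here to avoid division by zero; the source does not cover that case. The metric values are invented for illustration.

```python
def select_third_set(metrics):
    """metrics maps (i, j) -> (recall_rate, error_rate).
    Return the (i, j) whose recall/error ratio is largest."""
    def efficiency_ratio(key):
        recall_rate, error_rate = metrics[key]
        if error_rate == 0:
            return float("inf")  # assumption: no false alarms ranks first
        return recall_rate / error_rate
    return max(metrics, key=efficiency_ratio)

metrics = {
    (0, 0): (0.80, 0.10),   # efficiency ratio 8
    (0, 1): (0.90, 0.05),   # efficiency ratio 18
    (1, 0): (0.70, 0.20),   # efficiency ratio 3.5
}
best = select_third_set(metrics)
```

Taking the maximum directly gives the same result as sorting in descending order and picking the first entry.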
Preferably, methods of adjusting the first set Online using the second model Backup[j] in the model group corresponding to the selected third set New[i, j] may include but are not limited to the following:
Compare the malicious file recall rate of the selected third set New[i, j] with the malicious file recall rate of the first set Online, and compare the malicious file error rate of the selected third set New[i, j] with the malicious file error rate of the first set Online.
If the malicious file recall rate of the selected third set New[i, j] is greater than that of the first set Online, and the malicious file error rate of the selected third set New[i, j] is less than that of the first set Online, the third set New[i, j] is better than the currently used first set Online on both malicious file recall rate and malicious file error rate. The first set Online then needs to be adjusted using the second model Backup[j] in the model group {Online[i], Backup[j]} corresponding to the selected third set New[i, j]. The adjustment may include the following. If the number of first models in the first set Online has reached a preset model threshold, the second model Backup[j] in the model group {Online[i], Backup[j]} corresponding to the third set New[i, j] replaces the first model Online[i] belonging to that model group in the first set Online. Alternatively, if the number of first models in the first set Online has not yet reached the preset model threshold, the second model Backup[j] in the model group {Online[i], Backup[j]} corresponding to the third set New[i, j] may be added directly to the first set Online.
Conversely, if the malicious file recall rate of the selected third set New[i, j] is less than or equal to that of the first set Online, and/or the malicious file error rate of the selected third set New[i, j] is greater than or equal to that of the first set Online, the third set New[i, j] is not better than the currently used first set Online on malicious file recall rate and/or malicious file error rate. In that case the first set Online is not adjusted using the second model in the model group corresponding to the selected third set New[i, j], and the current first set Online is kept unchanged.
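The replace-or-add decision can be sketched as below. The parameter names, the tuple layout of the metrics and the 0-based index are illustrative assumptions; "has reached the preset model threshold" is read here as the set size being greater than or equal to that threshold.

```python
def adjust_first_set(online, backup_model, i, new_metrics, old_metrics,
                     model_threshold):
    """Return the adjusted Online set.

    new_metrics / old_metrics are (recall_rate, error_rate) for the selected
    third set New[i, j] and for the current first set Online."""
    new_recall, new_error = new_metrics
    old_recall, old_error = old_metrics
    # Adjust only if the third set is strictly better on both measures.
    if not (new_recall > old_recall and new_error < old_error):
        return list(online)
    if len(online) >= model_threshold:
        updated = list(online)
        updated[i] = backup_model        # set is full: replace Online[i]
        return updated
    return list(online) + [backup_model]  # room left: add Backup[j]

online = ["svm_a", "svm_b", "knn_c"]
replaced = adjust_first_set(online, "new_x", 1, (0.9, 0.05), (0.8, 0.10), 3)
extended = adjust_first_set(online, "new_x", 1, (0.9, 0.05), (0.8, 0.10), 5)
unchanged = adjust_first_set(online, "new_x", 1, (0.9, 0.20), (0.8, 0.10), 3)
```

In the third call the candidate has a higher recall rate but also a higher error rate, so the first set is kept as it is.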
It should be noted that the terminals involved in the embodiments of the present invention may include but are not limited to a personal computer (Personal Computer, PC), a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, a tablet computer (Tablet Computer), a mobile phone, an MP3 player, an MP4 player, and the like.
It should be noted that the file scanning method above may be executed by a file scanning device. The device may be an application located in a local terminal, or a functional unit such as a plug-in or software development kit (Software Development Kit, SDK) in an application located in a local terminal; the embodiments of the present invention do not specifically limit this.
It can be understood that the application may be a native application (nativeApp) installed on the terminal, or a web application (webApp) of a browser on the terminal; the embodiments of the present invention do not limit this.
The embodiments of the present invention further provide device embodiments that implement the steps and methods of the method embodiments above.
Refer to Fig. 4, which is a functional block diagram of the file scanning device provided by the embodiments of the present invention. As shown in the figure, the device includes:
A type judging unit 401, configured to judge the type of a file to be detected using each of M first models, to obtain M judgment results, where M is an integer greater than or equal to 2;
A result statistics unit 402, configured to obtain, according to the M judgment results, the number of first models that judge the file to be detected to be a malicious file;
A type determining unit 403, configured to obtain the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file.
Preferably, the type determining unit 403 is specifically configured to:
compare the number of first models that judge the file to be detected to be a malicious file with a preset first threshold;
if the number of first models that judge the file to be detected to be a malicious file is less than the first threshold, determine that the file to be detected is a normal file;
if the number of first models that judge the file to be detected to be a malicious file is greater than or equal to the first threshold, determine that the file to be detected is a malicious file.
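The counting and threshold comparison performed by units 401 to 403 can be sketched as follows. This is only an illustrative sketch: representing a first model as a Python callable that returns True for "malicious", and the names count_malicious_votes and classify_file, are assumptions made for the example and do not appear in the patent.

```python
from typing import Callable, Sequence

# Assumption for illustration: a "first model" is any callable that
# returns True when it judges the file content to be malicious.
Model = Callable[[bytes], bool]


def count_malicious_votes(models: Sequence[Model], file_bytes: bytes) -> int:
    """Judge the file with each of the M first models and count the
    models whose judgment result is 'malicious'."""
    if len(models) < 2:
        raise ValueError("M must be an integer greater than or equal to 2")
    return sum(1 for model in models if model(file_bytes))


def classify_file(models: Sequence[Model], file_bytes: bytes,
                  first_threshold: int) -> str:
    """Compare the vote count with the preset first threshold:
    below the threshold -> normal file, otherwise -> malicious file."""
    votes = count_malicious_votes(models, file_bytes)
    return "malicious" if votes >= first_threshold else "normal"


if __name__ == "__main__":
    # Toy stand-ins for trained models (hypothetical rules, not real detectors).
    toy_models = [
        lambda data: b"MZ" in data,
        lambda data: len(data) > 8,
        lambda data: data.endswith(b"!"),
    ]
    print(classify_file(toy_models, b"MZ payload!", first_threshold=2))
```

With M = 3 and a first threshold of 2, the decision is a simple majority vote; other threshold values make the device stricter or more permissive.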
Preferably, the device further includes:
a file obtaining unit 404, configured to obtain newly emerging malicious files as training samples;
a model generation unit 405, configured to perform machine training using the training samples, to generate a second model;
a model adjustment unit 406, configured to adjust the M first models using the second model.
Preferably, the M first models form a first set, and the model adjustment unit 406 is specifically configured to:
add the second model to a preset second set, where the second set contains K second models and K is an integer greater than 0;
generate P model groups according to the second models in the second set and the first models in the first set, where P is greater than 0 and less than or equal to the product of M and K;
use the second model in each model group to replace, in the first set, the first model belonging to that model group, to obtain P third sets;
obtain the malicious file recall rate and the malicious file error rate of each third set;
select one third set according to the malicious file recall rate and the malicious file error rate of each third set;
adjust the first set using the second model in the model group corresponding to the selected third set.
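The generation of the third sets and the selection among them can be sketched as follows. Several points here are assumptions for illustration only: the recall rate is taken to be the fraction of labeled malicious samples that a set flags, the error rate is taken to be the fraction of labeled normal samples that a set flags, every pairing of a first model with a second model is used (so P equals M times K), and the selection rule prefers the highest recall rate and then the lowest error rate. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Model = Callable[[bytes], bool]   # returns True for "malicious" (assumption)
Sample = Tuple[bytes, bool]       # (file content, is_malicious label)


def judge(model_set: Sequence[Model], data: bytes, threshold: int) -> bool:
    """Threshold vote of a whole model set on one file."""
    return sum(1 for m in model_set if m(data)) >= threshold


def evaluate(model_set: Sequence[Model], samples: Sequence[Sample],
             threshold: int) -> Tuple[float, float]:
    """Return (recall rate, error rate) under the assumed definitions."""
    malicious = [d for d, label in samples if label]
    normal = [d for d, label in samples if not label]
    detected = sum(judge(model_set, d, threshold) for d in malicious)
    false_alarms = sum(judge(model_set, d, threshold) for d in normal)
    recall = detected / len(malicious) if malicious else 0.0
    error = false_alarms / len(normal) if normal else 0.0
    return recall, error


def build_third_sets(first_set: Sequence[Model],
                     second_set: Sequence[Model]) -> List[tuple]:
    """Each model group pairs one first-model position with one second
    model; the third set is the first set with that position replaced."""
    candidates = []
    for index, new_model in product(range(len(first_set)), second_set):
        third_set = list(first_set)
        third_set[index] = new_model
        candidates.append((index, new_model, third_set))
    return candidates


def select_third_set(first_set, second_set, samples, threshold):
    """Score every third set and pick one (assumed rule: highest recall
    rate first, lowest error rate as the tie-breaker)."""
    scored = []
    for index, new_model, third_set in build_third_sets(first_set, second_set):
        recall, error = evaluate(third_set, samples, threshold)
        scored.append((recall, error, index, new_model, third_set))
    return max(scored, key=lambda c: (c[0], -c[1]))
```

The returned tuple carries the selected third set together with the position of the replaced first model and the second model of its model group, which is the information needed for the final adjustment step.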
Preferably, when adjusting the first set using the second model in the model group corresponding to the selected third set, the model adjustment unit 406 is specifically configured to:
compare the malicious file recall rate of the selected third set with the malicious file recall rate of the first set, and compare the malicious file error rate of the selected third set with the malicious file error rate of the first set;
if the malicious file recall rate of the selected third set is greater than that of the first set, and the malicious file error rate of the selected third set is less than that of the first set, use the second model in the model group corresponding to the third set to replace, in the first set, the first model belonging to that model group, or add the second model in the model group corresponding to the third set to the first set.
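The acceptance check described above can be sketched as follows. The function name adjust_first_set, the mode parameter, and the tuple layout of the metrics are hypothetical and chosen only for the example; leaving the first set unchanged when the condition is not met is an assumption based on the method description.

```python
from typing import List, Sequence, Tuple, TypeVar

M = TypeVar("M")  # type of a model object; any representation works here


def adjust_first_set(first_set: Sequence[M],
                     first_metrics: Tuple[float, float],
                     replaced_index: int,
                     second_model: M,
                     third_metrics: Tuple[float, float],
                     mode: str = "replace") -> List[M]:
    """Adjust the first set only if the selected third set has a higher
    recall rate AND a lower error rate than the current first set.

    Metrics are (recall rate, error rate) pairs. mode is "replace" to swap
    out the first model of the model group, or "add" to append the second
    model to the first set.
    """
    first_recall, first_error = first_metrics
    third_recall, third_error = third_metrics

    improved = third_recall > first_recall and third_error < first_error
    if not improved:
        # Assumption: otherwise the current first set is kept unchanged.
        return list(first_set)

    if mode == "replace":
        updated = list(first_set)
        updated[replaced_index] = second_model
        return updated
    if mode == "add":
        return list(first_set) + [second_model]
    raise ValueError("mode must be 'replace' or 'add'")
```

Requiring both metrics to improve strictly means a candidate that only trades recall against false alarms never displaces the current set.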
Because each unit in this embodiment is able to perform the methods shown in Fig. 1 to Fig. 3, for parts not described in detail in this embodiment, refer to the related description of Fig. 1 to Fig. 3.
The technical solution of the embodiments of the present invention has the following beneficial effects:
In the embodiments of the present invention, the type of a file to be detected is judged using each of M first models, to obtain M judgment results, where M is an integer greater than or equal to 2; the number of first models that judge the file to be detected to be a malicious file is then obtained according to the M judgment results; and the type of the file to be detected is obtained according to the number of first models that judge the file to be detected to be a malicious file.
Therefore, in the technical solution provided by the embodiments of the present invention, multiple models are used to judge the type of the file to be detected, and the type of the file to be detected is determined comprehensively from the judgment results of the multiple models. This improves the detection performance of the judgment models for malicious files during file scanning and improves the detection accuracy of the judgment models for malicious files.
In addition, a benefit of using multiple models to judge the type of the file to be detected is that models can be added or replaced conveniently, which facilitates maintenance of the file scanning device and improves its ability to keep detecting malicious files over time.
In addition, in the embodiments of the present invention, the multiple models used to judge the type of the file to be detected can be adjusted so that the models are updated, and the new model used for the adjustment is obtained from newly emerging training samples. Compared with prior-art technical solutions that need to perform machine training again on all training samples, this can reduce computing overhead, improve the efficiency of model updates, improve the detection performance of the models for malicious files, increase the recall rate of malicious files, and reduce the false alarm rate for malicious files.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (Processor) to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (6)
1. A file scanning method, characterized in that the method comprises:
judging the type of a file to be detected using each of M first models, to obtain M judgment results, M being an integer greater than or equal to 2;
obtaining, according to the M judgment results, the number of first models that judge the file to be detected to be a malicious file;
obtaining the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file;
the method further comprises: obtaining newly emerging malicious files as training samples; performing machine training using the training samples, to generate a second model; and adjusting the M first models using the second model, which comprises:
the M first models forming a first set; adding the second model to a preset second set, the second set containing K second models, K being an integer greater than 0;
generating P model groups according to the second models in the second set and the first models in the first set, P being greater than 0 and less than or equal to the product of M and K; using the second model in each model group to replace, in the first set, the first model belonging to that model group, to obtain P third sets;
obtaining the malicious file recall rate and the malicious file error rate of each third set; selecting one third set according to the malicious file recall rate and the malicious file error rate of each third set;
adjusting the first set using the second model in the model group corresponding to the selected third set.
2. The method according to claim 1, characterized in that obtaining the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file comprises:
comparing the number of first models that judge the file to be detected to be a malicious file with a preset first threshold;
if the number of first models that judge the file to be detected to be a malicious file is less than the first threshold, determining that the file to be detected is a normal file;
if the number of first models that judge the file to be detected to be a malicious file is greater than or equal to the first threshold, determining that the file to be detected is a malicious file.
3. The method according to claim 1, characterized in that adjusting the first set using the second model in the model group corresponding to the selected third set comprises:
comparing the malicious file recall rate of the selected third set with the malicious file recall rate of the first set, and comparing the malicious file error rate of the selected third set with the malicious file error rate of the first set;
if the malicious file recall rate of the selected third set is greater than that of the first set, and the malicious file error rate of the selected third set is less than that of the first set, using the second model in the model group corresponding to the third set to replace, in the first set, the first model belonging to that model group, or adding the second model in the model group corresponding to the third set to the first set.
4. A file scanning device, characterized in that the device comprises:
a type judging unit, configured to judge the type of a file to be detected using each of M first models, to obtain M judgment results, M being an integer greater than or equal to 2;
a result statistics unit, configured to obtain, according to the M judgment results, the number of first models that judge the file to be detected to be a malicious file;
a type determining unit, configured to obtain the type of the file to be detected according to the number of first models that judge the file to be detected to be a malicious file;
the device further comprises:
a file obtaining unit, configured to obtain newly emerging malicious files as training samples;
a model generation unit, configured to perform machine training using the training samples, to generate a second model;
a model adjustment unit, configured to adjust the M first models using the second model;
the M first models form a first set; the model adjustment unit is specifically configured to:
add the second model to a preset second set, the second set containing K second models, K being an integer greater than 0;
generate P model groups according to the second models in the second set and the first models in the first set, P being greater than 0 and less than or equal to the product of M and K;
use the second model in each model group to replace, in the first set, the first model belonging to that model group, to obtain P third sets;
obtain the malicious file recall rate and the malicious file error rate of each third set;
select one third set according to the malicious file recall rate and the malicious file error rate of each third set;
adjust the first set using the second model in the model group corresponding to the selected third set.
5. The device according to claim 4, characterized in that the type determining unit is specifically configured to:
compare the number of first models that judge the file to be detected to be a malicious file with a preset first threshold;
if the number of first models that judge the file to be detected to be a malicious file is less than the first threshold, determine that the file to be detected is a normal file;
if the number of first models that judge the file to be detected to be a malicious file is greater than or equal to the first threshold, determine that the file to be detected is a malicious file.
6. The device according to claim 4, characterized in that, when adjusting the first set using the second model in the model group corresponding to the selected third set, the model adjustment unit is specifically configured to:
compare the malicious file recall rate of the selected third set with the malicious file recall rate of the first set, and compare the malicious file error rate of the selected third set with the malicious file error rate of the first set;
if the malicious file recall rate of the selected third set is greater than that of the first set, and the malicious file error rate of the selected third set is less than that of the first set, use the second model in the model group corresponding to the third set to replace, in the first set, the first model belonging to that model group, or add the second model in the model group corresponding to the third set to the first set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410806302.8A CN104598816B (en) | 2014-12-22 | 2014-12-22 | A kind of file scanning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410806302.8A CN104598816B (en) | 2014-12-22 | 2014-12-22 | A kind of file scanning method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104598816A (en) | 2015-05-06 |
CN104598816B (en) | 2017-07-04 |
Family
ID=53124593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410806302.8A Active CN104598816B (en) | 2014-12-22 | 2014-12-22 | A kind of file scanning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104598816B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992969B (en) * | 2019-03-25 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Malicious file detection method and device and detection platform |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102346828A (en) * | 2011-09-20 | 2012-02-08 | 海南意源高科技有限公司 | Malicious program judging method based on cloud security |
CN102799804A (en) * | 2012-04-30 | 2012-11-28 | 珠海市君天电子科技有限公司 | Comprehensive identification method and system for security of unknown file |
EP2597569A1 (en) * | 2011-11-24 | 2013-05-29 | Kaspersky Lab Zao | System and method for distributing processing of computer security tasks |
CN103177217A (en) * | 2013-04-08 | 2013-06-26 | 腾讯科技(深圳)有限公司 | File scan method, file scan system, client-side and server |
CN104091122A (en) * | 2014-06-17 | 2014-10-08 | 北京邮电大学 | Detection system of malicious data in mobile internet |
Also Published As
Publication number | Publication date |
---|---|
CN104598816A (en) | 2015-05-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20190812. Address after: 100085 Beijing, Haidian District, No. ten on the ground floor, No. 10 Baidu building, layer 2. Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Address before: 100193 room 1-01, 1-03, 1-04, block C, software Plaza, building 4, No. 8, Mong West Road, Beijing, Haidian District. Patentee before: Pacify a Heng Tong (Beijing) Science and Technology Ltd. |