CN110034921A - Webshell detection method based on weighted fuzzy hash - Google Patents
Webshell detection method based on weighted fuzzy hash
- Publication number
- CN110034921A (application number CN201910311319.9A)
- Authority
- CN
- China
- Prior art keywords
- hash
- weighted
- webshell
- fragment
- hash value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
Abstract
The invention belongs to the technical field of cyberspace security and discloses a webshell detection method and system based on a weighted fuzzy hash algorithm. The file to be detected is split into fragments; a hash and a weight are computed for each fragment, with fragments that contain dangerous functions given a larger weight; the information entropy of each fragment is also considered, and the larger the entropy, the smaller the weight. The fragment hashes are concatenated into a fuzzy hash string and the total weight is calculated, giving the weighted fuzzy hash value. The weighted fuzzy hash value of the file to be detected is then compared in turn with each webshell weighted fuzzy hash value stored in advance in a fingerprint library. Compared with the traditional fuzzy hash algorithm, the invention adapts effectively to large variation in the size of detected objects, greatly improves the detection accuracy for variant samples, and improves resistance to interference.
Description
Technical field
The invention belongs to the technical field of cyberspace security, and in particular relates to a webshell detection method and system based on weighted fuzzy hash.
Background art
At present, the closest prior art is as follows:
A webshell is a malicious backdoor written in a scripting language such as JSP, ASP, or PHP. After exploiting a website vulnerability such as SQL injection or file upload to plant a webshell backdoor and gain privileges, an attacker can modify, delete, or add files on the server by executing commands remotely, and can also directly view the user data in the server's database.
Because webshell operations leave no record in the system security log, and the webshell is mixed in with normal web page files, an ordinary administrator can hardly find traces of the intrusion. Advanced webshell backdoors also use various techniques to evade detection, so the need for efficient and accurate webshell detection methods is increasingly urgent.
Webshell detection is currently divided mainly into dynamic detection and static detection. Dynamic detection relies on the dynamic features exhibited when malicious code executes, including malicious behavior and API call behavior.
Static detection mainly analyzes the semantic features of a webshell. Because static detection is fast and its detection features are distinct, most research on webshell detection focuses on static methods. Static detection depends on the static features of the webshell; since webshells are mostly written in scripting languages, they are easy to modify and transform. When a webshell undergoes a simple mutation or its signature is deliberately obfuscated, conventional methods miss it, so current webshell detection methods based on feature matching have difficulty quickly detecting and identifying webshell variants.
Therefore, how to overcome the one-dimensionality and lag of traditional signature-matching webshell detection, cope with text obfuscation of webshells, and quickly detect webshells and their variants has long been a focus of those skilled in the art. In 2006, Jesse Kornblum proposed the fuzzy hash algorithm. Unlike a traditional hash, a change to part of a file changes only the corresponding part of the fuzzy hash value, so the algorithm is mainly used to compare file similarity. The evasion techniques attackers apply to webshells essentially add redundant, scrambling content to the original malicious script in order to hide static features or reduce their proportion in the script. The script after such redundant scrambling therefore remains somewhat similar to the original script, and on this basis researchers have applied fuzzy hashing to malicious code detection.
The latest version of ModSecurity adds a webshell detection interface based on the traditional fuzzy hash algorithm ssdeep.
Patent CN201110375166 describes a system and method for detecting malicious code with a fuzzy hash algorithm. The client computes the fuzzy hash value of the object to be detected and transmits it to a cloud server; the cloud server compares the received fuzzy hash value with the fuzzy hash values stored in a blacklist, forms a verdict from the resulting similarity according to a decision policy, and sends the verdict to the client.
Patent 201710078106.7 computes the fuzzy hash value of a script file with a fuzzy hash algorithm and compares it with the fuzzy hash values of script files pre-stored in a fingerprint library, so as to filter out script files that do not match the pre-stored script files.
Patent CN201710352331 extracts the character strings used in an APK, converts the extracted strings into feature vectors to generate fuzzy hash values, clusters the fuzzy hash values with the k-means algorithm, and uses the Hamilton distance as the similarity measure to detect malware variants. All of the above documents use the traditional fuzzy hash algorithm, which adapts poorly and cannot cope with large variation in the size of the objects being detected. The traditional fuzzy hash algorithm is mainly used for file similarity comparison and does not take the particular nature of malicious code detection into account, so the false negative rate is high. Based on the characteristics of webshells and their evasion techniques, the present invention proposes a weighted fuzzy hash algorithm and applies it to webshell detection, effectively solving the above problems.
In conclusion problem of the existing technology is:
(1) in practical applications, the difference in size of webshell is very big, only may in short be no more than 100 bytes,
It may also be more than 200KB, attacker, which often passes through, is added meaningless redundant code reduction key code proportion Interference Detection
To escape detection.Traditional fuzzy hash algorithm, which directly applies to webshell detection, can only resist a certain range of disturbance, and one
As within 6%, and (more than 20%) is added for large range of redundancy, detection effect becomes very poor, and adaptability is bad.
Since the difference in size of webshell is very big, change is easy to beyond traditional fuzzy hash algorithm in small webshell file
Detection range.
(2) traditional fuzzy hash algorithm compares for text similarity, is a kind of universality algorithm, and webshell makees
For a kind of text with specific function, it is that dangerous function, traditional fuzzy hash are calculated with the key difference of plain text
Method does not consider this point.
The significance of solving the above technical problems:
The invention adapts effectively to large variation in the size of detected objects. It takes full account of the particular nature of webshells and can detect them well at a low false positive rate, overcoming the shortcomings of the traditional fuzzy hash algorithm and improving the accuracy of webshell detection.
Summary of the invention
In view of the problems of the prior art, and on the basis of studying the characteristics of webshells and their evasion techniques, the present invention provides a webshell detection method and system based on weighted fuzzy hash, applied in particular to detecting mutated webshells.
The invention is realized as follows. The webshell detection method based on weighted fuzzy hash uses a fuzzy hash algorithm with weight assignment: each fragment is given a weight, and fragments containing dangerous functions receive a larger weight, which increases the influence of dangerous functions on the final fuzzy hash result.
The information entropy of each fragment is also considered: the larger the entropy, the smaller the weight, which reduces the influence of meaningless scrambling on the final fuzzy hash result.
The hashes of the fragments are concatenated into a fuzzy hash string and the total weight is calculated to obtain the weighted fuzzy hash value. The weighted fuzzy hash value of the file to be detected is compared in turn with each webshell weighted fuzzy hash value stored in advance in the fingerprint library, thereby detecting webshells and webshell variants.
Further, the webshell detection method based on weighted fuzzy hash comprises:
Step 1, determine the fragmentation parameters: the weighted fuzzy hash uses the Adler-32 weak hash function to test the fragment trigger condition, that is, it computes a rolling hash over the file to be detected;
Step 2, compute the hash and weight of each fragment: once the fragmentation of the whole file is determined, the 32-bit strong hash algorithm FNV-1 is applied to the content of each fragment to obtain a 32-bit hash value;
Step 3, compute the weighted fuzzy hash value: the characters corresponding to the fragment hashes are concatenated in order into a fuzzy hash string, and the total weight, which is the sum of the fragment weights, is appended to the fuzzy hash string;
Step 4, compare weighted fuzzy hash values and judge similarity: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell.
Further, in step 1, the text in the hash window is denoted $(b_1, b_2, b_3, \ldots, b_{s-1}, b_s)$, and the corresponding hash value is:
$$h(s) = F(b_1, b_2, b_3, \ldots, b_{s-1}, b_s) \tag{1}$$
When the window slides forward by one byte, the new hash value is obtained by removing the influence of the first byte and adding the effect of the last byte $b_{s+1}$:
$$h(s+1) = F(b_2, b_3, \ldots, b_{s-1}, b_s, b_{s+1}) = h(s) - X(b_1) + Y(b_{s+1}) \tag{2}$$
The fragment trigger condition is determined by the file size L, the rolling window length s, and the minimum fragment length:
When the hash result $h(s)$ modulo $b_{init}$ equals $b_{init} - 1$, a fragment boundary is placed at the last byte $b_s$ of the current window; the number of fragments of the whole file therefore depends on the fragment trigger condition.
Further, in step 2, the weight of each fragment is computed at the same time as its hash, and each fragment is given a different weight;
The fragment weight formula is:
where w is the fragment weight, K is a coefficient with default value 2, D is the number of dangerous functions in the fragment, and I is the information entropy of the fragment.
Further, in step 4, the final hash string distance is calculated as:
where m is the final hash string distance, d is the edit distance between the two hash strings, L is the length of the longer hash string, and f(x) is the comparison formula for the total weights of the two hash strings:
where $k_1$ and $k_2$ are coefficients with default values 4 and 2/5, and $W_1$ and $W_2$ are the total weights of the two hash strings. The range of f(x) is [-1, 1]: when the difference between $W_1$ and $W_2$ is small, f(x) is positive, and as the difference grows, f(x) becomes negative;
Further, in step 4, after the hash string distance is calculated, it is normalized to compute the similarity, which is mapped to the range 0-100:
where m is the final hash string distance and $l_1$, $l_2$ are the hash string lengths. The smaller the edit distance, the larger the fuzzy hash similarity h and the more similar the texts.
Another object of the present invention is to provide a webshell detection system based on weighted fuzzy hash, the system comprising:
a fragmentation parameter determination module, used by the weighted fuzzy hash to test the fragment trigger condition by computing a rolling hash over the text;
a per-fragment hash and weight module, which, once the fragmentation of the whole file is determined, applies the 32-bit strong hash algorithm FNV-1 to the content of each fragment to obtain a 32-bit hash value;
a weighted fuzzy hash value calculation module, which concatenates the characters corresponding to the fragment hashes in order into a fuzzy hash string and appends the total weight, which is the sum of the fragment weights, to the hash string;
a weighted fuzzy hash value comparison module for similarity judgment: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell.
Another object of the present invention is to provide a webshell detection program based on weighted fuzzy hash, applied to a terminal, the program implementing the webshell detection method based on weighted fuzzy hash.
Another object of the present invention is to provide a terminal carrying a processor that implements the webshell detection method based on weighted fuzzy hash.
Another object of the present invention is to provide a computer-readable storage medium including instructions that, when run on a computer, cause the computer to execute the webshell detection method based on weighted fuzzy hash.
Another object of the present invention is to provide a cyberspace security defense platform that implements the webshell detection method based on weighted fuzzy hash.
In conclusion advantages of the present invention and good effect are as follows:
The present invention analyzes webshell feature and its method for escaping detection, and the realization of webshell vicious function relies on
In dangerous function, and most of escape detection methods are all to hide dangerous function or reduction dangerous function proportion.Attacker
When being upset, dangerous function and its neighbouring code will not be changed easily, therefore the present invention is based on cum rights to obscure hash algorithm
Weight is assigned to each fragment, the core fragment of dangerous function gives biggish weight, improves dangerous function to most rear mold
Paste the influence of hash result;The comentropy of each fragment is considered simultaneously, and information entropy is bigger, and the weight given is smaller, reduces
The meaningless influence upset to last fuzzy hash result, the present invention can effectively adapt to the very big feelings of test object size variation
Condition has well adapting to property, the Detection accuracy of mutation sample can be greatly improved, and improve anti-interference.
To test the anti-obfuscation ability and adaptability of the weighted fuzzy hash algorithm, 2500 webshell samples and 5000 normal samples were downloaded from publicly available data sets on the network and three experiments were carried out, comparing the detection performance of the weighted fuzzy hash algorithm and the traditional fuzzy hash algorithm on obfuscated webshells. The experimental results show that weighted fuzzy hash does achieve better webshell detection than traditional fuzzy hash and has better resistance to interference.
Experiment 1: some webshell files were selected at random, and meaningless random strings and meaningful code were inserted at random positions in given proportions as obfuscation. The weighted hash value and the traditional fuzzy hash value of each obfuscated webshell were then calculated, to compare how different obfuscation ratios affect the two kinds of hash value.
The experimental results are as follows. Each subfigure shows how the two hash values change after different amounts of obfuscating characters are inserted into the same webshell file. The abscissa is the ratio of randomly inserted obfuscating characters to the original webshell text, and the ordinate is the hash value; the line labeled "weight" is the weighted hash value and the line labeled "tradition" is the traditional fuzzy hash value. The figures show that, for the same obfuscation of the same file, the weighted hash value is always greater than the traditional fuzzy hash value. A larger hash value indicates more similar files; that is, when detecting webshells, weighted fuzzy hash resists obfuscation better than traditional fuzzy hash.
Experiment 2: two webshell samples were selected at random and their weighted hash value and traditional hash value were calculated; the random selection was repeated 100 times, and the two kinds of hash value obtained are shown in the figure.
In Experiment 2, two different files are selected at random for each hash calculation. The ordinate is the hash value; "weight" denotes the weighted hash value and "tradition" the traditional fuzzy hash value. The results show that, for the vast majority of comparisons between different files, the weighted hash value is below the traditional fuzzy hash value. A smaller hash value indicates less similar files, so the misjudgment rate of weighted hash is also lower than that of traditional fuzzy hash. The Experiment 2 figure also contains several outliers in which the weighted hash value is significantly greater than the traditional fuzzy hash value. These outliers are caused by weight collisions: two files may be quite dissimilar, yet the weights of their weighted fuzzy hashes may be close.
Experiment 3: 5 webshell samples and 1000 normal samples were selected, and the weighted fuzzy hash values and traditional fuzzy hash values of the 5 webshell samples were saved as separate fingerprint libraries. The 5 malicious ("black") samples were then obfuscated in the same way as in Experiment 1, that is, by randomly inserting meaningless characters and meaningful code in given proportions. The obfuscated black samples were mixed with the normal ("white") samples, and all samples were then checked against the pre-stored fingerprint libraries.
In the experimental results, the abscissa of every result figure is the obfuscation ratio of the webshell samples, that is, the ratio of inserted obfuscating characters to the original sample. In Fig. 4 to Fig. 6 the detection threshold is 99, meaning a file is judged to be a webshell only if its hash value compared against the fingerprint library is greater than or equal to 99. Fig. 3 to Fig. 6 show that without obfuscation both weighted fuzzy hash and traditional fuzzy hash reach 100% accuracy in detecting webshells. As the obfuscation ratio increases, the detection rate of both declines: traditional fuzzy hash drops straight to zero, while weighted fuzzy hash can still detect webshells with fairly good results. At the same time, weighted fuzzy hash has a certain false detection rate; as shown in Fig. 6, the false detection rate in the experiment is 0.4%, and the false detections are caused by weight collisions. The false detection rate also depends on the detection threshold. When the threshold is set to 96, as in the experiments of Fig. 7 to Fig. 9, the false detection rate rises to 0.8%, but detection accuracy also improves: at obfuscation ratios of 20% and 30% the accuracy rises from 20% to 40%, and at an obfuscation ratio of 50% it rises from 0 to 20%.
Brief description of the drawings
Fig. 1 is a flowchart of the webshell detection method based on weighted fuzzy hash provided by an embodiment of the present invention.
Fig. 2 shows the hash values after the webshell is obfuscated in proportion in Experiment 1 provided by an embodiment of the present invention (meaningless characters inserted).
Fig. 3 shows the hash values after the webshell is obfuscated in proportion in Experiment 1 provided by an embodiment of the present invention (meaningful code comments inserted).
Fig. 4 compares the hash values of pairs of different files in Experiment 2 provided by an embodiment of the present invention.
Fig. 5 shows the number of webshells detected in Experiment 1 provided by an embodiment of the present invention (meaningless obfuscation, threshold 99).
Fig. 6 shows the number of webshells detected in Experiment 1 provided by an embodiment of the present invention (meaningful obfuscation, threshold 90).
Fig. 7 shows the webshell detection accuracy in Experiment 2 provided by an embodiment of the present invention (meaningless obfuscation, threshold 99).
Fig. 8 shows the webshell detection accuracy in Experiment 2 provided by an embodiment of the present invention (meaningful obfuscation, threshold 90).
Fig. 9 shows the webshell false detection rate in Experiment 3 provided by an embodiment of the present invention (meaningless obfuscation, threshold 99).
Fig. 10 shows the webshell false detection rate in Experiment 3 provided by an embodiment of the present invention (meaningful obfuscation, threshold 99).
Fig. 11 shows the number of webshells detected in Experiment 1 provided by an embodiment of the present invention (meaningless obfuscation, threshold 96).
Fig. 12 is a schematic diagram of the webshell detection system based on weighted fuzzy hash provided by an embodiment of the present invention.
In the figure: 1, fragmentation parameter determination module; 2, per-fragment hash and weight module; 3, hash concatenation and total weight calculation module; 4, weighted hash value comparison module.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
In the prior art, the traditional fuzzy hash algorithm can only withstand perturbation within a limited range, generally within 6%; when a larger amount of redundancy is added (more than 20%), detection degrades badly and adaptability is poor. Because webshells vary greatly in size, a change to a small webshell file easily falls outside the fuzzy hash detection range. The traditional fuzzy hash algorithm is a general-purpose algorithm for text similarity comparison, whereas a webshell is text with a specific function whose key difference from ordinary text is the use of dangerous functions, a difference the traditional fuzzy hash algorithm does not consider.
To solve the above problems, the present invention is described in detail below with reference to a specific scheme.
As shown in Fig. 1, the webshell detection method based on weighted fuzzy hash provided by an embodiment of the present invention comprises:
S101, determine the fragmentation parameters: the weighted fuzzy hash first uses the Adler-32 weak hash function to test the fragment trigger condition, that is, it computes a rolling hash over the text.
S102, compute the hash and weight of each fragment: once the fragmentation of the whole file is determined, the 32-bit strong hash algorithm FNV-1 is applied to the content of each fragment to obtain a 32-bit hash value.
S103, compute the weighted fuzzy hash value: the characters corresponding to the fragment hashes are concatenated in order into a traditional fuzzy hash string, and the total weight, which is simply the sum of the fragment weights, is appended to the fuzzy hash string.
S104, compare weighted fuzzy hash values and judge similarity: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell.
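The decision rule of S104 amounts to a threshold test against every stored fingerprint. The following is a minimal sketch under stated assumptions: the similarity function is passed in as a parameter because the patent's similarity formula is not reproduced in this text, `exact_match_similarity` is a hypothetical stand-in used only to make the sketch runnable, and the `>=` comparison follows the convention described for the experiments (a score greater than or equal to the threshold counts as a match).

```python
from typing import Callable, Iterable


def is_webshell(
    candidate: str,
    fingerprints: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> bool:
    """Return True if any stored fingerprint is similar enough to the candidate."""
    return any(similarity(candidate, known) >= threshold for known in fingerprints)


def exact_match_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in: 100 for identical signatures, 0 otherwise."""
    return 100.0 if a == b else 0.0


if __name__ == "__main__":
    library = ["AbC+z:4.5", "Qr/9:2"]   # made-up signatures for illustration
    print(is_webshell("AbC+z:4.5", library, exact_match_similarity, threshold=99))
```

Passing the similarity function as an argument keeps the loop independent of how the weighted distance and its normalization are defined.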
In step S101, the fragmentation parameters are determined. The weighted fuzzy hash first uses the Adler-32 weak hash function to test the fragment trigger condition, that is, it computes a rolling hash over the text. The text in the hash window is denoted $(b_1, b_2, b_3, \ldots, b_{s-1}, b_s)$, and the corresponding hash value is:
$$h(s) = F(b_1, b_2, b_3, \ldots, b_{s-1}, b_s) \tag{1}$$
When the window slides forward by one byte, computing the new hash value only requires removing the influence of the first byte and adding the effect of the last byte $b_{s+1}$:
$$h(s+1) = F(b_2, b_3, \ldots, b_{s-1}, b_s, b_{s+1}) = h(s) - X(b_1) + Y(b_{s+1}) \tag{2}$$
The fragment trigger condition is determined jointly by the file size L, the rolling window length s, and the minimum fragment length:
When the hash result $h(s)$ modulo $b_{init}$ equals $b_{init} - 1$, a fragment boundary is placed at the last byte $b_s$ of the current window; the number of fragments of the whole file therefore depends on the fragment trigger condition.
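The rolling-hash segmentation of S101 can be sketched as follows. This is an illustration rather than the patented implementation: the hash uses two Adler-32-style running sums without the Adler-32 modulus, the window length of 7 is an assumed default, and `b_init` is passed in as a parameter because the formula that derives it from L, s, and the minimum fragment length is not reproduced in this text. The function name `split_fragments` is hypothetical.

```python
from collections import deque


def split_fragments(data: bytes, b_init: int, window: int = 7) -> list[bytes]:
    """Cut `data` after every byte where h % b_init == b_init - 1."""
    recent = deque([0] * window, maxlen=window)  # window contents, zero-padded at start
    a = 0   # plain sum of the bytes in the window
    b = 0   # position-weighted sum (newest byte has weight 1, oldest has weight `window`)
    fragments, start = [], 0
    for i, new in enumerate(data):
        old = recent[0]
        recent.append(new)            # maxlen drops the oldest byte automatically
        # Rolling update as in formula (2): remove the oldest byte, add the newest.
        a = a - old + new
        b = b - window * old + a
        h = (a + (b << 16)) & 0xFFFFFFFF
        if h % b_init == b_init - 1:  # fragment trigger condition
            fragments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        fragments.append(data[start:])   # trailing bytes form the last fragment
    return fragments
```

Because the hash depends only on the bytes inside the window, inserting junk in one place changes the fragment boundaries near the insertion but leaves boundaries elsewhere in the file where they were, which is what makes the resulting fuzzy hash strings comparable.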
In step S102, the hash and weight of each fragment are computed. Once the fragmentation of the whole file is determined, the 32-bit strong hash algorithm FNV-1 is applied to the content of each fragment to obtain a 32-bit hash value. To improve comparison efficiency and shorten the fuzzy hash, only the last six bits of each fragment's hash value are used to represent that fragment. Six bits give 64 possible combinations, each represented by one character, so every fragment is represented by a single character; if two fragments have the same content, their characters are necessarily the same.
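The per-fragment hash of S102 can be sketched as below. The 32-bit FNV-1 constants (offset basis `0x811C9DC5`, prime `0x01000193`) are the standard ones; the choice of the Base64 alphabet for the 64 characters is an assumption, since the text only says that each six-bit combination is represented by one character. The names `fnv1_32` and `fragment_char` are hypothetical.

```python
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193

# Assumed 64-character alphabet (Base64 order); the patent does not name one.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1: multiply by the prime, then XOR in each byte."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h = (h * FNV_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


def fragment_char(fragment: bytes) -> str:
    """Map the last six bits of the fragment hash to one character."""
    return ALPHABET[fnv1_32(fragment) & 0x3F]
```

With only six bits per fragment, different fragments map to the same character with probability of roughly 1/64, which is the price paid for a short hash string.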
While computing the hash of each fragment, the weighted hash also computes the weight of each fragment. In the traditional fuzzy hash algorithm every character in the string has equal status: a difference in any character contributes equally to the final hash comparison value, which means every fragment is equally important to the whole file. For a webshell sample, however, the backdoor code that carries out the attack is clearly more important than the rest. Based on this property, the weighted fuzzy hash algorithm for webshell detection proposed here assigns a different weight to each fragment.
From the evasion techniques described above, one conclusion can be drawn: in most webshells obfuscated by adding redundancy, the number or proportion of dangerous functions declines and the information entropy of the text increases, especially when the scrambling uses meaningless redundant code. Based on this conclusion, a fragment weight formula is proposed:
where w is the fragment weight, K is a coefficient with default value 2, D is the number of dangerous functions in the fragment, and I is the information entropy of the fragment. Formula 4 shows that the weight of a fragment is positively correlated with the number of dangerous functions in the fragment and negatively correlated with the fragment's information entropy, so this formula reduces the influence of meaningless scrambling on the final fuzzy hash result.
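The two inputs to the weight formula, D and I, can be computed as in the sketch below. The sketch stops short of combining them into a weight because the weight formula itself is not reproduced in this text. The list `DANGEROUS_FUNCTIONS` is a hypothetical example for PHP-style scripts, matching calls by name with a regular expression is an assumed simplification, and the entropy is the Shannon entropy over byte values, which is one common reading of "information entropy".

```python
import math
import re
from collections import Counter

# Hypothetical list; the patent does not enumerate the dangerous functions.
DANGEROUS_FUNCTIONS = ("eval", "assert", "system", "exec", "shell_exec", "passthru")

_CALL = re.compile(
    rb"\b(?:" + b"|".join(re.escape(name.encode()) for name in DANGEROUS_FUNCTIONS) + rb")\s*\("
)


def dangerous_count(fragment: bytes) -> int:
    """D: number of calls to listed dangerous functions in the fragment."""
    return len(_CALL.findall(fragment))


def shannon_entropy(fragment: bytes) -> float:
    """I: Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    return -sum(c / total * math.log2(c / total) for c in Counter(fragment).values())
```

Random junk tends to score high on `shannon_entropy` and zero on `dangerous_count`, so a weight that rises with D and falls with I gives such fragments little influence.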
In step S103, the weighted fuzzy hash value is computed. The characters corresponding to the fragment hashes are concatenated in order into a fuzzy hash string, and the total weight, which is simply the sum of the fragment weights, is appended to the hash string.
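Assembling the final value in S103 is a concatenation plus a sum, as in this sketch. The `hash_string:total_weight` layout with a colon separator is an assumption made for illustration; the text only says that the total weight is appended after the hash string. Both function names are hypothetical.

```python
def build_signature(pieces: list[tuple[str, float]]) -> str:
    """Join per-fragment characters in order and append the summed weight."""
    hash_string = "".join(char for char, _ in pieces)
    total_weight = sum(weight for _, weight in pieces)
    return f"{hash_string}:{total_weight:g}"


def parse_signature(signature: str) -> tuple[str, float]:
    """Split a signature back into its hash string and total weight."""
    hash_string, _, weight = signature.rpartition(":")
    return hash_string, float(weight)
```

For example, `build_signature([("A", 1.0), ("+", 2.5), ("z", 1.0)])` produces `"A+z:4.5"`, and `parse_signature` recovers the two parts for comparison.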
In step S104, weighted fuzzy hash values are compared and similarity is judged. The weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell. First, an edit distance calculation gives the number of differing characters between the hash strings, that is, the number of differing fragments; the more fragments differ, the larger the edit distance. After the edit distance is calculated, the total weight is taken into account to calculate the final hash string distance:
where m is the final hash string distance, d is the edit distance between the two hash strings, L is the length of the longer hash string, and f(x) is the comparison formula for the total weights of the two hash strings:
where $k_1$ and $k_2$ are coefficients with default values 4 and 2/5, and $W_1$ and $W_2$ are the total weights of the two hash strings. The range of f(x) is [-1, 1]: when the difference between $W_1$ and $W_2$ is small, f(x) is positive, and as the difference grows, f(x) becomes negative.
According to formula 4, when meaningless obfuscation code is inserted into a webshell script, the edit distance d between the fuzzy hash strings of the obfuscated code and the original increases, but the difference in weight W remains small; f(x) is then positive and m < d, i.e. the files are judged more alike. On the other hand, the attack capability of a webshell depends mainly on a small number of dangerous functions, while its other code segments are common in normal scripts, so the edit distance between a webshell script and a normal script can be small, causing false detections. Once the weight is taken into account, even if the edit distance between the hash values of a webshell script and a normal script is small, the difference in their weights W is large, so f(x) is negative and m > d, i.e. the files are judged less alike. Therefore, in theory, detecting webshells with the weighted fuzzy hash algorithm can effectively improve the webshell detection rate and reduce the false detection rate.
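The edit distance d between two hash strings can be computed with the standard Levenshtein dynamic program, sketched below. The weight correction f(x) and the final distance m are deliberately left out, since their formulas are not reproduced in this text; the function name is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

Since each character of a hash string stands for one fragment, inserting junk that creates k new fragments typically raises d by about k, which is the growth in d described above.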
After the final hash string distance is calculated, it must also be normalized, i.e. mapped to the range 0-100:
where m is the final hash string distance and l1, l2 are the lengths of the two hash strings. From the formula above, the smaller the edit distance, the larger the fuzzy hash comparison value, i.e. the more similar the texts.
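The normalization formula is not reproduced in this text. The sketch below therefore assumes one common form, in which the distance is scaled by the combined length of the two hash strings, h = 100 - 100·m / (l1 + l2), and clamped to 0-100. Both this form and the function names are assumptions; the threshold of 99 is the value used in the experiments.

```python
def similarity_score(m: float, l1: int, l2: int) -> float:
    """Map a distance m onto 0-100, larger meaning more similar.

    Assumed form: 100 - 100 * m / (l1 + l2), clamped to [0, 100].
    """
    if l1 + l2 == 0:
        return 100.0  # two empty hash strings are treated as identical
    score = 100.0 - 100.0 * m / (l1 + l2)
    return max(0.0, min(100.0, score))


def is_webshell(scores, threshold=99):
    """Flag the file if any fingerprint comparison reaches the threshold."""
    return any(score >= threshold for score in scores)
```

Here `scores` would hold one `similarity_score` per entry in the fingerprint database, so a single sufficiently close fingerprint is enough to flag the file.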
As shown in Fig. 12, the webshell detection system based on weighted fuzzy hash provided by an embodiment of the present invention includes:
Fragmentation parameter determination module 1, used by the weighted fuzzy hash to detect the fragment trigger condition by performing a rolling hash over the text.
Per-fragment hash and weight module 2, used to apply the strong hash algorithm FNV-1 32 to the content of each fragment after the fragmentation of the whole file is determined, obtaining a 32-bit hash value.
Weighted fuzzy hash value calculation module 3, used to concatenate the characters corresponding to the fragment hashes in order into a fuzzy hash string and to append the total weight after it; the total weight is the sum of the weights of all fragments.
Weighted hash value comparison module 4, used for similarity judgment: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint database; if the similarity between any weighted fuzzy hash value in the fingerprint database and that of the file to be detected exceeds the threshold, the file to be detected is considered a webshell.
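The fragment trigger condition of module 1 can be illustrated with the sketch below. Claim 2 names Adler-32 as the weak hash; everything else here is an assumption made for illustration: the 7-byte window, the `block_size` of 64, the trigger rule "hash modulo block_size equals block_size - 1", and the function name. For brevity the Adler-32 value is recomputed for each window with `zlib.adler32` rather than updated incrementally as a true rolling hash would be.

```python
import zlib


def split_fragments(data: bytes, window: int = 7, block_size: int = 64):
    """Cut data into fragments at content-defined boundaries.

    A boundary is placed after position `end` when the Adler-32 of the
    last `window` bytes satisfies the (assumed) trigger condition.
    """
    fragments = []
    start = 0
    for end in range(1, len(data) + 1):
        window_bytes = data[max(0, end - window):end]
        if zlib.adler32(window_bytes) % block_size == block_size - 1:
            fragments.append(data[start:end])
            start = end
    if start < len(data):
        fragments.append(data[start:])  # trailing bytes form the last fragment
    return fragments
```

Because each boundary depends only on the last few bytes, an insertion early in the file disturbs only the fragments near it, and later boundaries fall in the same places as before.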
The invention is further described below with reference to experiments.
To test the anti-obfuscation capability of the weighted hash algorithm, 2500 webshell samples and 5000 normal samples were downloaded from public data sets on the network and three experiments were carried out, comparing the detection performance of the weighted hash and the traditional fuzzy hash on obfuscated webshells. Both the theoretical analysis and the experimental results indicate that detecting webshells with the weighted fuzzy hash achieves better detection performance and better interference resistance than the traditional fuzzy hash.
Experiment 1: webshell files are randomly selected and obfuscated by randomly inserting, in a given proportion, meaningless random strings and meaningful code. The weighted hash value and the traditional fuzzy hash value of each obfuscated webshell are then calculated, in order to compare the effect of different obfuscation ratios on the two hash values.
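The meaningless-character obfuscation used in this experiment can be approximated as sketched below. The function name, the alphanumeric character set, and the choice to insert single characters at uniformly random positions are assumptions; the source states only that random strings are inserted in proportion to the original text.

```python
import random
import string


def insert_random_noise(text: str, ratio: float, seed=None) -> str:
    """Insert round(len(text) * ratio) random characters at random positions."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    alphabet = string.ascii_letters + string.digits
    chars = list(text)
    for _ in range(round(len(text) * ratio)):
        position = rng.randint(0, len(chars))  # inclusive: may append at the end
        chars.insert(position, rng.choice(alphabet))
    return "".join(chars)
```

The original characters keep their relative order, so the obfuscated text still contains the original script as a subsequence.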
In the experimental results, each subgraph shows how the two hash values change after different amounts of obfuscating characters are inserted into the same webshell file. The abscissa is the ratio of inserted random obfuscating characters to the original webshell text, and the ordinate is the hash value; the "weight" polyline is the weighted hash value and "tradition" is the traditional fuzzy hash value. The figures show that, for the same obfuscation of the same file, the weighted hash value is always greater than the traditional fuzzy hash value. A larger hash value indicates that the files are more similar; that is, when detecting webshells the weighted fuzzy hash has a higher anti-obfuscation capability than the traditional fuzzy hash.
Experiment 2: two webshell samples are randomly selected and their weighted hash value and traditional hash value are calculated; the random selection is repeated 100 times. The two kinds of hash values obtained are shown in the figure.
In Experiment 2, two different files are randomly selected and their hash values are calculated. The ordinate is the hash value; "weight" is the weighted hash value and "tradition" is the traditional fuzzy hash value. The results show that, for the great majority of comparisons between different files, the weighted hash value is below the traditional fuzzy hash value. A smaller hash value indicates that the files are less similar, so the misjudgment rate of the weighted hash is also lower than that of the traditional fuzzy hash. The Experiment 2 figure also contains several outliers in which the weighted hash value is significantly greater than the fuzzy hash value. These outliers are caused by weight collisions: two files may be quite dissimilar, yet the weights of their weighted fuzzy hashes may be close.
Experiment 3: 5 webshell samples and 1000 normal samples are selected, and the weighted fuzzy hash values and the traditional fuzzy hash values of the 5 webshell samples are saved as two separate fingerprint databases. These 5 malicious (black) samples are then obfuscated in the same way as in Experiment 1, i.e. meaningless characters and meaningful code are randomly inserted in proportion. The obfuscated black samples are mixed with the normal (white) samples, and all samples are then checked against the pre-stored fingerprint databases.
In the experimental results, the abscissa of every result figure is the obfuscation ratio of the webshell samples, i.e. the ratio of inserted obfuscating characters to the original sample. Fig. 4 to Fig. 6 use a detection threshold of 99, i.e. a sample is judged to be a webshell only if its hash value compared with the fingerprint database is greater than or equal to 99. Fig. 3 to Fig. 6 show that, without obfuscation, both the weighted fuzzy hash and the traditional fuzzy hash reach 100% accuracy in detecting webshells. As the obfuscation ratio increases, the detection rate of both falls: the traditional fuzzy hash drops directly to zero, while the weighted fuzzy hash can still detect webshells with relatively good results. At the same time, the weighted fuzzy hash has a certain false detection rate; as shown in Fig. 6, the false detection rate in the experiment is 0.4%, and the false detections are caused by weight collisions. The false detection rate also depends on the detection threshold: when the threshold is set to 96 in the experiments of Fig. 7 to Fig. 9, the false detection rate rises to 0.8%, but the detection accuracy also improves. At obfuscation ratios of 20% and 30%, the accuracy rises from 20% to 40%, and at an obfuscation ratio of 50% it rises from 0 to 20%.
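The trade-off between detection rate and false detection rate at a given threshold can be computed as in the sketch below. The function name and its inputs are assumptions: `black_scores` and `white_scores` stand for the best 0-100 similarity each obfuscated webshell and each normal sample obtains against the fingerprint database.

```python
def detection_rates(black_scores, white_scores, threshold):
    """Return (detection rate, false detection rate) at the given threshold.

    A sample is flagged when its score is greater than or equal to the
    threshold, matching the rule used in the experiments.
    """
    detected = sum(score >= threshold for score in black_scores)
    false_hits = sum(score >= threshold for score in white_scores)
    detection_rate = detected / len(black_scores) if black_scores else 0.0
    false_rate = false_hits / len(white_scores) if white_scores else 0.0
    return detection_rate, false_rate
```

Lowering the threshold can only keep or increase both rates, which matches the behavior reported when the threshold moves from 99 to 96. With 1000 normal samples, false detection rates of 0.4% and 0.8% correspond to 4 and 8 flagged normal samples.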
In the embodiments of the present invention, Fig. 2 shows the hash values after the webshells in Experiment 1 are obfuscated in proportion (meaningless characters inserted).
Fig. 3 shows the hash values after the webshells in Experiment 1 are obfuscated in proportion (meaningful code comments inserted).
Fig. 10 shows the webshell false detection rate in Experiment 3 (meaningful obfuscation, threshold 99).
Fig. 11 shows the number of webshells detected in Experiment 1 (meaningless obfuscation, threshold 96).
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a Solid State Disk (SSD)).
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A webshell detection method based on weighted fuzzy hash, characterized in that the method is based on a fuzzy hash algorithm with weight distribution, which assigns a weight to each fragment and gives a large weight to core fragments containing dangerous functions;
in combination with the information entropy of each fragment, the larger the information entropy, the smaller the weight given, which increases the influence of dangerous functions on the final fuzzy hash result while reducing the influence of meaningless scrambling on the final fuzzy hash result;
the hashes of the fragments are concatenated into a fuzzy hash string and the total weight is calculated to obtain the weighted fuzzy hash value; the weighted fuzzy hash value of the file to be detected is compared in turn with each webshell weighted fuzzy hash value stored in advance in the fingerprint database, thereby detecting webshells and webshell variants.
2. The webshell detection method based on weighted fuzzy hash according to claim 1, characterized in that the method comprises:
Step 1, determining the fragmentation parameters: the weighted fuzzy hash uses the Adler-32 weak hash function to detect the fragment trigger condition, performing a rolling hash over the file to be detected;
Step 2, computing the hash and weight of each fragment: after the fragmentation of the whole file is determined, the strong hash algorithm FNV-1 32 is applied to the content of each fragment to obtain a 32-bit hash value;
Step 3, calculating the weighted fuzzy hash value: the characters corresponding to the fragment hashes are concatenated in order into a fuzzy hash string, and the total weight, which is the sum of the fragment weights, is appended after the fuzzy hash string, giving the weighted hash value of the file to be detected;
Step 4, comparing weighted fuzzy hash values and judging similarity: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint database; if the similarity between any weighted fuzzy hash value in the fingerprint database and that of the file to be detected exceeds the threshold, the file to be detected is considered a webshell.
3. The webshell detection method based on weighted fuzzy hash according to claim 2, characterized in that, in step 2, the weighted hash calculates the weight of each fragment at the same time as its hash, and assigns a different weight to each fragment;
the fragment weight formula is:
where w is the fragment weight, K is a coefficient with a default value of 2, D is the number of dangerous functions in the fragment, and I is the information entropy of the fragment.
4. The webshell detection method based on weighted fuzzy hash according to claim 2, characterized in that, in step 4, webshell samples are collected in advance and their weighted fuzzy hash values are calculated and stored in the fingerprint database; the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in the fingerprint database, by first calculating the edit distance d between the hash string of the file to be detected and the weighted fuzzy hash value stored in the fingerprint database, and then calculating the final hash string distance:
where m is the final hash string distance, d is the edit distance between the two hash strings, L is the length of the longer hash string, and f(x) is the comparison formula for the total weights of the two hash strings:
where k1 and k2 are coefficients with default values of 4 and 2/5, and W1 and W2 are the total weights of the two hash strings; the range of f(x) is [-1, 1]; when the difference between W1 and W2 is small, f(x) is positive, and as the difference grows, f(x) becomes negative;
after the hash string distance is calculated, it is normalized to compute the similarity, which is mapped to 0-100:
where m is the final hash string distance and l1, l2 are the lengths of the two hash strings; the smaller the edit distance, the larger the fuzzy hash similarity h and the more similar the texts.
5. A webshell detection system based on weighted fuzzy hash implementing the webshell detection method based on weighted fuzzy hash according to claim 1, characterized in that the system comprises:
a fragmentation parameter determination module, used by the weighted fuzzy hash to detect the fragment trigger condition by performing a rolling hash over the text;
a per-fragment hash and weight module, used to apply the strong hash algorithm FNV-1 32 to the content of each fragment after the fragmentation of the whole file is determined, obtaining a 32-bit hash value;
a weighted fuzzy hash value calculation module, used to concatenate the characters corresponding to the fragment hashes in order into a fuzzy hash string and to append the total weight after it, the total weight being the sum of the weights of all fragments;
a weighted fuzzy hash value comparison module, in which the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint database; if the similarity between any weighted fuzzy hash value in the fingerprint database and that of the file to be detected exceeds the threshold, the file to be detected is considered a webshell.
6. A webshell detection program based on weighted fuzzy hash, applied to a terminal, characterized in that the program implements the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
7. A terminal, characterized in that the terminal is equipped with a processor implementing the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
8. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
9. A cyberspace security defense platform implementing the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910311319.9A CN110034921B (en) | 2019-04-18 | 2019-04-18 | Webshell detection method based on weighted fuzzy hash |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910311319.9A CN110034921B (en) | 2019-04-18 | 2019-04-18 | Webshell detection method based on weighted fuzzy hash |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110034921A (en) | 2019-07-19 |
CN110034921B (en) | 2022-04-15 |
Family
ID=67238943
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910311319.9A Active CN110034921B (en) | 2019-04-18 | 2019-04-18 | Webshell detection method based on weighted fuzzy hash |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110034921B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110995440A (en) * | 2019-11-21 | 2020-04-10 | 腾讯科技(深圳)有限公司 | Work history confirming method, device, equipment and storage medium |
CN112487432A (en) * | 2020-12-10 | 2021-03-12 | 杭州安恒信息技术股份有限公司 | Method, system and equipment for malicious file detection based on icon matching |
CN112600797A (en) * | 2020-11-30 | 2021-04-02 | 泰康保险集团股份有限公司 | Method and device for detecting abnormal access behavior, electronic equipment and storage medium |
CN113132341A (en) * | 2020-01-16 | 2021-07-16 | 深信服科技股份有限公司 | Network attack behavior detection method and device, electronic equipment and storage medium |
CN113162761A (en) * | 2020-09-18 | 2021-07-23 | 广州锦行网络科技有限公司 | Webshell monitoring system |
CN113239352A (en) * | 2021-04-06 | 2021-08-10 | 中国科学院信息工程研究所 | Webshell detection method and system |
CN115438342A (en) * | 2022-09-13 | 2022-12-06 | 武汉思普崚技术有限公司 | Webshell detection method and related equipment |
CN117591119A (en) * | 2023-11-01 | 2024-02-23 | 国家计算机网络与信息安全管理中心 | Mass APK source code feature extraction and similarity analysis method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106599242A (en) * | 2016-12-20 | 2017-04-26 | 福建六壬网安股份有限公司 | Webpage change monitoring method and system based on similarity calculation |
CN106911686A (en) * | 2017-02-20 | 2017-06-30 | 杭州迪普科技股份有限公司 | WebShell detection methods and device |
CN107423309A (en) * | 2016-06-01 | 2017-12-01 | 国家计算机网络与信息安全管理中心 | Magnanimity internet similar pictures detecting system and method based on fuzzy hash algorithm |
US20180218153A1 (en) * | 2017-01-31 | 2018-08-02 | Hewlett Packard Enterprise Development Lp | Comparing structural information of a snapshot of system memory |
CN108985057A (en) * | 2018-06-27 | 2018-12-11 | 平安科技(深圳)有限公司 | A kind of webshell detection method and relevant device |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423309A (en) * | 2016-06-01 | 2017-12-01 | 国家计算机网络与信息安全管理中心 | Magnanimity internet similar pictures detecting system and method based on fuzzy hash algorithm |
CN106599242A (en) * | 2016-12-20 | 2017-04-26 | 福建六壬网安股份有限公司 | Webpage change monitoring method and system based on similarity calculation |
US20180218153A1 (en) * | 2017-01-31 | 2018-08-02 | Hewlett Packard Enterprise Development Lp | Comparing structural information of a snapshot of system memory |
CN106911686A (en) * | 2017-02-20 | 2017-06-30 | 杭州迪普科技股份有限公司 | WebShell detection methods and device |
CN108985057A (en) * | 2018-06-27 | 2018-12-11 | 平安科技(深圳)有限公司 | A kind of webshell detection method and relevant device |
Non-Patent Citations (2)
Title |
---|
BAN XIAOFANG ET AL.: "Malware Variant Detection Using Similarity Search over Content Fingerprint", 《2014 IEEE》 * |
DI Hongyu et al.: "Research on a File Comparison Algorithm Based on Improved Fuzzy Hashing", 《Netinfo Security》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110995440A (en) * | 2019-11-21 | 2020-04-10 | 腾讯科技(深圳)有限公司 | Work history confirming method, device, equipment and storage medium |
CN110995440B (en) * | 2019-11-21 | 2022-08-09 | 腾讯科技(深圳)有限公司 | Work history confirming method, device, equipment and storage medium |
CN113132341A (en) * | 2020-01-16 | 2021-07-16 | 深信服科技股份有限公司 | Network attack behavior detection method and device, electronic equipment and storage medium |
CN113132341B (en) * | 2020-01-16 | 2023-03-21 | 深信服科技股份有限公司 | Network attack behavior detection method and device, electronic equipment and storage medium |
CN113162761A (en) * | 2020-09-18 | 2021-07-23 | 广州锦行网络科技有限公司 | Webshell monitoring system |
CN113162761B (en) * | 2020-09-18 | 2022-02-18 | 广州锦行网络科技有限公司 | Webshell monitoring system |
CN112600797A (en) * | 2020-11-30 | 2021-04-02 | 泰康保险集团股份有限公司 | Method and device for detecting abnormal access behavior, electronic equipment and storage medium |
CN112487432A (en) * | 2020-12-10 | 2021-03-12 | 杭州安恒信息技术股份有限公司 | Method, system and equipment for malicious file detection based on icon matching |
CN113239352A (en) * | 2021-04-06 | 2021-08-10 | 中国科学院信息工程研究所 | Webshell detection method and system |
CN115438342A (en) * | 2022-09-13 | 2022-12-06 | 武汉思普崚技术有限公司 | Webshell detection method and related equipment |
CN117591119A (en) * | 2023-11-01 | 2024-02-23 | 国家计算机网络与信息安全管理中心 | Mass APK source code feature extraction and similarity analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN110034921B (en) | 2022-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110034921A (en) | Webshell detection method based on weighted fuzzy hash | |
CN111382430B (en) | System and method for classifying objects of a computer system | |
CN107204960B (en) | Webpage identification method and device and server | |
JP6697123B2 (en) | Profile generation device, attack detection device, profile generation method, and profile generation program | |
KR100894331B1 (en) | Anomaly Detection System and Method of Web Application Attacks using Web Log Correlation | |
JP2019079493A (en) | System and method for detecting malicious files using machine learning | |
JP2020115320A (en) | System and method for detecting malicious file | |
CN108399338A (en) | Platform integrity status measure information method based on process behavior | |
EP2284752B1 (en) | Intrusion detection systems and methods | |
CN107463844B (en) | WEB Trojan horse detection method and system | |
CN111756724A (en) | Detection method, device and equipment for phishing website and computer readable storage medium | |
CN115086060B (en) | Flow detection method, device, equipment and readable storage medium | |
EP2977928B1 (en) | Malicious code detection | |
JP6777612B2 (en) | Systems and methods to prevent data loss in computer systems | |
CN113746952A (en) | DGA domain name detection method, device, electronic equipment and computer storage medium | |
CN112380537A (en) | Method, device, storage medium and electronic equipment for detecting malicious software | |
CN105243327A (en) | Security processing method for files | |
KR101526500B1 (en) | Suspected malignant website detecting method and system using information entropy | |
KR101327865B1 (en) | Homepage infected with a malware detecting device and method | |
Noh et al. | Phishing Website Detection Using Random Forest and Support Vector Machine: A Comparison | |
CN114510720A (en) | Android malicious software classification method based on feature fusion and NLP technology | |
KR20210024748A (en) | Malware documents detection device and method using generative adversarial networks | |
CN115809466B (en) | Security requirement generation method and device based on STRIDE model, electronic equipment and medium | |
KR102498265B1 (en) | Privacy preserving applications and device fault detection | |
KR102448784B1 (en) | Method for providing weighting using device fingerprint, recording medium and device for performing the method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 20221108. Address after: 100029 Beijing city Chaoyang District Yumin Road No. 3. Patentee after: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER. Address before: 610225, No. 24, Section 1, Xuefu Road, Southwest Economic Development Zone, Chengdu, Sichuan. Patentee before: CHENGDU University OF INFORMATION TECHNOLOGY. Patentee before: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER |