CN110034921A - Webshell detection method based on weighted fuzzy hash - Google Patents

Webshell detection method based on weighted fuzzy hash

Publication number
CN110034921A
Authority
CN
China
Prior art keywords
hash
weighted
webshell
fragment
hash value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910311319.9A
Other languages
Chinese (zh)
Other versions
CN110034921B (en)
Inventor
林宏刚
陈麟
黄元飞
赖裕民
张家旺
李燕伟
王鹏翩
林星辰
应志军
吴倩
杜薇
陈禹
张晓娜
王博
杨鹏
高强
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
Chengdu University of Information Technology
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology and National Computer Network and Information Security Management Center
Priority to CN201910311319.9A
Publication of CN110034921A
Application granted
Publication of CN110034921B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

Abstract

The invention belongs to the field of cyberspace security technology and discloses a webshell detection method and system based on a weighted fuzzy hash algorithm. The file to be detected is split into fragments, and a hash and a weight are computed for each fragment. Core fragments that contain dangerous functions are given larger weights, and the information entropy of each fragment is also considered: the larger the entropy, the smaller the weight. The fragment hashes are concatenated into a fuzzy hash string and the total weight is computed, which together give the weighted fuzzy hash value. The weighted fuzzy hash value of the file to be detected is then compared in turn with each webshell weighted fuzzy hash value stored in advance in a fingerprint library. Compared with traditional fuzzy hash algorithms, the invention adapts well to large variations in the size of the detected object, substantially improves detection accuracy on variant samples, and improves resistance to interference.

Description

Webshell detection method based on weighted fuzzy hash
Technical field
The invention belongs to the field of cyberspace security technology, and in particular relates to a webshell detection method and system based on weighted fuzzy hash.
Background art
The closest prior art at present is as follows.
A webshell is a malicious backdoor written in a scripting language such as JSP, ASP or PHP. After exploiting website vulnerabilities such as SQL injection or file upload to upload a webshell backdoor and obtain permissions, an attacker can remotely execute commands to modify, delete or add server files, and can also directly view user data in the server database.
Because webshell operations leave no record in the system security log and the webshell is mixed in with normal web page files, an ordinary administrator can hardly find traces of the intrusion, and advanced webshell backdoors also use various techniques to evade detection. The need for efficient and accurate webshell detection methods is therefore increasingly urgent.
Webshell detection is currently divided mainly into dynamic detection and static detection. Dynamic detection relies on the behavioral characteristics of malicious code at execution time, including malicious behavior and API call behavior.
Static detection mainly analyzes the semantic features of a webshell. It is fast and its detection features are distinct, so most research on webshell detection focuses on static methods. Static detection depends on the static features of the webshell. Because webshells are mostly written in scripting languages, they are easy to modify and transform. When a webshell undergoes a simple mutation or its signature is deliberately obfuscated, conventional methods miss it, so current webshell detection methods based on feature matching have difficulty detecting and identifying webshell variants quickly.
Therefore, how to overcome the narrowness and lag of traditional signature-matching webshell detection, cope with text obfuscation of webshells, and quickly detect webshells and their variants has long been a focus for those skilled in the art. In 2006, Jesse Kornblum proposed the fuzzy hash algorithm. Unlike a traditional hash, a change to part of a file changes only the corresponding part of the fuzzy hash value, so the algorithm is mainly used for file similarity comparison. The evasion techniques that attackers use for webshells essentially add redundant scrambling to the original malicious script in order to hide static features or reduce their proportion in the script, so the scrambled script retains a certain similarity to the original. On this basis, researchers have applied fuzzy hashing to malicious code detection.
The latest version of ModSecurity adds a webshell detection interface based on the traditional fuzzy hash algorithm ssdeep.
Patent CN201110375166 designs a system and method for detecting malicious code using a fuzzy hash algorithm. The client computes the fuzzy hash value of the object to be detected and transmits it to a cloud server. The cloud server compares the received fuzzy hash value with the fuzzy hash values of a stored blacklist, forms a verdict from the resulting similarity according to a decision policy, and sends the verdict to the client.
Patent 201710078106.7 computes the fuzzy hash value of a script file using a fuzzy hash algorithm and compares it with the fuzzy hash values of script files prestored in a fingerprint library, in order to filter out script files that do not match the prestored script files.
Patent CN201710352331 extracts the character strings used in an APK, converts the extracted strings into feature vectors to generate fuzzy hash values, clusters the fuzzy hash values with the k-means algorithm, and uses the Hamming distance as the similarity measure, thereby detecting variant malware. All of the above documents use traditional fuzzy hash algorithms, which adapt poorly and cannot cope with large variations in the size of the detected object. Traditional fuzzy hash algorithms are mainly used for file similarity comparison and do not account for the particular nature of malicious code detection, so the miss rate is high. Based on the characteristics of webshells and their detection evasion methods, the present invention proposes a weighted fuzzy hash algorithm and applies it to webshell detection, effectively solving the above problems.
In conclusion problem of the existing technology is:
(1) in practical applications, the difference in size of webshell is very big, only may in short be no more than 100 bytes, It may also be more than 200KB, attacker, which often passes through, is added meaningless redundant code reduction key code proportion Interference Detection To escape detection.Traditional fuzzy hash algorithm, which directly applies to webshell detection, can only resist a certain range of disturbance, and one As within 6%, and (more than 20%) is added for large range of redundancy, detection effect becomes very poor, and adaptability is bad. Since the difference in size of webshell is very big, change is easy to beyond traditional fuzzy hash algorithm in small webshell file Detection range.
(2) traditional fuzzy hash algorithm compares for text similarity, is a kind of universality algorithm, and webshell makees For a kind of text with specific function, it is that dangerous function, traditional fuzzy hash are calculated with the key difference of plain text Method does not consider this point.
Solve the meaning of above-mentioned technical problem:
The invention can effectively adapt to the very big situation of test object size variation, have well adapting to property;It fully considers The particularity of webshell also can preferably detect webshell in low wrong report, overcome traditional fuzzy hash The deficiency of algorithm improves the accuracy rate of webshell detection.
Summary of the invention
In view of the problems of the prior art, and based on a study of webshell characteristics and detection evasion methods, the present invention provides a webshell detection method and system based on weighted fuzzy hash, applied in particular to the detection of transformed webshells.
The invention is realized as follows. A webshell detection method based on weighted fuzzy hash uses a fuzzy hash algorithm with weight assignment: a weight is assigned to each fragment, and core fragments that contain dangerous functions are given large weights, which increases the influence of dangerous functions on the final fuzzy hash result.
The information entropy of each fragment is considered at the same time: the larger the entropy, the smaller the weight. This increases the influence of dangerous functions on the final fuzzy hash result and reduces the influence of meaningless scrambling.
The hashes of the fragments are concatenated into a fuzzy hash string and the total weight is computed, giving the weighted fuzzy hash value. The weighted fuzzy hash value of the file to be detected is compared in turn with each webshell weighted fuzzy hash value stored in advance in the fingerprint library, which realizes the detection of webshells and webshell variants.
Further, the webshell detection method based on weighted fuzzy hash includes:
Step 1, determining the fragmentation parameters: the weighted fuzzy hash uses the Adler-32 weak hash function to detect the fragment trigger condition, that is, a rolling hash is computed over the webshell file to be detected;
Step 2, computing the hash and weight of each fragment: once the fragmentation of the whole file is determined, the content of each fragment is hashed with the 32-bit strong hash algorithm FNV-1, giving a 32-bit hash value;
Step 3, computing the weighted fuzzy hash value: the characters corresponding to the fragment hashes are concatenated in order into a fuzzy hash string, and the total weight, which is the sum of the fragment weights, is appended after the fuzzy hash string;
Step 4, comparing weighted fuzzy hash values and judging similarity: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell.
Further, in step 1, the text in the hash window is denoted $(b_1, b_2, b_3, \ldots, b_{s-1}, b_s)$, and the corresponding hash value is:
$h(s) = F(b_1, b_2, b_3, \ldots, b_{s-1}, b_s)$ (1)
When the window slides backward by one byte, the new hash value only needs to remove the influence of the first byte and add the effect of the last byte $b_{s+1}$:
$h(s+1) = F(b_2, b_3, \ldots, b_{s-1}, b_s, b_{s+1}) = h(s) - X(b_1) + Y(b_{s+1})$ (2)
The fragment trigger condition is determined by the file length $L$, the rolling window length $s$ and the minimum fragment length:
When the hash result $h(s)$ modulo $b_{init}$ equals $b_{init} - 1$, a fragment boundary is placed at the last byte $b_s$ of the current window, so the number of fragments in the whole file depends on the fragment trigger condition.
Further, in step 2, the weighted hash computes the weight of each fragment at the same time as its hash, and assigns a different weight to each fragment.
The fragment weight formula is:
where $w$ is the fragment weight, $K$ is a coefficient with default value 2, $D$ is the number of dangerous functions in the fragment, and $I$ is the information entropy of the fragment.
Further, in step 4, the final hash string distance is calculated as:
where $m$ is the final hash string distance, $d$ is the edit distance between the two hash strings, $L$ is the length of the longer hash string, and $f(x)$ is the comparison formula for the total weights of the two hash strings:
where $k_1$ and $k_2$ are coefficients with default values 4 and 2/5, and $W_1$ and $W_2$ are the total weights of the two hash strings. The range of $f(x)$ is $[-1, 1]$: when the difference between $W_1$ and $W_2$ is small, $f(x)$ is positive, and as the difference grows, $f(x)$ becomes negative.
Further, in step 4, after the hash string distance is calculated, it is normalized into a similarity that is mapped to the range 0 to 100:
where $m$ is the final hash string distance and $l_1$, $l_2$ are the hash string lengths. The smaller the edit distance, the larger the fuzzy hash similarity $h$ and the more similar the texts.
Another object of the present invention is to provide a webshell detection system based on weighted fuzzy hash, which includes:
a fragmentation parameter determination module, used by the weighted fuzzy hash to detect the fragment trigger condition by computing a rolling hash over the text;
a per-fragment hash and weight module, which, once the fragmentation of the whole file is determined, hashes the content of each fragment with the 32-bit strong hash algorithm FNV-1 to obtain a 32-bit hash value;
a weighted fuzzy hash value calculation module, which concatenates the characters corresponding to the fragment hashes in order into a fuzzy hash string and appends the total weight, which is the sum of the fragment weights, after the hash string;
a weighted fuzzy hash value comparison module for similarity judgment: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell.
Another object of the present invention is to provide a webshell detection program based on weighted fuzzy hash, applied to a terminal, which implements the webshell detection method based on weighted fuzzy hash.
Another object of the present invention is to provide a terminal carrying a processor that implements the webshell detection method based on weighted fuzzy hash.
Another object of the present invention is to provide a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the webshell detection method based on weighted fuzzy hash.
Another object of the present invention is to provide a cyberspace security defense platform that implements the webshell detection method based on weighted fuzzy hash.
In summary, the advantages and positive effects of the present invention are:
The present invention analyzes the characteristics of webshells and their methods of evading detection. The malicious functionality of a webshell depends on dangerous functions, and most evasion methods either hide the dangerous functions or reduce their proportion. When scrambling a script, an attacker does not readily change the dangerous functions and the code near them. The weighted fuzzy hash algorithm of the present invention therefore assigns a weight to each fragment and gives core fragments containing dangerous functions larger weights, which increases the influence of dangerous functions on the final fuzzy hash result. It also considers the information entropy of each fragment: the larger the entropy, the smaller the weight, which reduces the influence of meaningless scrambling on the final fuzzy hash result. The invention adapts well to large variations in the size of the detected object, substantially improves detection accuracy on variant samples, and improves resistance to interference.
To test the anti-obfuscation ability and adaptability of the weighted fuzzy hash algorithm, 2500 webshell samples and 5000 normal samples were downloaded from publicly available data sets on the network and three experiments were carried out, comparing the detection performance of the weighted fuzzy hash algorithm and the traditional fuzzy hash algorithm on obfuscated webshells. The experimental results show that the weighted fuzzy hash does achieve better webshell detection than the traditional fuzzy hash and has better resistance to interference.
Experiment one: some webshell files are selected at random, and meaningless random strings and meaningful code are inserted at random positions, in given proportions, as obfuscation. The weighted hash value and the traditional fuzzy hash value of each obfuscated webshell are then computed, in order to compare how different obfuscation ratios affect the two hash values.
In the experimental results, each subplot shows how the two hash values change after different obfuscation characters are inserted into the same webshell file. The abscissa is the ratio of inserted random obfuscation characters to the original webshell text and the ordinate is the hash value. The "weight" line is the weighted hash value and the "traditional" line is the traditional fuzzy hash value. The figures show that, for obfuscation of the same file, the weighted hash value is consistently greater than the traditional fuzzy hash value. A larger hash value indicates more similar files, so when detecting webshells the weighted fuzzy hash resists obfuscation better than the traditional fuzzy hash.
Experiment two: 2 webshell samples are selected at random and their weighted hash value and traditional hash value are computed; the random selection is repeated 100 times. The two kinds of hash values obtained are shown in the figure.
In experiment two, two different files are selected at random for the hash value calculation. The ordinate is the hash value, "weight" is the weighted hash value and "traditional" is the traditional fuzzy hash value. The results show that for the great majority of comparisons between different files the weighted hash value lies below the traditional fuzzy hash value. A smaller hash value indicates less similar files, so the weighted hash also misjudges less often than the traditional fuzzy hash. The figure for experiment two also contains a few outliers where the weighted hash value is significantly greater than the traditional fuzzy hash value. These outliers are caused by weight collisions: two files may be very dissimilar, yet the weights of their weighted fuzzy hashes may be close.
Experiment three: 5 webshell samples and 1000 normal samples are chosen, and the weighted fuzzy hash values and traditional fuzzy hash values of the 5 webshell samples are saved as separate fingerprint libraries. The 5 black samples are then obfuscated in the same way as in experiment one, that is, meaningless characters and meaningful code are inserted at random in given proportions. The obfuscated black samples are mixed with the white samples, and all samples are then checked against the prestored fingerprint libraries.
In all experimental result figures, the abscissa is the obfuscation ratio of the webshell samples, that is, the ratio of inserted obfuscation characters to the original sample. The detection threshold in Fig. 4 to Fig. 6 is 99, that is, a file is judged to be a webshell only if its hash value compared against the fingerprint library is greater than or equal to 99. Fig. 3 to Fig. 6 show that without obfuscation both the weighted fuzzy hash and the traditional fuzzy hash reach 100% webshell detection accuracy. As the obfuscation ratio increases, the detection rate of both declines: the traditional fuzzy hash drops directly to zero, while the weighted fuzzy hash can still detect webshells with relatively good results. At the same time, the weighted fuzzy hash has a certain false detection rate. As shown in Fig. 6, the false detection rate in the experiment is 0.4%, and the false detections are caused by weight collisions. The false detection rate also depends on the detection threshold. When the threshold is set to 96 in the experiments of Fig. 7 to Fig. 9, the false detection rate rises to 0.8%, but detection accuracy also improves: at obfuscation ratios of 20% and 30% the accuracy rises from 20% to 40%, and at an obfuscation ratio of 50% it rises from 0 to 20%.
Detailed description of the invention
Fig. 1 is a flow chart of the webshell detection method based on weighted fuzzy hash provided in an embodiment of the present invention.
Fig. 2 shows the hash values after webshells are obfuscated in proportion in experiment one provided in an embodiment of the present invention (meaningless characters inserted).
Fig. 3 shows the hash values after webshells are obfuscated in proportion in experiment one provided in an embodiment of the present invention (meaningful code comments inserted).
Fig. 4 is the hash comparison of different files in experiment two provided in an embodiment of the present invention.
Fig. 5 shows the number of webshells detected in experiment one provided in an embodiment of the present invention (meaningless obfuscation, threshold 99).
Fig. 6 shows the number of webshells detected in experiment one provided in an embodiment of the present invention (meaningful obfuscation, threshold 90).
Fig. 7 shows the webshell detection accuracy in experiment two provided in an embodiment of the present invention (meaningless obfuscation, threshold 99).
Fig. 8 shows the webshell detection accuracy in experiment two provided in an embodiment of the present invention (meaningful obfuscation, threshold 90).
Fig. 9 shows the webshell false detection rate in experiment three provided in an embodiment of the present invention (meaningless obfuscation, threshold 99).
Fig. 10 shows the webshell false detection rate in experiment three provided in an embodiment of the present invention (meaningful obfuscation, threshold 99).
Fig. 11 shows the number of webshells detected in experiment one provided in an embodiment of the present invention (meaningless obfuscation, threshold 96).
Fig. 12 is a schematic diagram of the webshell detection system based on weighted fuzzy hash provided in an embodiment of the present invention.
In the figure: 1, fragmentation parameter determination module; 2, per-fragment hash and weight module; 3, hash concatenation and total weight calculation module; 4, weighted hash value comparison module.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
In the prior art, traditional fuzzy hash algorithms can only withstand disturbance within a certain range, generally within 6%. When a larger amount of redundancy is added (more than 20%), detection becomes very poor, so adaptability is bad. Because webshells differ greatly in size, a change to a small webshell file easily exceeds the fuzzy hash detection range. Traditional fuzzy hash algorithms are general-purpose algorithms for text similarity comparison, whereas a webshell is text with a specific function whose key difference from ordinary text is its use of dangerous functions, a difference that traditional fuzzy hash algorithms do not consider.
To solve the above problems, the present invention is described in detail below with reference to a concrete scheme.
As shown in Fig. 1, the webshell detection method based on weighted fuzzy hash provided in an embodiment of the present invention includes:
S101, determining the fragmentation parameters: the weighted fuzzy hash first uses the Adler-32 weak hash function to detect the fragment trigger condition, that is, it computes a rolling hash over the text.
S102, computing the hash and weight of each fragment: once the fragmentation of the whole file is determined, the content of each fragment is hashed with the 32-bit strong hash algorithm FNV-1, giving a 32-bit hash value.
S103, computing the weighted fuzzy hash value: the characters corresponding to the fragment hashes are concatenated in order into a traditional fuzzy hash string, and the total weight, which is simply the sum of the fragment weights, is appended after the fuzzy hash string.
S104, comparing weighted fuzzy hash values and judging similarity: the weighted fuzzy hash value of the file to be detected is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file to be detected exceeds a threshold, the file to be detected is considered a webshell.
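The comparison loop of S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `similarity` callable stands in for the weighted similarity score on the 0 to 100 scale, `toy_similarity` is a hypothetical placeholder used only to make the example runnable, and the default threshold of 99 is taken from the experiments described in this document.

```python
from typing import Callable, Iterable


def is_webshell(candidate: str,
                fingerprints: Iterable[str],
                similarity: Callable[[str, str], float],
                threshold: float = 99.0) -> bool:
    """Return True if any stored fingerprint is similar enough to the candidate.

    The experiments in the text treat a score greater than or equal to the
    threshold as a match, so >= is used here.
    """
    return any(similarity(candidate, fp) >= threshold for fp in fingerprints)


def toy_similarity(x: str, y: str) -> float:
    """Hypothetical stand-in: 100 for identical hash strings, 0 otherwise."""
    return 100.0 if x == y else 0.0


fingerprint_library = ["AbC+/:3.50", "xYz12:1.25"]
print(is_webshell("xYz12:1.25", fingerprint_library, toy_similarity))  # True
print(is_webshell("QQQQQ:0.10", fingerprint_library, toy_similarity))  # False
```

Passing the similarity function as a parameter keeps the library scan separate from the scoring formula, so the same loop works for either the weighted or the traditional fuzzy hash.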
In step S101, the fragmentation parameters are determined. The weighted fuzzy hash first uses the Adler-32 weak hash function to detect the fragment trigger condition, that is, it computes a rolling hash over the text. The text in the hash window is denoted $(b_1, b_2, b_3, \ldots, b_{s-1}, b_s)$, and the corresponding hash value is:
$h(s) = F(b_1, b_2, b_3, \ldots, b_{s-1}, b_s)$ (1)
When the window slides backward by one byte, the new hash value only needs to remove the influence of the first byte and add the effect of the last byte $b_{s+1}$:
$h(s+1) = F(b_2, b_3, \ldots, b_{s-1}, b_s, b_{s+1}) = h(s) - X(b_1) + Y(b_{s+1})$ (2)
The fragment trigger condition is then determined jointly by the file length $L$, the rolling window length $s$ and the minimum fragment length:
When the hash result $h(s)$ modulo $b_{init}$ equals $b_{init} - 1$, a fragment boundary is placed at the last byte $b_s$ of the current window, so the number of fragments in the whole file depends on the fragment trigger condition.
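The rolling update of formula (2) and the modulo trigger can be sketched in a few lines. This is an Adler-32-style rolling hash written for illustration, under several assumptions: the running sums start at zero, the window length defaults to 7 (the value used by ssdeep), and `block_size` plays the role of $b_{init}$ and is supplied by the caller because the formula that derives it is not reproduced in this text. The function names are hypothetical.

```python
from collections import deque

ADLER_MOD = 65521  # largest prime below 2**16, the modulus used by Adler-32


def rolling_hashes(data: bytes, window: int = 7):
    """Yield an Adler-32-style hash of the last `window` bytes at each position."""
    a = b = 0
    win = deque()
    for byte in data:
        win.append(byte)
        a = (a + byte) % ADLER_MOD
        if len(win) > window:
            out = win.popleft()
            a = (a - out) % ADLER_MOD           # drop the byte leaving the window
            b = (b - window * out) % ADLER_MOD  # it was counted `window` times in b
        b = (b + a) % ADLER_MOD
        yield (b << 16) | a


def direct_hash(chunk: bytes) -> int:
    """Recompute the same hash from scratch, to check the rolling update."""
    a = sum(chunk) % ADLER_MOD
    b = sum((len(chunk) - i) * byte for i, byte in enumerate(chunk)) % ADLER_MOD
    return (b << 16) | a


def split_fragments(data: bytes, block_size: int, window: int = 7) -> list[bytes]:
    """Cut after every byte where the rolling hash mod block_size is block_size - 1."""
    fragments, start = [], 0
    for pos, h in enumerate(rolling_hashes(data, window)):
        if h % block_size == block_size - 1:
            fragments.append(data[start:pos + 1])
            start = pos + 1
    if start < len(data):
        fragments.append(data[start:])  # trailing bytes form the last fragment
    return fragments
```

Because a boundary depends only on the bytes inside the window, inserting text in one place shifts later boundaries but does not change the content of fragments far from the insertion, which is what makes the resulting hash string comparable across variants.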
In step S102, the hash and weight of each fragment are computed. Once the fragmentation of the whole file is determined, the content of each fragment is hashed with the 32-bit strong hash algorithm FNV-1, giving a 32-bit hash value. To improve comparison efficiency and shorten the fuzzy hash, only the last six bits of each fragment's hash value are kept to represent that fragment. Six bits have 64 possible combinations, and each combination is represented by one character, so each fragment is represented by one character. If two fragments have the same content, their characters are necessarily the same.
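As an illustration of this step, the sketch below implements 32-bit FNV-1 with its standard offset basis and prime and maps the low six bits of the result to one character. The choice of the Base64 alphabet as the 64-character table is an assumption (it is what ssdeep uses); the text only states that each six-bit combination corresponds to one character. The name `fragment_char` is hypothetical.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193

# Assumed 64-character table (Base64 alphabet); the source does not name one.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1: multiply by the prime first, then XOR in each byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


def fragment_char(fragment: bytes) -> str:
    """Represent a fragment by the character indexed by the low six bits of its hash."""
    return ALPHABET[fnv1_32(fragment) & 0x3F]


print(hex(fnv1_32(b"a")))   # 0x50c5d7e
print(fragment_char(b"a"))  # +
```

Keeping only six bits means unrelated fragments collide on the same character about once in 64 comparisons, which is the price paid for a short hash string.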
While computing the hash of each fragment, the weighted hash also computes its weight. In traditional fuzzy hash algorithms every character in the string has equal status: a difference in any character contributes equally to the final hash comparison value, that is, every fragment is equally important to the whole file. For a webshell sample, however, the backdoor code that carries out the attack is clearly more important than the other parts. Based on this characteristic, the weighted fuzzy hash algorithm for webshell detection proposed here assigns a different weight to each fragment.
From the evasion methods described above, one conclusion can be drawn: for most webshells obfuscated by adding redundancy, the number or proportion of dangerous functions declines and the information entropy of the text increases, especially for webshells scrambled with meaningless redundant code. Based on this conclusion, a fragment weight formula is proposed:
where $w$ is the fragment weight, $K$ is a coefficient with default value 2, $D$ is the number of dangerous functions in the fragment, and $I$ is the information entropy of the fragment. Formula 4 shows that the fragment weight is positively correlated with the number of dangerous functions in the fragment and negatively correlated with the information entropy of the fragment, so this formula reduces the influence of meaningless scrambling on the final fuzzy hash result.
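The two inputs of the weight formula, $D$ and $I$, can be computed as in the following sketch. The weight formula itself is not reproduced in this text, so the sketch stops at its inputs. Two things here are assumptions made for illustration: the list of PHP function names treated as dangerous, and the use of byte-level Shannon entropy as the information entropy. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed example list of dangerous PHP functions; the source does not enumerate them.
DANGEROUS_CALL = re.compile(
    rb"\b(?:eval|assert|system|exec|shell_exec|passthru|popen|proc_open)\s*\("
)


def count_dangerous_calls(fragment: bytes) -> int:
    """D: number of calls to listed dangerous functions in the fragment."""
    return len(DANGEROUS_CALL.findall(fragment))


def shannon_entropy(fragment: bytes) -> float:
    """I: Shannon entropy of the byte distribution, in bits per byte."""
    if not fragment:
        return 0.0
    total = len(fragment)
    return sum((n / total) * math.log2(total / n) for n in Counter(fragment).values())


sample = b"<?php eval($_POST['x']); system('ls'); ?>"
print(count_dangerous_calls(sample))  # 2
print(shannon_entropy(b"ab"))         # 1.0
```

Random filler strings use many distinct byte values and so have high entropy, which is why weighting against entropy suppresses fragments that consist of meaningless padding.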
In step S103, the weighted fuzzy hash value is computed. The characters corresponding to the fragment hashes are concatenated in order into the fuzzy hash string, and the total weight is appended after the hash string. The total weight is the simple sum of the fragment weights.
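Step S103 amounts to a concatenation and a sum, as the following sketch shows. The `:` separator, the four-decimal formatting of the weight, and the function names are assumptions; the text only says that the total weight is appended after the hash string.

```python
from typing import Iterable, Tuple


def weighted_fuzzy_hash(fragments: Iterable[Tuple[str, float]]) -> str:
    """Join the per-fragment characters in order and append the summed weight.

    Each item is (character, weight) for one fragment. The ':' separator and
    the 4-decimal weight format are assumptions made for this sketch.
    """
    chars = []
    total_weight = 0.0
    for char, weight in fragments:
        chars.append(char)
        total_weight += weight
    return f"{''.join(chars)}:{total_weight:.4f}"


def split_weighted_hash(value: str) -> Tuple[str, float]:
    """Recover (hash string, total weight); assumes ':' is not in the alphabet."""
    hash_string, _, weight = value.rpartition(":")
    return hash_string, float(weight)
```

Keeping the weight in the same string as the hash lets a fingerprint library store one value per sample while still exposing both parts to the comparison step.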
In step S104, the weighted fuzzy hash values are compared and similarity is judged. The weighted fuzzy hash value of the file under test is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library. If the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file under test exceeds the threshold, the file under test is judged to be a webshell. First, the edit distance is used to count the characters that differ between the two hash strings, that is, the number of differing fragments: the more fragments differ, the larger the edit distance. After the edit distance is computed, the total weight is taken into account to compute the final hash string distance:
where m is the final hash string distance, d is the edit distance between the two hash strings, L is the length of the longer hash string, and f(x) is the comparison formula for the total weights of the two hash strings:
k1 and k2 are coefficients that default to 4 and 2/5, and W1 and W2 are the total weights of the two hash strings. The range of f(x) is [-1, 1]. When the difference between W1 and W2 is small, f(x) is positive; as the difference grows, f(x) becomes negative.
According to formula 4, when meaningless obfuscation code is inserted into a webshell script, the edit distance d between the fuzzy hash strings of the obfuscated code and the original increases, but the difference in the weight W remains small. In that case f(x) is positive and m < d, so the files are judged to be more alike. Conversely, the attack capability of a webshell depends mainly on a small number of dangerous functions, while its other code segments are common in normal scripts, so the edit distance between a webshell script and a normal script can be small, which causes false detections. Once the weight is applied, even if the edit distance between the hash values of a webshell script and a normal script is small, the difference in their weights W is large, so f(x) is negative and m > d, and the files are judged to be less alike. In theory, therefore, detecting webshells with the weighted fuzzy hash algorithm can effectively raise the webshell detection rate and lower the false detection rate.
After the final hash string distance is computed, it still has to be normalized, that is, mapped into the range 0-100:
where m is the final hash string distance and l1, l2 are the lengths of the hash strings. The formula shows that the smaller the edit distance, the larger the fuzzy hash comparison value, that is, the more similar the texts.
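The edit distance d used in step S104 can be computed with the standard dynamic-programming method. The sketch below assumes the plain Levenshtein distance with unit costs for insertion, deletion, and substitution; the text does not say which cost scheme is used. The weight-adjusted distance m and the 0-100 normalization are not sketched, because their formulas are not reproduced in this text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions that turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]
```

Since each character stands for one fragment, the returned value is the number of fragment-level edits separating the two files.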
As shown in Figure 12, the webshell detection system based on weighted fuzzy hash provided in an embodiment of the present invention includes:
Fragmentation parameter determination module 1, used by the weighted fuzzy hash to detect the fragment trigger condition by computing a rolling hash over the text.
Per-fragment hash and weight module 2, used to hash the content of each fragment with the 32-bit strong hash algorithm FNV-1 once the fragmentation of the whole file has been determined, giving a 32-bit hash value.
Weighted fuzzy hash value calculation module 3, used to concatenate the characters corresponding to the fragment hashes in order into the fuzzy hash string and to append the total weight after it; the total weight is the sum of the fragment weights.
Weighted fuzzy hash value comparison module 4, used for similarity judgment. The weighted fuzzy hash value of the file under test is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file under test exceeds the threshold, the file under test is judged to be a webshell.
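The decision rule of the comparison module can be sketched as a loop over the fingerprint library. The similarity function is passed in as a parameter because its formula is not reproduced in this text; the function name, the signature, and the default threshold are assumptions. The comparison uses "greater than or equal to", following the description of the experiments.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: maps two weighted fuzzy hash values to a 0-100 score.
Similarity = Callable[[str, str], float]


def match_fingerprint(candidate: str,
                      fingerprints: Iterable[str],
                      similarity: Similarity,
                      threshold: float = 99.0) -> Optional[Tuple[str, float]]:
    """Return the first stored fingerprint whose score reaches the threshold,
    together with that score, or None if no fingerprint matches."""
    for fingerprint in fingerprints:
        score = similarity(candidate, fingerprint)
        if score >= threshold:
            return fingerprint, score
    return None
```

A file is reported as a webshell as soon as one fingerprint matches, so the cost of a scan grows with the size of the fingerprint library only for files that match nothing.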
The invention is further described below with reference to experiments.
To test the anti-obfuscation capability of the weighted hash algorithm, 2500 webshell samples and 5000 normal samples were downloaded from public data sets on the network, and three experiments were carried out to compare how well the weighted hash and the traditional fuzzy hash detect obfuscated webshells. Both the theoretical analysis and the experimental results indicate that the weighted fuzzy hash detects webshells better than the traditional fuzzy hash and is more robust to interference.
Experiment 1: a portion of the webshell files is randomly selected, and meaningless random strings and meaningful code are randomly inserted into them in given proportions as obfuscation. The weighted hash value and the traditional fuzzy hash value of each obfuscated webshell are then computed, in order to compare how different obfuscation ratios affect the two hash values.
In the experimental results, each subplot shows how the two hash values change after different amounts of obfuscation characters are inserted into the same webshell file. The abscissa is the ratio of randomly inserted obfuscation characters to the original webshell text, and the ordinate is the hash value. The "weight" polyline is the weighted hash value and the "tradition" polyline is the traditional fuzzy hash value. The figures show that, for the same obfuscated file, the weighted hash value is always greater than the traditional fuzzy hash value. A larger hash value indicates more similar files; in other words, the weighted fuzzy hash has stronger anti-obfuscation capability than the traditional fuzzy hash when detecting webshells.
Experiment 2: two webshell samples are randomly selected, and their weighted hash value and traditional hash value are computed. The random selection is repeated 100 times, and the two kinds of hash values obtained are shown in the figure.
In Experiment 2, two different files are randomly selected and their hash values are computed. The ordinate is the hash value, "weight" is the weighted hash value, and "tradition" is the traditional fuzzy hash value. The results show that, for the overwhelming majority of comparisons between different files, the weighted hash value is below the traditional fuzzy hash value. A smaller hash value indicates less similar files, so the weighted hash also misjudges less often than the traditional fuzzy hash. The Experiment 2 figure also contains several outliers in which the weighted hash value is significantly greater than the traditional fuzzy hash value. These outliers are caused by weight collisions: two files may be very dissimilar, yet the weights of their weighted fuzzy hashes may be close.
Experiment 3: 5 webshell samples and 1000 normal samples are selected, and the weighted fuzzy hash values and the traditional fuzzy hash values of the 5 webshell samples are saved as separate fingerprint libraries. The 5 black (webshell) samples are then obfuscated in the same way as in Experiment 1, that is, meaningless characters and meaningful code are randomly inserted in given proportions. The obfuscated black samples are mixed with the white (normal) samples, and all samples are then checked against the stored fingerprint libraries.
In the experimental results, the abscissa of every result figure is the obfuscation ratio of the webshell samples, that is, the ratio of inserted obfuscation characters to the original sample. Fig. 4-Fig. 6 use a detection threshold of 99: a file is judged to be a webshell only if its hash value compared with the fingerprint library is greater than or equal to 99. Fig. 3-Fig. 6 show that, without obfuscation, both the weighted fuzzy hash and the traditional fuzzy hash detect webshells with 100% accuracy. As the obfuscation ratio increases, the detection rate of both declines: the traditional fuzzy hash drops straight to zero, while the weighted fuzzy hash can still detect webshells with fairly good results. At the same time, the weighted fuzzy hash has a certain false detection rate. As shown in Fig. 6, the false detection rate in the experiment is 0.4%, and the false detections are caused by weight collisions. The false detection rate also depends on the detection threshold. When the experiments of Fig. 7-Fig. 9 set the threshold to 96, the false detection rate rose to 0.8%, but the detection accuracy also improved: at obfuscation ratios of 20% and 30%, the accuracy rose from 20% to 40%, and at an obfuscation ratio of 50% it rose from 0 to 20%.
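The reported percentages can be related to sample counts with a one-line helper. This is only an arithmetic sketch: it assumes that the accuracy is measured over the 5 obfuscated webshell samples and that the false detection rate is measured over the 1000 normal samples, which the text does not state explicitly. The name `rate_percent` is hypothetical.

```python
def rate_percent(hits: int, total: int) -> float:
    """Share of `total` samples that are counted in `hits`, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total
```

Under these assumptions, each detected webshell moves the accuracy by 20 percentage points (`rate_percent(1, 5)`), and a false detection rate of 0.4% would correspond to 4 of the 1000 normal files (`rate_percent(4, 1000)`), so the results are coarse-grained on the webshell side.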
In the embodiments of the present invention, Fig. 2 shows the hash values after the webshells in Experiment 1 are obfuscated in proportion (meaningless characters inserted).
Fig. 3 shows the hash values after the webshells in Experiment 1 are obfuscated in proportion (meaningful code comments inserted).
Figure 10 shows the webshell false detection rate in Experiment 3 (meaningful obfuscation, threshold 99).
Figure 11 shows the number of webshells detected in Experiment 1 (meaningless obfuscation, threshold 96).
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly as a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A webshell detection method based on weighted fuzzy hash, characterized in that the method is based on a fuzzy hash algorithm with weight allocation, assigns a weight to each fragment, and gives a large weight to core fragments that contain dangerous functions;
in combination with the information entropy of each fragment, the larger the information entropy, the smaller the weight given, which increases the influence of dangerous functions on the final fuzzy hash result while reducing the influence of meaningless scrambling on the final fuzzy hash result;
the hashes of the fragments are concatenated into a fuzzy hash string and the total weight is computed to obtain the weighted fuzzy hash value; the weighted fuzzy hash value of the file under test is compared in turn with each webshell weighted fuzzy hash value stored in advance in the fingerprint library, so as to detect webshells and webshell variants.
2. The webshell detection method based on weighted fuzzy hash according to claim 1, characterized in that the method comprises:
Step 1, determining the fragmentation parameters: the weighted fuzzy hash uses the Adler-32 weak hash function to detect the fragment trigger condition, computing a rolling hash over the file under test;
Step 2, computing the hash and weight of each fragment: once the fragmentation of the whole file has been determined, the content of each fragment is hashed with the 32-bit strong hash algorithm FNV-1, giving a 32-bit hash value;
Step 3, computing the weighted fuzzy hash value: the characters corresponding to the fragment hashes are concatenated in order into the fuzzy hash string, and the total weight is appended after the fuzzy hash string; the total weight is the sum of the fragment weights, giving the weighted hash value of the file under test;
Step 4, comparing weighted fuzzy hash values and judging similarity: the weighted fuzzy hash value of the file under test is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file under test exceeds the threshold, the file under test is judged to be a webshell.
3. The webshell detection method based on weighted fuzzy hash according to claim 2, characterized in that in step 2, while the hash of each fragment is computed, the weighted hash also computes the weight of each fragment, and a different weight is assigned to each fragment;
the fragment weight formula is:
where w is the fragment weight, K is a coefficient that defaults to 2, D is the number of dangerous functions in the fragment, and I is the information entropy of the fragment.
4. The webshell detection method based on weighted fuzzy hash according to claim 2, characterized in that in step 4, webshell samples are collected in advance and their weighted fuzzy hash values are computed and stored in the fingerprint library; the weighted fuzzy hash value of the file under test is compared in turn with each weighted fuzzy hash value stored in the fingerprint library, by first computing the edit distance d between the hash string of the file under test and a weighted fuzzy hash value stored in the fingerprint library, and then computing the final hash string distance:
where m is the final hash string distance, d is the edit distance between the two hash strings, L is the length of the longer hash string, and f(x) is the comparison formula for the total weights of the two hash strings:
k1 and k2 are coefficients that default to 4 and 2/5, and W1 and W2 are the total weights of the two hash strings; the range of f(x) is [-1, 1]; when the difference between W1 and W2 is small, f(x) is positive, and as the difference grows, f(x) becomes negative;
after the hash string distance is computed, it is normalized to compute the similarity, which is mapped into the range 0-100:
where m is the final hash value distance and l1, l2 are the hash value lengths; the smaller the edit distance, the larger the fuzzy hash similarity h and the more similar the texts.
5. A webshell detection system based on weighted fuzzy hash that implements the webshell detection method based on weighted fuzzy hash according to claim 1, characterized in that the system comprises:
a fragmentation parameter determination module, used by the weighted fuzzy hash to detect the fragment trigger condition by computing a rolling hash over the text;
a per-fragment hash and weight module, used to hash the content of each fragment with the 32-bit strong hash algorithm FNV-1 once the fragmentation of the whole file has been determined, giving a 32-bit hash value;
a weighted fuzzy hash value calculation module, used to concatenate the characters corresponding to the fragment hashes in order into the fuzzy hash string and to append the total weight after it, the total weight being the sum of the fragment weights;
a weighted fuzzy hash value comparison module, in which the weighted fuzzy hash value of the file under test is compared in turn with each weighted fuzzy hash value stored in advance in the fingerprint library; if the similarity between any weighted fuzzy hash value in the fingerprint library and that of the file under test exceeds the threshold, the file under test is judged to be a webshell.
6. A webshell detection program based on weighted fuzzy hash, applied to a terminal, characterized in that the program implements the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
7. A terminal, characterized in that the terminal carries a processor that implements the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
8. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
9. A cyberspace security defense platform that implements the webshell detection method based on weighted fuzzy hash according to any one of claims 1 to 5.
CN201910311319.9A 2019-04-18 2019-04-18 Webshell detection method based on weighted fuzzy hash Active CN110034921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910311319.9A CN110034921B (en) 2019-04-18 2019-04-18 Webshell detection method based on weighted fuzzy hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910311319.9A CN110034921B (en) 2019-04-18 2019-04-18 Webshell detection method based on weighted fuzzy hash

Publications (2)

Publication Number Publication Date
CN110034921A (en) 2019-07-19
CN110034921B (en) 2022-04-15

Family

ID=67238943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910311319.9A Active CN110034921B (en) 2019-04-18 2019-04-18 Webshell detection method based on weighted fuzzy hash

Country Status (1)

Country Link
CN (1) CN110034921B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221108

Address after: 100029 Beijing city Chaoyang District Yumin Road No. 3

Patentee after: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER

Address before: 610225, No. 24, Section 1, Xuefu Road, Southwest Economic Development Zone, Chengdu, Sichuan

Patentee before: CHENGDU University OF INFORMATION TECHNOLOGY

Patentee before: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER
