CN102663296B - Intelligent detection method for Java script malicious code facing to the webpage - Google Patents

Info

Publication number
CN102663296B
Authority
CN
China
Prior art keywords
script
javascript
sample
gram
storehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210092707.0A
Other languages
Chinese (zh)
Other versions
CN102663296A (en)
Inventor
范渊
陈铁明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd filed Critical DBAPPSecurity Co Ltd
Priority to CN201210092707.0A priority Critical patent/CN102663296B/en
Publication of CN102663296A publication Critical patent/CN102663296A/en
Application granted granted Critical
Publication of CN102663296B publication Critical patent/CN102663296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to Internet security technology and aims to provide an intelligent detection method for webpage-oriented JavaScript malicious code. The method comprises three processes: sample optimization, security detection, and update optimization. By efficiently combining static and dynamic detection, it fuses the classical N-gram statistical model with a KNN classifier. Dynamic behavior analysis of the code is achieved by building N-gram features of the JavaScript machine-code operation sequence, and the efficiency of static classification is greatly improved by optimizing the samples on which the KNN classifier relies. The proposed method is notably practical: sample optimization ensures that classification efficiency does not fall as the training script library grows, security detection ensures that efficient intelligent detection is performed on the optimized samples, and update optimization ensures that detection accuracy does not decline as new malicious scripts appear. The method can therefore detect new malicious scripts and dynamically re-optimize itself during continuous operation.

Description

Intelligent detection method for webpage-oriented JavaScript malicious code
Technical field
The present invention relates to an intelligent detection method for JavaScript malicious code.
Background
Malicious code is one of the major threats to computer security. In essence it is a piece of computer code or a program (a sequence of instructions) that performs a series of malicious operations according to the attacker's intent. It may take the form of executable instructions, scripts, word-processor macros, or other types. Typical malicious code includes viruses, worms, and Trojan horses.
The object of study of the present invention is the JavaScript script that can be embedded in a webpage. JavaScript is an object-based, event-driven client-side scripting language. It enables a real-time, dynamic, interactive relationship between webpages and users, so that pages can contain more active elements and richer content. It also makes it easier for hackers to write and run malicious code. For example, a script can automatically load other malicious scripts from the network, manipulate the page's Document object, alter the HTML interface the user sees, obtain or prompt the user for valuable data such as account passwords, and send data requests to arbitrary servers. Hackers can also use JavaScript to attack browser vulnerabilities, which may cause browser crashes, memory leaks, and so on. In the face of these problems, there is an urgent need to study JavaScript security in depth and to improve the ability to detect malicious JavaScript, so as to keep Internet applications secure.
Malicious code detection has become an important direction in information security and has produced a large body of research. Detection techniques fall into two kinds according to the object analyzed: static detection analyzes the textual features of the code, whereas dynamic detection analyzes the code's execution behavior.
The typical static method is signature-based detection. It relies on pattern matching: a unique signature is produced for each known piece of malicious code, and the signatures are collected in a malicious code library. The signatures are extracted manually by industry experts who analyze virus samples, and each signature identifies the distinctive properties of one particular piece of malicious code. The signature-based method is implemented as follows:
(1) Collect known malicious code samples.
(2) Extract malicious code signatures from the samples.
(3) Add the signatures to the malicious code database.
(4) Scan files. If a file under inspection contains a signature from the malicious code library, the file is judged to be malicious code or to be infected by malicious code.
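The four signature-based steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Signature` class, the `scan` function, and the two byte patterns are hypothetical and are not taken from the patent or from any real antivirus product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signature:
    name: str       # hypothetical label for a known malicious sample
    pattern: bytes  # byte string an analyst extracted from that sample


def scan(data: bytes, signatures: List[Signature]) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [sig.name for sig in signatures if sig.pattern in data]


# Steps (1)-(3): a toy signature database (illustrative patterns only).
SIGNATURE_DB = [
    Signature("demo-dropper", b"eval(unescape("),
    Signature("demo-iframe", b"<iframe src=hidden"),
]

# Step (4): scan a file's contents against the database.
sample = b"var x = 1; eval(unescape('%61'));"
hits = scan(sample, SIGNATURE_DB)
```

A script whose bytes match no stored pattern yields an empty list, which is exactly why this approach misses new or obfuscated code.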
The signature-based method is currently the most convenient and most widely used detection method, and many commercial antivirus products adopt it. Its advantages are fast detection and, for malicious code already in the virus database, accurate detection with a low false-positive rate. Its drawback is that it is helpless against newly emerging viruses: the database must be updated continually with the signatures of new viruses.
The typical dynamic method is behavior-based detection. It generally requires executing the code, either directly or in a virtual environment, and uses the characteristic behaviors of viruses to monitor for them. Years of virus research have shown that certain behaviors are common to malicious code and are quite distinctive, rarely appearing in normal code. Some typical malicious behaviors are:
(1) Hooking interrupt INT 13H. A boot-sector virus attacks the boot sector or master boot sector and places its code there. When the system starts, the boot sector or master boot sector executes the INT 13H function and the virus code is loaded.
(2) Modifying the total amount of system memory. To carry out infection and destruction, a virus reduces the total system memory so that the system and other applications cannot occupy its space, which keeps the virus resident in memory.
(3) Writing to specific files. Because a virus is parasitic, when it runs it attaches its own code to the infected file, which produces abnormal write operations on that file.
(4) Monitoring system call sequences. System calls are the only interface between user applications and the operating system, and certain system call sequences reveal malicious semantics to some degree.
Behavior-based detection can therefore detect some new, unknown viruses. Its difficulty lies in extracting malicious behavior features, and its system overhead is relatively high.
In summary, static detection is efficient but cannot detect new malicious code. Dynamic detection can detect new malicious code, but it is inefficient, its behavior features are hard to extract, and it is difficult to operate. Researchers have therefore turned to the question of how to detect new malicious code automatically and efficiently, and automatic classifiers have become a popular technique in the antivirus field. Data mining has already been applied to malicious code detection with good experimental results, and detection based on data mining and machine learning is attracting growing attention as a new research focus.
However, although machine learning has produced many results in malicious code detection, current research concentrates mainly on executable files for Windows systems. Detection of web-borne malware, typified by JavaScript scripts and currently the fastest spreading, still lacks in-depth study. Moreover, code obfuscation is applied more and more in JavaScript authoring, for example compression, substitution, restructuring, redundant interference, and encryption. Scripts produced by dedicated obfuscators often evade static detection tools based on signatures. Research into new, efficient JavaScript malicious script detection methods that combine static and dynamic detection is therefore a clear trend.
Summary of the invention
The technical problem to be solved by the present invention is as follows. Code obfuscation is ubiquitous in malicious JavaScript. To overcome the inability of static detection to catch new malicious code, and to address the low efficiency and difficult feature extraction of dynamic behavior detection, the invention provides a new, general, and robust intelligent detection method that is based on renewable optimized samples, requires neither static code signatures nor dynamic behavior signatures, and can detect new malicious code.
The technical solution adopted by the present invention is as follows.
An intelligent detection method for webpage-oriented JavaScript malicious code is provided. The method comprises three processes, sample optimization, security detection, and update optimization:
Sample optimization: using the N-gram language statistics method and the KNN machine learning algorithm, perform machine learning on the JavaScript scripts in the JavaScript script training library and generate the JavaScript detection sample library used for security detection.
Security detection: extract the JavaScript scripts from the webpage URL to be checked and, using the JavaScript detection sample library built by the sample optimization process, apply the KNN classification algorithm to decide whether the specified webpage contains JavaScript malicious code.
Update optimization: track the accuracy of security detection. If the accuracy stays within the set range, continue to use the optimized JavaScript detection sample library for security detection. If the accuracy falls outside the preset range, insert all scripts that have been checked and that caused the accuracy to fall into the JavaScript script training library, and rerun sample optimization to obtain an updated detection sample library. Throughout this process the size of the optimized detection sample library is kept constant to preserve detection efficiency.
In the present invention, the JavaScript detection sample library comprises malicious code N-gram samples and benign code N-gram samples.
In the present invention, the sample optimization process determines the following parameters by analyzing the training scripts: $P$, the accuracy of JavaScript security detection; $N$, the N-gram size; $N_f$, the N-gram frequency threshold, meaning that the $N_f$ most frequent N-grams are retained; and N°, the number of malicious samples, and likewise of benign samples, in the optimized detection sample library. The process comprises the following steps:
(1) Collect currently representative malicious and benign JavaScript scripts to form a script training library on the order of ten thousand scripts.
(2) Using Google's open-source JavaScript engine V8, compile each JavaScript script to V8 machine code and then extract the operation sequence of the machine code.
(3) Taking each operation as the basic unit, compute the operation-sequence N-grams of every malicious and benign training script, and keep the $N_f$ most frequent N-grams of each script. Let $n_m$ and $n_b$ be the numbers of malicious and benign scripts in the JavaScript script training library, so that the total is $n = n_m + n_b$. Denote the sets of $N_f$ N-grams computed for the training scripts by $S_i^{fm}$ ($i = 1, 2, \dots, n_m$) and $S_i^{fb}$ ($i = 1, 2, \dots, n_b$), and the frequency of each N-gram by $f_i^m(s),\ s \in S_i^{fm}$ ($i = 1, 2, \dots, n_m$) and $f_i^b(s),\ s \in S_i^{fb}$ ($i = 1, 2, \dots, n_b$). For an N-gram $s'$ in neither set, i.e. $s' \notin S_i^{fm} \cup S_i^{fb}$, set $f_i^m(s') = f_i^b(s') = 0$, $i = 1, 2, \dots, n_m$.
(4) Select a KNN classifier with K=1. The classification algorithm is as follows. Compute the $N_f$ most frequent N-grams of the machine-code operation sequence of the script to be classified, denote the set by $S_f$, and denote the frequency of each N-gram by $f(s)$, $s \in S_f$. Find the $i$ that attains

$$d_m = \min\left(\sum_{s \in S_f \cup S_i^{fm}} \left(\frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)}\right)^2\right),\quad i = 1, 2, \dots, n_m,$$

and denote it $i = j_m$; find the $i$ that attains

$$d_b = \min\left(\sum_{s \in S_f \cup S_i^{fb}} \left(\frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)}\right)^2\right),\quad i = 1, 2, \dots, n_b,$$

and denote it $i = j_b$. If $d_m < d_b$, the script is judged to be malicious code, and the $j_m$-th malicious training script is counted as having been selected once as a malicious sample. Otherwise the script is judged to be benign code, and the $j_b$-th benign training script is counted as having been selected once as a benign sample.
(5) For the training script library of total size $n$, run a cross-validation test of the KNN classification. Specifically, the malicious and the benign training scripts are each divided into a number of parts ($n_m$ and $n_b$ are both chosen as multiples of N°). Each time, one part of each is randomly chosen as the KNN training data and all remaining parts serve as test data. Whenever a test result is correct, the number of times each training script was selected as a sample by the KNN classifier is accumulated. Finally, ranked by accumulated count, the top N° malicious scripts and the top N° benign scripts are selected, and their N-grams form the detection sample library. Denote the malicious and benign N-gram sets in the detection sample library by $S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) and $S_i^{fb}$ ($i = 1, 2, \dots, N^\circ$), and the frequency of each N-gram in these sets by $f_i^m(s),\ s \in S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) and $f_i^b(s),\ s \in S_i^{fb}$ ($i = 1, 2, \dots, n_b$).
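Step (3) above, computing the most frequent N-grams of an operation sequence, might look roughly like the following Python sketch. The function name `top_ngrams`, the toy opcode names, and the choice of relative frequency (count divided by the number of N-grams) are assumptions made for illustration; the patent does not say whether the frequency is a raw count or a ratio, and it does not show the V8 extraction step, which is omitted here.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]


def top_ngrams(ops: Sequence[str], n: int = 3, n_f: int = 500) -> Dict[NGram, float]:
    """Map each of the n_f most frequent n-grams of ops to its relative frequency."""
    # Slide a window of length n over the operation sequence.
    grams = [tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    # Keep only the n_f most frequent n-grams (the set S_f with frequencies f(s)).
    return {gram: count / total for gram, count in counts.most_common(n_f)}


# Toy operation sequence standing in for one extracted from machine code.
ops = ["push", "mov", "call", "push", "mov", "call", "ret"]
profile = top_ngrams(ops, n=3, n_f=2)
```

Any N-gram missing from the returned dictionary is treated as having frequency 0, matching the convention stated in step (3).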
In the present invention, the security detection process comprises the following steps:
(1) From the specified webpage URL, extract the embedded JavaScript code as the script to be checked.
(2) Run Google's open-source JavaScript engine V8 to obtain the JavaScript machine code, then extract the operation sequence.
(3) Compute the $N_f$ most frequent N-grams of the operation sequence of the script to be checked. Denote the N-gram set by $S_f$ and the frequency of each N-gram by $f(s)$, $s \in S_f$.
(4) Use the KNN classification algorithm with K=1 to decide whether $S_f$ is the N-gram set of malicious code. The basic procedure is to compute

$$d_m = \min\left(\sum_{s \in S_f \cup S_i^{fm}} \left(\frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)}\right)^2\right),\quad i = 1, 2, \dots, N^\circ,$$

$$d_b = \min\left(\sum_{s \in S_f \cup S_i^{fb}} \left(\frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)}\right)^2\right),\quad i = 1, 2, \dots, N^\circ.$$

Here $\min(\cdot)$ denotes the minimum over $i$. If $d_m < d_b$, the script is judged to be malicious code; otherwise it is judged to be benign code.
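The K=1 decision rule in step (4) can be sketched as follows. This is a minimal illustration, assuming each N-gram profile is stored as a dictionary from N-gram to frequency; the names `distance` and `classify` and the toy profiles are hypothetical.

```python
from typing import Dict, Hashable, List

Profile = Dict[Hashable, float]


def distance(f: Profile, g: Profile) -> float:
    """Sum over the union of N-grams of ((f(s) - g(s)) / (f(s) + g(s)))^2."""
    total = 0.0
    for s in set(f) | set(g):
        a, b = f.get(s, 0.0), g.get(s, 0.0)  # absent N-grams have frequency 0
        if a + b == 0.0:
            continue  # guard against a stored zero frequency on both sides
        total += ((a - b) / (a + b)) ** 2
    return total


def classify(f: Profile, malicious: List[Profile], benign: List[Profile]) -> str:
    """1-NN rule: malicious if the nearest malicious sample is strictly closer."""
    d_m = min(distance(f, sample) for sample in malicious)
    d_b = min(distance(f, sample) for sample in benign)
    return "malicious" if d_m < d_b else "benign"


malicious_samples = [{"a": 0.5, "b": 0.5}]
benign_samples = [{"c": 1.0}]
verdict = classify({"a": 0.5, "b": 0.5}, malicious_samples, benign_samples)
```

Each N-gram present in only one of the two profiles contributes exactly 1 to the sum, so the distance effectively counts non-shared N-grams and adds a bounded penalty for shared N-grams whose frequencies differ.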
In the present invention, the update optimization process comprises the following steps:
(1) Record the JavaScript script each time security detection produces a false negative or a false positive, and first add its N-grams directly to the optimized detection sample library, so that the JavaScript detection sample library grows.
(2) Accumulate the error rate $P_f$ after each security detection. When $P_f > 2(1-P)$ (where $P$ is the set malicious code detection accuracy), add all $n_f$ misclassified scripts to the existing script training library of $n$ scripts, so that the training library size becomes $n = n + n_f$. Then rerun the sample optimization process to obtain a new JavaScript detection sample library of size 2N° (N° malicious detection samples and N° benign detection samples), so that neither the accuracy $P$ nor the execution efficiency of the classification algorithm declines.
(3) If $P_f > 2(1-P)$ in step (2) does not hold, repeat step (1).
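The trigger condition in step (2) can be expressed as a small check. The sketch below assumes that $P_f$ is the fraction of misclassified scripts among all scripts checked so far; the patent only calls it the accumulated error rate, so that definition and the function name `needs_reselection` are illustrative assumptions.

```python
def needs_reselection(errors: int, checked: int, p: float = 0.95) -> bool:
    """True when the observed error rate P_f exceeds the threshold 2 * (1 - P)."""
    if checked == 0:
        return False  # no detections yet, nothing to evaluate
    p_f = errors / checked
    return p_f > 2 * (1 - p)


# With P = 0.95 the threshold is about 0.10.
trigger_high = needs_reselection(errors=11, checked=100)  # error rate 0.11
trigger_low = needs_reselection(errors=5, checked=100)    # error rate 0.05
```

When the check returns true, the misclassified scripts would be appended to the training library and sample optimization rerun; otherwise detection continues with the current sample library.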
The beneficial effects of the present invention are mainly as follows:
(1) An efficient method that effectively mixes static and dynamic detection is proposed, fusing the classical N-gram statistical model with a KNN classifier. Dynamic behavior analysis of the code is achieved by building N-gram features of the JavaScript machine-code operation sequence, and optimizing the samples on which the KNN classifier relies greatly improves the efficiency of static classification.
(2) Three relatively independent parts are proposed: sample optimization, security detection, and update optimization. They give the proposed intelligent detection method its distinctive practicality. Sample optimization ensures that classification efficiency does not fall as the training script library grows, security detection ensures that efficient intelligent detection is performed on the optimized samples, and update optimization ensures that detection accuracy does not decline as new malicious scripts appear.
(3) As the training scripts keep growing and JavaScript technology develops, the main parameters of the proposed method, $N$, $N_f$, N°, and $P$, can be adjusted appropriately through machine learning and experimental analysis. This gives the method a better ability to detect new malicious scripts and the ability to re-optimize itself dynamically during continuous operation.
(4) Working with machine-code operation sequence features effectively neutralizes any JavaScript code obfuscation. Because the JavaScript engine V8 can produce machine code for any JavaScript script, the proposed method also supports security detection of JavaScript code fragments extracted from a webpage URL.
(5) All algorithms and implementation steps of the disclosed JavaScript malicious code detection method are simple, practical, efficient, and low in overhead, and are easy to develop and deploy in modular form on all kinds of platforms.
Brief description of the drawings
Fig. 1 shows the intelligent detection flow for webpage-oriented JavaScript malicious code.
Fig. 2 shows the basic sample optimization process based on N-grams of the JavaScript machine-code operation sequence and the KNN classifier.
Fig. 3 shows the machine code obtained by compiling a JavaScript script with the JavaScript engine V8, and the extraction of the machine-code operation sequence.
Embodiment
It should first be noted that the present invention involves the application of software technologies such as search engines and is an application of computer technology in the Internet field. Its implementation may involve multiple software functional modules. The applicant believes that, after carefully reading the application documents and accurately understanding the implementation principle and the purpose of the invention, a person skilled in the art can fully implement the invention by combining existing known techniques with their own software programming skills. Everything mentioned in this application falls within this scope, and the applicant does not enumerate it item by item.
Sample optimization uses the N-gram language statistics method and the KNN machine learning algorithm (both widely accepted data analysis techniques). By analyzing the training scripts it determines the following parameters: $P$ (the accuracy of JavaScript security detection), $N$ (the N-gram size), $N_f$ (the N-gram frequency threshold, meaning that the $N_f$ most frequent N-grams are retained), and N° (the number of malicious samples, and of benign samples, in the optimized JavaScript detection sample library). It finally generates an optimized JavaScript detection sample library of size 2N°, which mainly comprises malicious code sample N-grams and benign code sample N-grams. The basic process is shown in Fig. 2 and mainly comprises the following steps:
(1) Collect currently representative malicious and benign JavaScript scripts to form a JavaScript script training library on the order of ten thousand scripts.
(2) Using Google's JavaScript engine V8, compile each JavaScript script to V8 machine code and then extract the operation sequence of the machine code (as shown in Fig. 3).
(3) Taking each operation as the basic unit, compute the N-grams of the machine-code operation sequence of every script (malicious and benign) in the script training library, and keep the $N_f$ most frequent N-grams of each script. Let $n_m$ and $n_b$ be the numbers of malicious and benign scripts, so that the total is $n = n_m + n_b$. Denote the sets of $N_f$ N-grams computed for the scripts by $S_i^{fm}$ ($i = 1, 2, \dots, n_m$) and $S_i^{fb}$ ($i = 1, 2, \dots, n_b$), and the frequency of each N-gram by $f_i^m(s),\ s \in S_i^{fm}$ ($i = 1, 2, \dots, n_m$) and $f_i^b(s),\ s \in S_i^{fb}$ ($i = 1, 2, \dots, n_b$). For an N-gram $s'$ in neither set, i.e. $s' \notin S_i^{fm} \cup S_i^{fb}$, define $f_i^m(s') = f_i^b(s') = 0$, $i = 1, 2, \dots, n_m$.
(4) Select a KNN classifier with K=1. The classification algorithm is as follows. Compute the $N_f$ most frequent N-grams of the machine-code operation sequence of the script to be classified, denote the set by $S_f$, and denote the frequency of each N-gram by $f(s)$, $s \in S_f$. Find the $i$ that attains

$$d_m = \min\left(\sum_{s \in S_f \cup S_i^{fm}} \left(\frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)}\right)^2\right),\quad i = 1, 2, \dots, n_m,$$

and denote it $i = j_m$; find the $i$ that attains

$$d_b = \min\left(\sum_{s \in S_f \cup S_i^{fb}} \left(\frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)}\right)^2\right),\quad i = 1, 2, \dots, n_b,$$

and denote it $i = j_b$. If $d_m < d_b$, the script is judged to be malicious code, and the $j_m$-th malicious script in the script training library is counted as having been selected once as a malicious detection sample. Otherwise the script is judged to be benign code, and the $j_b$-th benign script is counted as having been selected once as a benign detection sample.
(5) For the training script library of total size $n$, run a cross-validation test of the KNN classification. Specifically, the malicious and the benign training scripts are each divided into a number of parts ($n_m$ and $n_b$ are both chosen as multiples of N°). Each time, one part of each is randomly chosen as the KNN training data and all remaining parts serve as test data. Whenever a test result is correct, the number of times each training script was selected as a sample by the KNN classifier is accumulated. Finally, ranked by accumulated count, the top N° malicious scripts and the top N° benign scripts are selected as the malicious and benign detection samples and stored as N-gram sets, denoted $S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) (malicious) and $S_i^{fb}$ ($i = 1, 2, \dots, N^\circ$) (benign). The frequency of each N-gram in these two sets is denoted $f_i^m(s),\ s \in S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) and $f_i^b(s),\ s \in S_i^{fb}$ ($i = 1, 2, \dots, n_b$).
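The final ranking in step (5) amounts to counting how often each training script served as the nearest neighbor in a correct classification and then keeping the top N° of each class. The sketch below covers only that counting and ranking. It assumes the cross-validation loop has already produced a list of `(script_id, label)` pairs, one per correct classification; the function name `select_samples` and the identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def select_samples(hits: Iterable[Tuple[str, str]], n_keep: int) -> Tuple[List[str], List[str]]:
    """Keep the n_keep most frequently selected scripts of each class.

    hits holds one (script_id, label) pair for every correct test
    classification in which that training script was the nearest neighbor.
    """
    hits = list(hits)  # allow a generator to be traversed twice
    malicious = Counter(sid for sid, label in hits if label == "malicious")
    benign = Counter(sid for sid, label in hits if label == "benign")
    return (
        [sid for sid, _ in malicious.most_common(n_keep)],
        [sid for sid, _ in benign.most_common(n_keep)],
    )


hits = [
    ("m1", "malicious"), ("m2", "malicious"), ("m1", "malicious"),
    ("b1", "benign"), ("b2", "benign"), ("b2", "benign"),
]
kept_malicious, kept_benign = select_samples(hits, n_keep=1)
```

Keeping a fixed number of samples per class is what holds the cost of each later 1-NN detection constant, however large the training library becomes.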
As an experimental demonstration of parameter selection for the intelligent detection method, we collected 5000 benign scripts and 5000 malicious scripts from websites such as http://code.google.com and http://vx.netlux.org/ respectively to form the training library. Experimental analysis showed that with $N = 3$, $N_f = 500$, and N° = 100 the classification accuracy reaches $P > 95\%$, which is a satisfactory result that also preserves high execution efficiency. As a preferred scheme, the parameters are therefore set to $N = 3$, $N_f = 500$, N° = 100, $P = 95\%$.
Security detection uses the detection samples built by sample optimization, $S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) and $S_i^{fb}$ ($i = 1, 2, \dots, N^\circ$), to decide whether the specified webpage contains JavaScript malicious code. The main steps are as follows:
(1) From the specified webpage URL, extract the embedded JavaScript code as the script to be checked.
(2) Run the engine V8 to obtain the JavaScript machine code, then extract the operation sequence.
(3) Compute the $N_f$ most frequent N-grams of the operation sequence of the script to be checked. Denote the N-gram set by $S_f$ and the frequency of each N-gram by $f(s)$, $s \in S_f$.
(4) Use the KNN classification algorithm with K=1 to decide whether $S_f$ is the N-gram set of a malicious script. The basic procedure is to compute

$$d_m = \min\left(\sum_{s \in S_f \cup S_i^{fm}} \left(\frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)}\right)^2\right),\quad i = 1, 2, \dots, N^\circ,$$

$$d_b = \min\left(\sum_{s \in S_f \cup S_i^{fb}} \left(\frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)}\right)^2\right),\quad i = 1, 2, \dots, N^\circ.$$

If $d_m < d_b$, the script is judged to be malicious code; otherwise it is judged to be benign code.
Update optimization uses the results of security detection to reselect the detection sample library of size 2N° (N° malicious samples and N° benign samples, denoted $S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) and $S_i^{fb}$ ($i = 1, 2, \dots, N^\circ$) respectively), so as to preserve the accuracy and execution efficiency of security detection. The main steps are as follows:
(1) Record the JavaScript script each time security detection makes an error (a false negative or a false positive), and first add its N-grams directly to the optimized detection sample library.
(2) Accumulate the error rate $P_f$ after each security detection. When $P_f > 2(1-P)$, add all $n_f$ misclassified scripts to the existing script training library of $n$ scripts, so that the library size becomes $n = n + n_f$. Rerun the sample optimization process to obtain a new detection sample library of size 2N°, so that neither the accuracy $P$ nor the execution efficiency of the classification algorithm declines.
(3) If $P_f > 2(1-P)$ in (2) does not hold, repeat (1).
Here $P = 95\%$, so $P_f > 2(1-P)$ means that the current detection error rate has exceeded 10%. This indicates that, with the current optimized detection samples, the detection performance of the KNN classifier on new JavaScript scripts has degraded noticeably. The new scripts that caused the detection errors are therefore fed back into the script training library as training scripts, and sample optimization is rerun to obtain a new detection sample library for KNN classification, preserving the efficiency and accuracy of security detection.

Claims (2)

1. An intelligent detection method for webpage-oriented JavaScript malicious code, characterized in that the method comprises three processes, sample optimization, security detection, and update optimization, specifically:
Sample optimization: using the N-gram language statistics method and the KNN machine learning algorithm, perform machine learning on the JavaScript scripts in the JavaScript script training library and generate the JavaScript detection sample library used for security detection; the JavaScript detection sample library comprises malicious code N-gram samples and benign code N-gram samples;
Security detection: extract the JavaScript scripts from the webpage URL to be checked and, using the JavaScript detection sample library built by the sample optimization process, apply the KNN classification algorithm to decide whether the specified webpage contains JavaScript malicious code;
Update optimization: track the accuracy of security detection; if the accuracy stays within the set range, continue to use the optimized JavaScript detection sample library for security detection; if the accuracy falls outside the preset range, insert all scripts that have been checked and that caused the accuracy to fall into the JavaScript script training library, and rerun sample optimization to obtain an updated JavaScript detection sample library; throughout this process the size of the optimized detection sample library is kept constant to preserve detection efficiency;
In the sample optimization process, the following parameters are determined by analyzing the training scripts: $P$, the accuracy of JavaScript security detection; $N$, the N-gram size; $N_f$, the N-gram frequency threshold, meaning that the $N_f$ most frequent N-grams are retained; N°, the number of malicious samples and of benign samples in the optimized JavaScript detection sample library;
Sample optimization specifically comprises the following steps:
(1) Collect currently representative malicious and benign JavaScript scripts to form a JavaScript script training library on the order of ten thousand scripts;
(2) Using Google's open-source JavaScript engine V8, compile each JavaScript script to V8 machine code and then extract the operation sequence of the machine code;
(3) Taking each operation as the basic unit, compute the N-grams of the machine-code operation sequence of every malicious and benign script in the JavaScript script training library, and keep the $N_f$ most frequent N-grams of each script;
Let $n_m$ and $n_b$ be the numbers of malicious and benign scripts, so that the total is $n = n_m + n_b$; denote the sets of $N_f$ N-grams computed for the scripts by $S_i^{fm}$ and $S_i^{fb}$, and the frequency of each N-gram by $f_i^m(s),\ s \in S_i^{fm}$ ($i = 1, 2, \dots, n_m$) and $f_i^b(s),\ s \in S_i^{fb}$ ($i = 1, 2, \dots, n_b$); for an N-gram $s'$ in neither set, i.e. $s' \notin S_i^{fm} \cup S_i^{fb}$, define $f_i^m(s') = f_i^b(s') = 0$, $i = 1, 2, \dots, n_m$;
(4) Select a KNN classifier with K=1; the classification algorithm is as follows: compute the $N_f$ most frequent N-grams of the machine-code operation sequence of the script to be classified, denote the set by $S_f$, and denote the frequency of each N-gram by $f(s)$, $s \in S_f$; find the $i$ that attains

$$d_m = \min\left(\sum_{s \in S_f \cup S_i^{fm}} \left(\frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)}\right)^2\right),\quad i = 1, 2, \dots, n_m,$$

and denote it $i = j_m$; find the $i$ that attains

$$d_b = \min\left(\sum_{s \in S_f \cup S_i^{fb}} \left(\frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)}\right)^2\right),\quad i = 1, 2, \dots, n_b,$$

and denote it $i = j_b$; if $d_m < d_b$, the script is judged to be malicious code, and the $j_m$-th malicious script in the JavaScript script training library is counted as having been selected once as a malicious detection sample; otherwise the script is judged to be benign code, and the $j_b$-th benign script is counted as having been selected once as a benign detection sample;
(5) For the JavaScript training script library of total size $n$, run a cross-validation test of the KNN classification; specifically, the malicious and the benign training scripts are each divided into a number of parts, $n_m$ and $n_b$ both being chosen as multiples of N°; each time, one part of each is randomly chosen as the KNN training data and all remaining parts serve as test data; whenever a test result is correct, the number of times each training script was selected as a sample by the KNN classifier is accumulated; finally, ranked by accumulated count, the top N° malicious scripts and the top N° benign scripts are selected as the malicious and benign detection samples and stored as N-gram sets, denoted $S_i^{fm}$ (malicious) and $S_i^{fb}$ (benign); the frequency of each N-gram in these two sets is denoted $f_i^m(s),\ s \in S_i^{fm}$ ($i = 1, 2, \dots, N^\circ$) and $f_i^b(s),\ s \in S_i^{fb}$ ($i = 1, 2, \dots, n_b$);
The safety detection process comprises the following steps:
(1) according to the specified webpage URL, extract the embedded JavaScript code as the script to be detected;
(2) run Google's V8 JavaScript engine to obtain the JavaScript machine code, and further extract the operation sequence;
(3) compute the top $N_f$ N-grams of the operation sequence of the script to be detected; denote the N-gram set by $S_f$ and the frequency of each N-gram by $f(s),\ s \in S_f$;
(4) use the KNN classification algorithm with $K = 1$ to determine whether $S_f$ is the N-gram set of a malicious script; the basic procedure is as follows:
Compute:
$$d_m = \min_i \left( \sum_{s \in S_f \cup S_i^{fm}} \left( \frac{f(s) - f_i^m(s)}{f(s) + f_i^m(s)} \right)^2 \right), \quad i = 1, 2, \ldots, N_o,$$
$$d_b = \min_i \left( \sum_{s \in S_f \cup S_i^{fb}} \left( \frac{f(s) - f_i^b(s)}{f(s) + f_i^b(s)} \right)^2 \right), \quad i = 1, 2, \ldots, N_o.$$
If $d_m < d_b$, the script is judged to be malicious code; otherwise it is judged to be benign code.
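Steps (3) and (4) of the safety detection process can be illustrated with a short sketch. It assumes the operation sequence has already been extracted from the engine as a list of opcode names (the extraction itself is not shown), that frequencies are raw occurrence counts, and that the detection sample library is simply two lists of frequency dictionaries. The opcode names, function names, and default values of `n` and `n_f` are hypothetical.

```python
from collections import Counter


def top_ngrams(ops, n, n_f):
    """Count all N-grams of the operation sequence and keep the n_f most frequent."""
    grams = Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))
    return dict(grams.most_common(n_f))


def distance(f, g):
    """Sum over the union of N-grams of ((f(s) - g(s)) / (f(s) + g(s)))^2."""
    total = 0.0
    for s in set(f) | set(g):
        a, b = f.get(s, 0), g.get(s, 0)  # a missing N-gram has frequency 0
        total += ((a - b) / (a + b)) ** 2
    return total


def is_malicious(ops, malicious_samples, benign_samples, n=2, n_f=50):
    """1-nearest-neighbour decision: malicious if and only if d_m < d_b."""
    f = top_ngrams(ops, n, n_f)
    d_m = min(distance(f, g) for g in malicious_samples)
    d_b = min(distance(f, g) for g in benign_samples)
    return d_m < d_b


# Toy detection sample library with made-up opcode sequences.
malicious_samples = [top_ngrams(["push", "call", "push", "call", "jmp"], 2, 10)]
benign_samples = [top_ngrams(["mov", "add", "mov", "add", "ret"], 2, 10)]
verdict = is_malicious(["push", "call", "push", "call"],
                       malicious_samples, benign_samples, n=2, n_f=10)
```

In this toy case the queried sequence shares its N-grams with the malicious sample only, so `d_m` is smaller than `d_b` and the verdict is malicious. Because an N-gram present on one side only contributes exactly 1 to the sum, the distance is dominated by how many N-grams two scripts do not share.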
2. The method according to claim 1, characterized in that the update optimization process comprises the following steps:
(1) record the JavaScript script each time a safety detection makes an error (a false negative or a false positive), and first add its N-grams directly to the already optimized JavaScript detection sample library;
(2) accumulate the error rate $P_f$ after each safety detection; when $P_f > 2(1 - P)$, add all $n_f$ misdetected scripts to the existing training library of $n$ scripts, so that the size of the JavaScript training script library becomes $n = n + n_f$, and perform the sample optimization process again to obtain a new JavaScript detection sample library of size $N_o$, so that neither the accuracy $P$ nor the execution efficiency of the classification algorithm declines;
(3) if the condition $P_f > 2(1 - P)$ in step (2) does not hold, repeat step (1).
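The update logic of claim 2 can be sketched as a small state machine. This is an illustration under several assumptions that the claim leaves open: $P_f$ is computed as the number of recorded errors divided by the number of detections since the last re-optimization, the counters are reset after re-optimization, and the sample optimization process is passed in as a callback. The class name `Updater` and its parameters are hypothetical.

```python
class Updater:
    """Tracks detection errors and triggers re-optimization when P_f > 2 * (1 - P)."""

    def __init__(self, training, detection, accuracy, reoptimize):
        self.training = list(training)    # training script library (size n)
        self.detection = list(detection)  # optimized detection sample library
        self.accuracy = accuracy          # P, the accuracy to be maintained
        self.reoptimize = reoptimize      # callback: training library -> new detection library
        self.errors = []                  # misdetected scripts since the last re-optimization
        self.total = 0                    # detections since the last re-optimization

    def record(self, sample, correct):
        """Record one detection result; return True if re-optimization was run."""
        self.total += 1
        if not correct:
            # Step (1): add the misdetected script to the detection library at once.
            self.detection.append(sample)
            self.errors.append(sample)
        p_f = len(self.errors) / self.total
        if p_f > 2 * (1 - self.accuracy):
            # Step (2): n = n + n_f, then rerun the sample optimization process.
            self.training.extend(self.errors)
            self.detection = self.reoptimize(self.training)
            self.errors, self.total = [], 0  # assumption: counters restart afterwards
            return True
        return False  # step (3): keep recording
```

With $P = 0.9$ the threshold is $2(1 - P) = 0.2$, so for example three errors in twelve detections ($P_f = 0.25$) would trigger re-optimization, while two errors in eleven ($P_f \approx 0.18$) would not.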

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210092707.0A CN102663296B (en) 2012-03-31 2012-03-31 Intelligent detection method for Java script malicious code facing to the webpage

Publications (2)

Publication Number Publication Date
CN102663296A (en) 2012-09-12
CN102663296B (en) 2015-01-07

Family

ID=46772783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210092707.0A Active CN102663296B (en) 2012-03-31 2012-03-31 Intelligent detection method for Java script malicious code facing to the webpage

Country Status (1)

Country Link
CN (1) CN102663296B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014089744A1 (en) * 2012-12-10 2014-06-19 华为技术有限公司 Method and apparatus for detecting malicious code
CN103116494B (en) * 2013-01-25 2016-05-25 中兴通讯股份有限公司 Automatic test platform test output information extracting method and device
CN103258163B (en) * 2013-05-15 2015-08-26 腾讯科技(深圳)有限公司 A kind of script virus recognition methods, Apparatus and system
CN103559235B (en) * 2013-10-24 2016-08-17 中国科学院信息工程研究所 A kind of online social networks malicious web pages detection recognition methods
CN104636664B (en) * 2013-11-08 2018-04-27 腾讯科技(深圳)有限公司 Cross-site scripting attack leak detection method and device based on DOM Document Object Model
CN104134040B (en) * 2014-07-25 2017-03-29 中国人民解放军信息工程大学 A kind of binary malicious codes menace appraisal procedure based on information fusion
CN106485148A (en) * 2015-10-29 2017-03-08 远江盛邦(北京)网络安全科技股份有限公司 The implementation method of the malicious code behavior analysiss sandbox being combined based on JS BOM
CN106022132A (en) * 2016-05-30 2016-10-12 南京邮电大学 Real-time webpage Trojan detection method based on dynamic content analysis
CN106055980B (en) * 2016-05-30 2018-12-11 南京邮电大学 A kind of rule-based JavaScript safety detecting method
CN106529293B (en) * 2016-11-09 2019-11-05 东巽科技(北京)有限公司 A kind of sample class determination method for malware detection
CN107688744B (en) * 2017-08-31 2020-03-13 杭州安恒信息技术股份有限公司 Malicious file classification method and device based on image feature matching
CN107659570B (en) * 2017-09-29 2020-09-15 杭州安恒信息技术股份有限公司 Webshell detection method and system based on machine learning and dynamic and static analysis
CN108920956B (en) * 2018-07-03 2021-05-14 亚信科技(成都)有限公司 Machine learning method and system based on context awareness
CN109190372B (en) * 2018-07-09 2021-11-12 四川大学 JavaScript malicious code detection method based on bytecode
CN109254827B (en) * 2018-08-27 2022-04-22 电子科技大学成都学院 Virtual machine safety protection method and system based on big data and machine learning
CN110427755A (en) * 2018-10-16 2019-11-08 新华三信息安全技术有限公司 A kind of method and device identifying script file
CN113703780B (en) * 2020-05-22 2024-04-19 广州虎牙科技有限公司 Decompilation detection and webpage resource data sending method, device, equipment and medium
CN111723371B (en) * 2020-06-22 2024-02-20 上海斗象信息科技有限公司 Method for constructing malicious file detection model and detecting malicious file
CN112052451A (en) * 2020-08-17 2020-12-08 北京兰云科技有限公司 Webshell detection method and device
CN112685314A (en) * 2021-01-05 2021-04-20 广州知图科技有限公司 JavaScript engine security test method and test system
CN114595454B (en) * 2022-03-11 2024-04-02 西安电子科技大学 Malicious JS script detection method based on mixed analysis and feature fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034042A (en) * 2010-12-13 2011-04-27 四川大学 Novel unwanted code detecting method based on characteristics of function call relationship graph
CN102346829A (en) * 2011-09-22 2012-02-08 重庆大学 Virus detection method based on ensemble classification

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
D. Krishna Sandeep Reddy et al. New malicious code detection using variable length N-grams. 2nd International Conference on Information Systems Security. 2006, 276-288. *
Jeremy Z. Kolter et al. Learning to Detect Malicious Executables in the Wild. Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004. *
Tony Abou-Assaleh et al. Detection of new malicious code using N-grams signatures. 2nd Annual Conference on Privacy, Security and Trust. 2004. *
Zhang Yong et al. Research on a computer virus detection method based on active learning. Computer and Digital Engineering. 2011-11-30, Vol. 39, No. 11, p. 90. *
Zhang Xiaokang et al. Malicious code detection method based on weighted information gain. Computer Engineering. 2010-03-20, Vol. 36, No. 6, pp. 149-151. *

Also Published As

Publication number Publication date
CN102663296A (en) 2012-09-12

Similar Documents

Publication Publication Date Title
CN102663296B (en) Intelligent detection method for Java script malicious code facing to the webpage
Fass et al. Jast: Fully syntactic detection of malicious (obfuscated) javascript
Wang et al. Detection of malicious web pages based on hybrid analysis
Likarish et al. Obfuscated malicious javascript detection using classification techniques
Hou et al. Malicious web content detection by machine learning
Wang et al. Jsdc: A hybrid approach for javascript malware detection and classification
Nunan et al. Automatic classification of cross-site scripting in web pages using document-based and URL-based features
CN107241296B (en) Webshell detection method and device
US20140173736A1 (en) Method and system for detecting webpage Trojan embedded
US20240054218A1 (en) Real-time javascript classifier
Choi et al. Automatic detection for javascript obfuscation attacks in web pages through string pattern analysis
Kaur et al. Detecting blind cross-site scripting attacks using machine learning
Phung et al. Detection of malicious javascript on an imbalanced dataset
Alnabulsi et al. GMSA: Gathering multiple signatures approach to defend against code injection attacks
KR20210084204A (en) Malware Crawling Method and System
Eskandari et al. To incorporate sequential dynamic features in malware detection engines
Gorji et al. Detecting obfuscated JavaScript malware using sequences of internal function calls
Miura et al. Macros finder: Do you remember loveletter?
CN103475671A (en) Method for detecting rogue programs
McGahagan et al. A comprehensive evaluation of webpage content features for detecting malicious websites
Liang et al. Malicious web pages detection based on abnormal visibility recognition
Raja et al. Malicious webpage classification based on web content features using machine learning and deep learning
CN106021252B (en) Determining internet-based object information using public internet search
Cosovan et al. A practical guide for detecting the java script-based malware using hidden markov models and linear classifiers
US10250621B1 (en) Automatic extraction of indicators of compromise from multiple data sources accessible over a network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151027

Address after: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee after: Dbappsecurity Co.,ltd.

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee before: Dbappsecurity Co.,ltd.

Patentee before: Chen Tieming

CP03 Change of name, title or address

Address after: Zhejiang Zhongcai Building No. 68 Hangzhou 310051 Zhejiang province Binjiang District Tong Road 15

Patentee after: Hangzhou Annan information technology Limited by Share Ltd

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee before: Dbappsecurity Co.,ltd.

CP02 Change in the address of a patent holder

Address after: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee after: Hangzhou Annan information technology Limited by Share Ltd

Address before: Zhejiang Zhongcai Building No. 68 Hangzhou 310051 Zhejiang province Binjiang District Tong Road 15

Patentee before: Hangzhou Annan information technology Limited by Share Ltd
