CN102663296A - Intelligent detection method for Java script malicious code facing to the webpage - Google Patents


Info

Publication number
CN102663296A
Authority
CN
China
Prior art keywords
script
javascript
sample
gram
storehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100927070A
Other languages
Chinese (zh)
Other versions
CN102663296B (en
Inventor
范渊
陈铁明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd filed Critical DBAPPSecurity Co Ltd
Priority to CN201210092707.0A priority Critical patent/CN102663296B/en
Publication of CN102663296A publication Critical patent/CN102663296A/en
Application granted granted Critical
Publication of CN102663296B publication Critical patent/CN102663296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention relates to Internet security technology and aims to provide an intelligent detection method for malicious JavaScript code in web pages. The method comprises three processes: sample selection, security detection, and selection update. It combines static and dynamic detection by fusing the classical N-gram statistical model with a KNN classifier. Dynamic behavior analysis of the code is achieved by building N-gram features over the operation sequence of the JavaScript machine code, and selecting the samples on which the KNN classifier relies greatly improves the efficiency of static classification. The three processes make the method practical to operate: sample selection ensures that classification efficiency does not fall as the training script library grows; security detection performs efficient intelligent detection on the selected samples; and selection update ensures that detection accuracy does not decline as new malicious scripts appear. The method can therefore detect new malicious scripts and can adjust and optimize itself dynamically during continuous operation.

Description

Intelligent detection method for malicious JavaScript code in web pages
Technical field
The present invention relates to an intelligent detection method for malicious JavaScript code.
Background art
Malicious code is one of the major threats to computer security. It is essentially a piece of computer code or a program (a sequence of instructions) that carries out a series of operations with malicious intent according to the attacker's wishes. It may take the form of executable instructions, a script, a word-processor macro, or another type of code. Typical malicious code includes viruses, worms, and Trojan horses.
The subject of the present invention is JavaScript embedded in web pages. JavaScript is an object-based, event-driven client-side scripting language. It enables real-time, dynamic, interactive behavior between a web page and its user, so that pages can contain more active elements and richer content. It also makes it easier for attackers to write and run malicious code. A script can, for example, automatically load other malicious scripts from the network, manipulate the page's Document object and thus the HTML interface the user sees, obtain or prompt the user for valuable data such as account names and passwords, and send that data to an arbitrary server. Attackers can also use JavaScript to exploit browser vulnerabilities, which may cause the browser to crash, leak memory, and so on. These problems call for in-depth study of JavaScript security and for better detection of malicious JavaScript in order to secure Internet applications.
Malicious code detection has become an important direction in information security and has produced a large body of research. Depending on the object analyzed, detection techniques fall into two main categories, static detection and dynamic detection: static detection analyzes the textual features of the code, whereas dynamic detection analyzes its execution behavior.
The typical static method is signature-based detection. Following the idea of pattern matching, it generates a unique signature for each known piece of malicious code and collects the signatures in a malicious code library. The signatures are extracted manually by industry experts who analyze virus samples; each signature captures the distinctive properties of one particular piece of malicious code. Signature-based detection is carried out as follows:
(1) Collect known malicious code samples;
(2) Extract signature features from the malicious code samples;
(3) Add the signatures to the malicious code database;
(4) Scan the file. If the file under test contains a signature from the malicious code library, it is judged to be malicious code or to be infected by malicious code.
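The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular antivirus product works: the signature names and byte patterns in `SIGNATURES` are made up for the example, and matching is assumed to be a plain substring search.

```python
from typing import Dict, List

# Hypothetical signature database (steps 1-3): name -> byte pattern.
# The entries are illustrative only and do not come from a real virus library.
SIGNATURES: Dict[str, bytes] = {
    "demo-heap-spray": b"unescape('%u9090%u9090')",
    "demo-redirect": b"document.location='http://evil.example/'",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Step 4: return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def is_malicious(data: bytes) -> bool:
    """A file is flagged as soon as it contains at least one known signature."""
    return bool(scan(data))
```

A script that carries the same payload but builds the string `unescape` by concatenation would no longer contain the stored pattern, so a scan of this kind only recognizes code whose signature is already in the database.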
Signature-based detection is currently the most convenient and most widely used method, and many commercial antivirus products adopt it. Its advantages are fast detection, accurate identification of malicious code already in the virus library, and a low false positive rate. Its drawback is that it is powerless against newly emerging viruses: the library must be updated continually with the signatures of new viruses.
The typical dynamic method is behavior-based detection, which generally requires executing the code, either directly or in a virtual environment, and monitoring for behavior characteristic of viruses. Years of virus research have shown that certain behaviors are common to malicious code, are quite distinctive, and rarely occur in normal code. Some typical malicious behaviors are:
(1) Hooking the INT 13H interrupt. A boot-sector virus attacks the boot sector or MBS and places its code there. At system start-up the boot sector or MBS invokes INT 13H functions, and the viral code is loaded.
(2) Modifying the total amount of system memory. To carry out functions such as infection and destruction, a virus reduces the total system memory so that the system and other applications cannot occupy its space, allowing it to remain resident in memory.
(3) Writing to specific files. A virus is parasitic: when it runs, it attaches its own code to the infected file, so the infected file is subject to abnormal write operations.
(4) Characteristic system call sequences. System calls are the only interface between user applications and the operating system, and certain system call sequences reveal malicious semantics to some degree.
Behavior-based detection can therefore detect some newly emerging, unknown viruses. Its difficulties are that malicious behavior features are hard to extract and that the system overhead is relatively high.
In summary, static detection is efficient but cannot detect new malicious code, while dynamic detection can detect new malicious code but is inefficient, and its behavior features are difficult to extract, which limits its practicality. Researchers have therefore turned to the question of how to detect emerging malicious code automatically and efficiently, and automatic classifiers have become a popular technique in the antivirus field. Applying data mining to malicious code detection has produced good experimental results, and detection based on data mining and machine learning is attracting growing attention as a new research focus.
Although machine learning has yielded many results in malicious code detection, current research concentrates mainly on executable files for Windows systems. Detection of malicious web page code, of which JavaScript is the representative and which currently spreads fastest, has not yet been studied in depth. Meanwhile, code obfuscation is increasingly applied to JavaScript, for example compression, substitution, reordering, insertion of redundant code, and encryption, and scripts produced by specialized obfuscation techniques often evade static detection tools based on signatures. An efficient method for detecting new malicious JavaScript that fuses static and dynamic detection is therefore a likely direction of development.
Summary of the invention
The technical problem to be solved by the present invention arises from the code obfuscation that is widespread in malicious JavaScript. To overcome the inability of static detection to find new malicious code, and to address the low efficiency and difficult feature extraction of dynamic behavior detection, the invention provides a new, general, and robust intelligent detection method that is based on updatable selected samples, requires neither static code signatures nor dynamic behavior signatures, and can detect new malicious code.
The technical solution adopted by the present invention to solve this problem is as follows:
An intelligent detection method for malicious JavaScript code in web pages is provided. The method comprises three processes, sample selection, security detection, and selection update, specifically:
Sample selection: using the N-gram statistical language method and the KNN machine learning algorithm, perform machine learning on the JavaScript scripts in a JavaScript script training library to generate the JavaScript detection sample library used for security detection;
Security detection: extract the JavaScript scripts from the web page URL to be checked and, based on the JavaScript detection sample library built by the sample selection process, use the KNN classification algorithm to determine whether the specified web page contains malicious JavaScript code;
Selection update: track the accuracy of security detection. If the detection accuracy remains within the set range, continue to use the selected JavaScript detection sample library for security detection. If the detection accuracy falls outside the preset range, add all scripts that have been checked and that caused the accuracy to fall to the JavaScript script training library, and run sample selection again to obtain an updated detection sample library. Throughout this process the size of the selected detection sample library is kept constant to preserve the efficiency of security detection.
In the present invention, the JavaScript detection sample library comprises malicious code N-gram samples and benign code N-gram samples.
In the present invention, the sample selection process determines the following parameters by analyzing the training scripts: $P$, the accuracy of JavaScript security detection; $N$, the N-gram size; $N_f$, the N-gram frequency threshold, meaning that the $N_f$ most frequent N-grams are kept for scripts in the JavaScript script training library; $N^\circ$, the number of malicious samples, and likewise of benign samples, in the selected detection sample library. The process comprises the following steps:
(1) Collect currently representative malicious and benign JavaScript scripts to form a script training library on the order of ten thousand scripts;
(2) Using Google's open-source JavaScript engine V8, compile each JavaScript script to obtain V8 machine code, and extract the operation sequence of the machine code;
(3) Taking the operation as the basic unit, compute the N-grams of the operation sequence of each malicious and each benign training script, and keep the $N_f$ most frequent N-grams. Let the numbers of malicious and benign scripts in the JavaScript script training library be $n_m$ and $n_b$, so that the total number of training scripts is $n = n_m + n_b$. Denote the set of $N_f$ N-grams computed for each training script by $S_i^m$ ($i = 1, 2, \ldots, n_m$) and $S_i^b$ ($i = 1, 2, \ldots, n_b$), and the frequency of each N-gram by $f_i^m(s)$, $s \in S_i^m$ ($i = 1, 2, \ldots, n_m$) and $f_i^b(s)$, $s \in S_i^b$ ($i = 1, 2, \ldots, n_b$). For an N-gram $s'$ that is not in $S_i^m$ or $S_i^b$, that is, $s' \notin S_i^m$ or $s' \notin S_i^b$, set $f_i^m(s') = 0$ or $f_i^b(s') = 0$ respectively;
(4) Select the KNN classifier with K = 1. The classification algorithm is as follows. Compute the $N_f$ most frequent N-grams of the machine code operation sequence of the script to be classified, denote this set by $S_f$, and denote the frequency of each N-gram by $f(s)$, $s \in S_f$. Compute

$$d_m = \min_{i} \sum_{s \in S_f \cup S_i^{m}} \left( \frac{f(s) - f_i^{m}(s)}{f(s) + f_i^{m}(s)} \right)^2, \quad i = 1, 2, \ldots, n_m,$$

and let $i = j_m$ be the index that attains the minimum. Likewise compute

$$d_b = \min_{i} \sum_{s \in S_f \cup S_i^{b}} \left( \frac{f(s) - f_i^{b}(s)}{f(s) + f_i^{b}(s)} \right)^2, \quad i = 1, 2, \ldots, n_b,$$

and let $i = j_b$ be the index that attains the minimum. If $d_m < d_b$, the script is judged to be malicious code, and the $j_m$-th malicious training script is counted as having been selected once as a malicious sample; otherwise the script is judged to be benign code, and the $j_b$-th benign training script is counted as having been selected once as a benign sample;
(5) Run a cross-validation test of KNN classification on the training script library of size $n$. Specifically, divide the malicious and benign training scripts into $n_m / N^\circ$ and $n_b / N^\circ$ parts respectively ($n_m$ and $n_b$ are chosen to be multiples of $N^\circ$). Each time, randomly pick one part as the KNN training data and use all the rest as test data. Whenever a test result is correct, add one to the cumulative count of the training script that the KNN classifier selected as the sample. Finally, ranking by these cumulative counts, take the top $N^\circ$ malicious scripts and the top $N^\circ$ benign scripts and form the detection sample library from their N-grams. Denote the malicious and benign N-gram sets in the detection sample library by $S_i^{fm}$ ($i = 1, 2, \ldots, N^\circ$) and $S_i^{fb}$ ($i = 1, 2, \ldots, N^\circ$), and the frequency of each N-gram in these sets by $f_i^{fm}(s)$, $s \in S_i^{fm}$ ($i = 1, 2, \ldots, N^\circ$) and $f_i^{fb}(s)$, $s \in S_i^{fb}$ ($i = 1, 2, \ldots, N^\circ$).
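Step (3) can be illustrated with a short Python sketch. It assumes that the operation sequence has already been extracted as a list of operation names, and it treats the "frequency" of an N-gram as its relative frequency within the script; the patent text does not say whether raw counts or relative frequencies are meant, so that choice, like the function name `top_ngrams`, is an assumption made for the example.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]


def top_ngrams(ops: Sequence[str], n: int = 3, n_f: int = 500) -> Dict[NGram, float]:
    """Map each of the n_f most frequent n-grams of ops to its relative frequency.

    N-grams outside the returned dictionary are treated as having frequency 0.
    """
    # Slide a window of length n over the operation sequence.
    grams = [tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    # Keep only the n_f most frequent n-grams (the threshold N_f in the text).
    return {gram: count / total for gram, count in counts.most_common(n_f)}
```

With relative frequencies, scripts of different lengths produce profiles on the same scale, which keeps the distance used in step (4) comparable across scripts.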
In the present invention, the security detection process comprises the following steps:
(1) From the specified web page URL, extract the embedded JavaScript code as the script to be checked;
(2) Run Google's open-source JavaScript engine V8 to obtain the JavaScript machine code, and extract its operation sequence;
(3) Compute the $N_f$ most frequent N-grams of the operation sequence of the script to be checked; denote the set of N-grams by $S_f$ and the frequency of each N-gram by $f(s)$, $s \in S_f$;
(4) Use the KNN classification algorithm with K = 1 to determine whether $S_f$ is the N-gram set of malicious code. The basic procedure is to compute
$$d_m = \min_{i} \sum_{s \in S_f \cup S_i^{fm}} \left( \frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)} \right)^2, \quad i = 1, 2, \ldots, N^\circ,$$

$$d_b = \min_{i} \sum_{s \in S_f \cup S_i^{fb}} \left( \frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)} \right)^2, \quad i = 1, 2, \ldots, N^\circ,$$

where $\min$ takes the minimum over $i$. If $d_m < d_b$, the script is judged to be malicious code; otherwise it is judged to be benign code.
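The decision rule can be written out as a small Python sketch. It represents each N-gram profile as a dictionary from N-gram to frequency and treats N-grams missing from a profile as having frequency 0; the names `Profile`, `distance`, and `classify` are chosen for this example and are not part of the patent.

```python
from typing import Dict, Hashable, List

Profile = Dict[Hashable, float]  # N-gram -> frequency; absent N-grams count as 0


def distance(a: Profile, b: Profile) -> float:
    """Sum of squared relative frequency differences over the union of N-grams."""
    total = 0.0
    for s in a.keys() | b.keys():
        fa, fb = a.get(s, 0.0), b.get(s, 0.0)
        if fa + fb > 0:  # guard against profiles that store a zero frequency
            total += ((fa - fb) / (fa + fb)) ** 2
    return total


def classify(script: Profile, malicious: List[Profile], benign: List[Profile]) -> str:
    """1-nearest-neighbour decision: compare the closest malicious and benign samples."""
    d_m = min(distance(script, sample) for sample in malicious)
    d_b = min(distance(script, sample) for sample in benign)
    return "malicious" if d_m < d_b else "benign"
```

Each N-gram that occurs in only one of the two profiles contributes exactly 1 to the sum, so two profiles with no N-gram in common are separated by a distance equal to the size of their union.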
In the present invention, the selection update process comprises the following steps:
(1) Each time security detection produces a false negative or a false positive, record the JavaScript script concerned and first add its N-grams directly to the selected detection sample library, so that the JavaScript detection sample library grows;
(2) Accumulate the error rate $P_f$ after each security detection. When $P_f > 2(1 - P)$, where $P$ is the target accuracy set for malicious code detection, add all $n_f$ misclassified scripts to the existing script training library of $n$ scripts, so that its size becomes $n + n_f$, and run the sample selection process again to obtain a new JavaScript detection sample library of size $2N^\circ$ ($N^\circ$ malicious detection samples and $N^\circ$ benign detection samples), so that neither the accuracy $P$ nor the execution efficiency of the classification algorithm declines;
(3) If $P_f > 2(1 - P)$ in step (2) does not hold, repeat step (1).
The beneficial effects of the present invention are mainly:
(1) It proposes an efficient method that effectively mixes static and dynamic detection by fusing the classical N-gram statistical model with the KNN classifier. Building N-gram features over the operation sequence of the JavaScript machine code provides dynamic behavior analysis of the code, and selecting the samples on which the KNN classifier relies greatly improves the efficiency of static classification.
(2) It proposes three relatively independent parts, sample selection, security detection, and selection update, which make the method practical to operate: sample selection ensures that classification efficiency does not fall as the training script library grows, security detection performs efficient intelligent detection on the selected samples, and selection update ensures that detection accuracy does not fall as new malicious scripts appear.
(3) As training scripts accumulate and JavaScript technology develops, the main parameters of the method, such as $N$, $N_f$, $N^\circ$, and $P$, can be adjusted through machine learning and experimental analysis, giving the method a better ability to detect new malicious scripts and the ability to optimize itself dynamically during continuous operation.
(4) Working on features of the machine code operation sequence effectively neutralizes JavaScript code obfuscation of any kind. Moreover, because the JavaScript engine V8 can produce machine code for any JavaScript script, the method also supports security detection of JavaScript code fragments extracted from a web page URL.
(5) All algorithms and implementation steps of the disclosed method are simple, practical, efficient, and low in overhead, and lend themselves to modular development and deployment on all kinds of platforms.
Description of drawings
Fig. 1 shows the intelligent detection flow for malicious JavaScript code in web pages;
Fig. 2 shows the basic sample selection process based on N-grams of the JavaScript machine code operation sequence and the KNN classifier;
Fig. 3 is a schematic of the machine code produced when the JavaScript engine V8 compiles a JavaScript script, and of the extraction of the machine code operation sequence.
Embodiment
It should first be noted that the present invention involves software technologies such as search engines and is an application of computer technology in the Internet field. Implementing the invention involves several software function modules. The applicant believes that a person skilled in the art who has read the application documents carefully and accurately understood the principle and purpose of the invention can fully implement it with ordinary software programming skills in combination with existing known techniques. This applies to everything mentioned in the application documents, and the applicant does not enumerate the items one by one.
Sample selection uses the N-gram statistical language method and the KNN machine learning algorithm (both are well-known data analysis techniques) and determines the following parameters by analyzing the training scripts: $P$ (the accuracy of JavaScript security detection), $N$ (the N-gram size), $N_f$ (the N-gram frequency threshold: the $N_f$ most frequent N-grams are kept for scripts in the JavaScript script training library), and $N^\circ$ (the number of malicious samples, and likewise of benign samples, in the selected JavaScript detection sample library). It finally generates a selected JavaScript detection sample library of size $2N^\circ$, consisting mainly of malicious code sample N-grams and benign code sample N-grams. The basic process is shown in Fig. 2 and comprises the following steps:
(1) Collect currently representative malicious and benign JavaScript scripts to form a JavaScript script training library on the order of ten thousand scripts;
(2) Using Google's JavaScript engine V8, compile each JavaScript script to obtain V8 machine code, and extract the operation sequence of the machine code (as shown in Fig. 3);
(3) Taking the operation as the basic unit, compute the N-grams of the machine code operation sequence of each script (malicious and benign) in the script training library, and keep the $N_f$ most frequent N-grams. Let the numbers of malicious and benign scripts be $n_m$ and $n_b$, so that the total number of scripts is $n = n_m + n_b$. Denote the set of $N_f$ N-grams computed for each script by $S_i^m$ ($i = 1, 2, \ldots, n_m$) and $S_i^b$ ($i = 1, 2, \ldots, n_b$), and the frequency of each N-gram by $f_i^m(s)$, $s \in S_i^m$ ($i = 1, 2, \ldots, n_m$) and $f_i^b(s)$, $s \in S_i^b$ ($i = 1, 2, \ldots, n_b$). For an N-gram $s'$ that is not in $S_i^m$ or $S_i^b$, that is, $s' \notin S_i^m$ or $s' \notin S_i^b$, it is stipulated that $f_i^m(s') = 0$ or $f_i^b(s') = 0$ respectively.
(4) Select the KNN classifier with K = 1. The classification algorithm is as follows. Compute the $N_f$ most frequent N-grams of the machine code operation sequence of the script to be classified, denote this set by $S_f$, and denote the frequency of each N-gram by $f(s)$, $s \in S_f$. Compute

$$d_m = \min_{i} \sum_{s \in S_f \cup S_i^{m}} \left( \frac{f(s) - f_i^{m}(s)}{f(s) + f_i^{m}(s)} \right)^2, \quad i = 1, 2, \ldots, n_m,$$

and denote the minimizing index by $i = j_m$. Likewise compute

$$d_b = \min_{i} \sum_{s \in S_f \cup S_i^{b}} \left( \frac{f(s) - f_i^{b}(s)}{f(s) + f_i^{b}(s)} \right)^2, \quad i = 1, 2, \ldots, n_b,$$

and denote the minimizing index by $i = j_b$. If $d_m < d_b$, the script is judged to be malicious code, and the $j_m$-th malicious script in the script training library is counted as having been selected once as a malicious detection sample; otherwise the script is judged to be benign code, and the $j_b$-th benign script is counted as having been selected once as a benign detection sample.
(5) Run a cross-validation test of KNN classification on the training script library of size $n$. Specifically, divide the malicious and benign training scripts into $n_m / N^\circ$ and $n_b / N^\circ$ parts respectively ($n_m$ and $n_b$ are chosen to be multiples of $N^\circ$). Each time, randomly pick one part as the KNN training data and use all the rest as test data. Whenever a test result is correct, add one to the cumulative count of the training script that the KNN classifier selected as the sample. Finally, ranking by these cumulative counts, take the top $N^\circ$ malicious scripts and the top $N^\circ$ benign scripts as the malicious and benign samples of the detection sample library and store them as N-gram sets, denoted $S_i^{fm}$ ($i = 1, 2, \ldots, N^\circ$) (malicious) and $S_i^{fb}$ ($i = 1, 2, \ldots, N^\circ$) (benign). Denote the frequency of each N-gram in these two families of sets by $f_i^{fm}(s)$, $s \in S_i^{fm}$ ($i = 1, 2, \ldots, N^\circ$) and $f_i^{fb}(s)$, $s \in S_i^{fb}$ ($i = 1, 2, \ldots, N^\circ$).
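Steps (4) and (5) can be sketched as follows. The sketch makes several assumptions that the text leaves open: each profile is a dictionary from N-gram to frequency, the number of random rounds is a free parameter (`rounds`), a fixed seed is used so the run is repeatable, and scripts that were never chosen as nearest neighbour are simply left out of the result. The names `select_samples` and `nearest` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Profile = Dict[Hashable, float]  # N-gram -> frequency; absent N-grams count as 0


def distance(a: Profile, b: Profile) -> float:
    """Sum of squared relative frequency differences over the union of N-grams."""
    total = 0.0
    for s in a.keys() | b.keys():
        fa, fb = a.get(s, 0.0), b.get(s, 0.0)
        if fa + fb > 0:
            total += ((fa - fb) / (fa + fb)) ** 2
    return total


def nearest(script: Profile, candidates: List[int], library: List[Profile]) -> Tuple[float, int]:
    """Return (distance, index) of the candidate profile closest to script."""
    return min((distance(script, library[i]), i) for i in candidates)


def select_samples(mal: List[Profile], ben: List[Profile], n_lib: int,
                   rounds: int = 10, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return indices of up to n_lib malicious and n_lib benign scripts chosen most often."""
    rng = random.Random(seed)
    # Split each class into consecutive parts of n_lib scripts.
    mal_parts = [list(range(k, min(k + n_lib, len(mal)))) for k in range(0, len(mal), n_lib)]
    ben_parts = [list(range(k, min(k + n_lib, len(ben)))) for k in range(0, len(ben), n_lib)]
    mal_hits: Counter = Counter()
    ben_hits: Counter = Counter()
    for _ in range(rounds):
        train_m, train_b = rng.choice(mal_parts), rng.choice(ben_parts)
        tests = [(p, True) for i, p in enumerate(mal) if i not in train_m]
        tests += [(p, False) for i, p in enumerate(ben) if i not in train_b]
        for profile, is_malicious in tests:
            d_m, j_m = nearest(profile, train_m, mal)
            d_b, j_b = nearest(profile, train_b, ben)
            predicted_malicious = d_m < d_b
            if predicted_malicious == is_malicious:  # count correct results only
                if is_malicious:
                    mal_hits[j_m] += 1
                else:
                    ben_hits[j_b] += 1
    top_m = [i for i, _ in mal_hits.most_common(n_lib)]
    top_b = [i for i, _ in ben_hits.most_common(n_lib)]
    return top_m, top_b
```

The returned indices identify the training scripts whose N-gram profiles would make up the detection sample library; because only one part per class serves as training data in each round, every 1-nearest-neighbour lookup compares against at most `n_lib` profiles per class.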
As an experimental demonstration of parameter selection, we obtained 5000 benign scripts and 5000 malicious scripts from websites such as http://code.google.com and http://vx.netlux.org/ to form the training library. Experimental analysis showed that with $N = 3$, $N_f = 500$, and $N^\circ = 100$ the classification accuracy reaches $P > 95\%$, a satisfactory result that also maintains high execution efficiency. As a preferred scheme, the parameters are therefore set to $N = 3$, $N_f = 500$, $N^\circ = 100$, $P = 95\%$.
Security detection uses the detection samples $S_i^{fm}$ ($i = 1, 2, \ldots, N^\circ$) and $S_i^{fb}$ ($i = 1, 2, \ldots, N^\circ$) built by sample selection to determine whether the specified web page contains malicious JavaScript code. The main steps are as follows:
(1) From the specified web page URL, extract the embedded JavaScript code as the script to be checked;
(2) Run the V8 engine to obtain the JavaScript machine code, and extract its operation sequence;
(3) Compute the $N_f$ most frequent N-grams of the operation sequence of the script to be checked; denote the set of N-grams by $S_f$ and the frequency of each N-gram by $f(s)$, $s \in S_f$;
(4) Use the KNN classification algorithm with K = 1 to determine whether $S_f$ is the N-gram set of a malicious script. The basic procedure is to compute

$$d_m = \min_{i} \sum_{s \in S_f \cup S_i^{fm}} \left( \frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)} \right)^2, \quad i = 1, 2, \ldots, N^\circ,$$

$$d_b = \min_{i} \sum_{s \in S_f \cup S_i^{fb}} \left( \frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)} \right)^2, \quad i = 1, 2, \ldots, N^\circ.$$

If $d_m < d_b$, the script is judged to be malicious code; otherwise it is judged to be benign code.
Selection update uses the results of security detection to reselect the detection sample library of size $2N^\circ$ ($N^\circ$ malicious samples and $N^\circ$ benign samples, denoted $S_i^{fm}$ ($i = 1, 2, \ldots, N^\circ$) and $S_i^{fb}$ ($i = 1, 2, \ldots, N^\circ$) respectively), preserving the accuracy and execution efficiency of security detection. The main steps are as follows:
(1) Each time security detection makes an error (a false negative or a false positive), record the JavaScript script concerned and first add its N-grams directly to the selected detection sample library;
(2) Accumulate the error rate $P_f$ after each security detection. When $P_f > 2(1 - P)$, add all $n_f$ misclassified scripts to the existing script training library of $n$ scripts, so that its size becomes $n + n_f$, and run the sample selection process to obtain a new detection sample library of size $2N^\circ$, so that neither the accuracy $P$ nor the execution efficiency of the classification algorithm declines;
(3) If $P_f > 2(1 - P)$ in (2) does not hold, repeat (1).
Here $P = 95\%$, so $P_f > 2(1 - P)$ means that the current error rate has exceeded 10%. This indicates that, with the currently selected detection samples, the detection performance of the KNN classifier on new JavaScript scripts has degraded noticeably. The new scripts that caused the detection errors are therefore fed back into the script training library as training scripts, and sample selection is run again to obtain a new detection sample library for KNN classification, preserving the efficiency and accuracy of security detection.
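The bookkeeping behind this trigger can be sketched in Python. The class below is hypothetical: it assumes that the cumulative error rate $P_f$ is the number of misclassified scripts divided by the number of scripts checked since the last reselection, which the text does not spell out, and it leaves the actual reselection to the caller.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateMonitor:
    """Tracks detection errors and signals when sample selection should be rerun."""
    target_accuracy: float = 0.95              # P in the text
    checked: int = 0                           # scripts checked so far
    errors: List[str] = field(default_factory=list)  # ids of misclassified scripts

    def record(self, script_id: str, correct: bool) -> None:
        """Step (1): note the outcome of one detection and remember the failures."""
        self.checked += 1
        if not correct:
            self.errors.append(script_id)

    @property
    def error_rate(self) -> float:
        """P_f, assumed here to be errors divided by scripts checked."""
        return len(self.errors) / self.checked if self.checked else 0.0

    def needs_reselection(self) -> bool:
        """Step (2): true once P_f exceeds 2(1 - P)."""
        return self.error_rate > 2 * (1 - self.target_accuracy)
```

When `needs_reselection()` returns true, the scripts listed in `errors` would be added to the training library before sample selection is run again.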

Claims (5)

1. An intelligent detection method for malicious JavaScript code in web pages, characterized in that the method comprises three processes, sample selection, security detection, and selection update, specifically:
Sample selection: using the N-gram statistical language method and the KNN machine learning algorithm, perform machine learning on the JavaScript scripts in a JavaScript script training library to generate the JavaScript detection sample library used for security detection;
Security detection: extract the JavaScript scripts from the web page URL to be checked and, based on the JavaScript detection sample library built by the sample selection process, use the KNN classification algorithm to determine whether the specified web page contains malicious JavaScript code;
Selection update: track the accuracy of security detection; if the detection accuracy remains within the set range, continue to use the selected JavaScript detection sample library for security detection; if the detection accuracy falls outside the preset range, add all scripts that have been checked and that caused the accuracy to fall to the JavaScript script training library, and run sample selection again to obtain an updated JavaScript detection sample library; throughout this process the size of the selected detection sample library is kept constant to preserve the efficiency of security detection.
2. The method according to claim 1, characterized in that the JavaScript detection sample library comprises malicious code N-gram samples and benign code N-gram samples.
3. The method according to claim 1, characterized in that the sample selection process determines the following parameters by analyzing the training scripts: $P$, the accuracy of JavaScript security detection; $N$, the N-gram size; $N_f$, the N-gram frequency threshold, meaning that the $N_f$ most frequent N-grams are kept for scripts in the JavaScript script training library; $N^\circ$, the number of malicious samples, and likewise of benign samples, in the selected JavaScript detection sample library;
Sample selection specifically comprises the following steps:
(1) Collect currently representative malicious and benign JavaScript scripts to form a JavaScript script training library on the order of ten thousand scripts;
(2) Using Google's open-source JavaScript engine V8, compile each JavaScript script to obtain V8 machine code, and extract the operation sequence of the machine code;
(3) Taking the operation as the basic unit, compute the N-grams of the machine code operation sequence of each malicious and each benign script in the JavaScript script training library, and keep the $N_f$ most frequent N-grams;
Denote the numbers of malicious and benign scripts by $n_m$ and $n_b$ respectively, so that the total number of scripts is $n = n_m + n_b$. Denote the sets of $N_f$ N-grams computed for each script by $S_i^{fm}$ ($i = 1, 2, \ldots, n_m$) and $S_i^{fb}$ ($i = 1, 2, \ldots, n_b$), and the frequency with which each N-gram occurs by $f_i^{fm}(s)$, $s \in S_i^{fm}$ ($i = 1, 2, \ldots, n_m$) and $f_i^{fb}(s)$, $s \in S_i^{fb}$ ($i = 1, 2, \ldots, n_b$). For an N-gram $s'$ that is not in the set $S_i^{fm}$ or $S_i^{fb}$, that is, $s' \notin S_i^{fm}$ or $s' \notin S_i^{fb}$, stipulate $f_i^{fm}(s') = 0$ ($i = 1, 2, \ldots, n_m$) or $f_i^{fb}(s') = 0$ ($i = 1, 2, \ldots, n_b$) respectively;
(4) Select a KNN classifier (with K=1). The classification algorithm is as follows: compute the $N_f$ most frequent N-grams of the machine-code operation sequence of the JavaScript script to be classified, denote the set by $S_f$, and denote the frequency of each N-gram by $f(s)$, $s \in S_f$. Find the index $i$ that attains
$$d_m = \min_{i = 1, 2, \ldots, n_m} \sum_{s \in S_f \cup S_i^{fm}} \left( \frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)} \right)^2$$
and denote it $i = j_m$; find the index $i$ that attains
$$d_b = \min_{i = 1, 2, \ldots, n_b} \sum_{s \in S_f \cup S_i^{fb}} \left( \frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)} \right)^2$$
and denote it $i = j_b$. If $d_m < d_b$, the script is judged to be malicious code, and the $j_m$-th malicious script in the JavaScript script training library is counted as having been selected once as a malicious detection sample; otherwise the script is judged to be benign code, and the $j_b$-th benign script is counted as having been selected once as a benign detection sample;
(5) For the JavaScript training script library of total size $n$, run a cross-validation test of the KNN classification. Specifically, divide the malicious and benign training scripts into $n_m / N^{\circ}$ and $n_b / N^{\circ}$ parts respectively ($n_m$ and $n_b$ are both chosen to be multiples of $N^{\circ}$); in each round, select one part of each at random as the KNN training data and use all the remaining parts as test data. Whenever a test result is correct, record the cumulative number of times each training script is chosen as the sample by the KNN classifier. Finally, ranked by cumulative count, select the top $N^{\circ}$ malicious scripts and the top $N^{\circ}$ benign scripts as the malicious samples and benign samples of the detection sample library, and store them as N-gram sets, denoted $S_i^{fm}$ ($i = 1, 2, \ldots, N^{\circ}$) (malicious) and $S_i^{fb}$ ($i = 1, 2, \ldots, N^{\circ}$) (benign); the frequency values of the N-grams in these two sets are denoted $f_i^{fm}(s)$ ($i = 1, 2, \ldots, N^{\circ}$) and $f_i^{fb}(s)$ ($i = 1, 2, \ldots, N^{\circ}$) respectively.
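Steps (3) to (5) of sample optimization can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the claim does not fix: operation sequences are given as plain lists of operation names (the names used in the example are hypothetical, not real V8 output), "frequency" is taken to be relative frequency, and the number of cross-validation rounds (`rounds`) is a free parameter. The helper names `top_ngrams`, `distance`, `folds` and `select_samples` are invented for this example.

```python
import random
from collections import Counter

def top_ngrams(ops, n, n_f):
    """Step (3): relative frequency of the n_f most frequent n-grams of ops."""
    grams = [tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.most_common(n_f)}

def distance(f, g):
    """Step (4): sum over the union of both N-gram sets of ((f-g)/(f+g))^2.
    An N-gram missing from a profile has frequency 0; stored values must be > 0."""
    return sum(((f.get(s, 0.0) - g.get(s, 0.0)) /
                (f.get(s, 0.0) + g.get(s, 0.0))) ** 2
               for s in set(f) | set(g))

def folds(ids, size, rng):
    """Shuffle script ids and cut them into parts of `size` scripts each."""
    ids = sorted(ids)
    rng.shuffle(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def select_samples(malicious, benign, n_opt, rounds=10, seed=0):
    """Step (5): malicious/benign map script id -> N-gram profile.
    Returns the ids most often chosen as nearest neighbour on correct tests."""
    rng = random.Random(seed)
    folds_m, folds_b = folds(malicious, n_opt, rng), folds(benign, n_opt, rng)
    hits_m, hits_b = Counter(), Counter()
    for _ in range(rounds):
        train_m, train_b = rng.choice(folds_m), rng.choice(folds_b)
        tests = [(malicious[i], True) for i in malicious if i not in train_m]
        tests += [(benign[i], False) for i in benign if i not in train_b]
        for f, truth in tests:
            d_m, j_m = min((distance(f, malicious[i]), i) for i in train_m)
            d_b, j_b = min((distance(f, benign[i]), i) for i in train_b)
            predicted = d_m < d_b
            if predicted == truth:  # count only correct test results
                if predicted:
                    hits_m[j_m] += 1
                else:
                    hits_b[j_b] += 1
    return ([i for i, _ in hits_m.most_common(n_opt)],
            [i for i, _ in hits_b.most_common(n_opt)])

# Hypothetical operation names, used only to show the call shape.
profile = top_ngrams(["push", "mov", "call", "push", "mov", "call", "ret"], n=2, n_f=2)
```

Note that `most_common(n_opt)` can return fewer than `n_opt` ids if few training scripts were ever chosen on a correct test; how to fill the library in that case is not specified in the claim.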
4. The method according to claim 1, characterized in that the safety detection process comprises the following steps:
(1) According to the specified webpage URL, extract the embedded JavaScript code as the script to be detected;
(2) Run Google's JavaScript engine V8 to obtain the JavaScript machine code, and then extract the operation sequence;
(3) Compute the $N_f$ most frequent N-grams of the operation sequence of the script to be detected; denote the set of N-grams by $S_f$ and the frequency of occurrence of each N-gram by $f(s)$, $s \in S_f$;
(4) Use the KNN classification algorithm with K=1 to determine whether $S_f$ is the N-gram set of a malicious script. The basic process is as follows:
Compute:
$$d_m = \min_{i = 1, 2, \ldots, N^{\circ}} \sum_{s \in S_f \cup S_i^{fm}} \left( \frac{f(s) - f_i^{fm}(s)}{f(s) + f_i^{fm}(s)} \right)^2,$$
$$d_b = \min_{i = 1, 2, \ldots, N^{\circ}} \sum_{s \in S_f \cup S_i^{fb}} \left( \frac{f(s) - f_i^{fb}(s)}{f(s) + f_i^{fb}(s)} \right)^2,$$
If $d_m < d_b$, the script is judged to be malicious code; otherwise it is judged to be benign code.
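Steps (1) and (4) of the detection process can be sketched as follows; steps (2) and (3) depend on V8 and are not shown. The sketch assumes that the page HTML has already been downloaded into a string and that N-gram profiles are dictionaries mapping an N-gram to a positive frequency. The names `InlineScriptExtractor`, `extract_inline_scripts`, `distance` and `classify` are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser

class InlineScriptExtractor(HTMLParser):
    """Collects the text of <script> elements that have no src attribute."""
    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and "src" not in dict(attrs):
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data

def extract_inline_scripts(html):
    """Step (1): return the non-empty inline scripts found in an HTML string."""
    parser = InlineScriptExtractor()
    parser.feed(html)
    parser.close()
    return [s.strip() for s in parser.scripts if s.strip()]

def distance(f, g):
    """Sum over the union of both N-gram sets of ((f-g)/(f+g))^2, missing = 0."""
    return sum(((f.get(s, 0.0) - g.get(s, 0.0)) /
                (f.get(s, 0.0) + g.get(s, 0.0))) ** 2
               for s in set(f) | set(g))

def classify(f, malicious_samples, benign_samples):
    """Step (4): 1-nearest-neighbour decision; malicious only if d_m < d_b."""
    d_m = min(distance(f, g) for g in malicious_samples)
    d_b = min(distance(f, g) for g in benign_samples)
    return "malicious" if d_m < d_b else "benign"

page = '<html><script>var a = 1;</script><script src="x.js"></script></html>'
scripts = extract_inline_scripts(page)
```

Because the decision uses a strict inequality, a tie between $d_m$ and $d_b$ is reported as benign, which matches the wording of step (4).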
5. The method according to claim 1, characterized in that the update optimization process comprises the following steps:
(1) Each time safety detection makes an error (a false negative or a false positive), record the JavaScript script concerned and first add its N-grams directly to the optimized JavaScript detection sample library;
(2) Accumulate the error rate $P_f$ after each safety detection. When $P_f > 2(1 - P)$, add all $n_f$ misclassified scripts to the existing training library of $n$ scripts, so that the size of the JavaScript script training library becomes $n = n + n_f$, then rerun the sample optimization process to obtain a new JavaScript detection sample library of size $N^{\circ}$, so that the accuracy $P$ and the execution efficiency of the classification algorithm do not decline;
(3) If the condition $P_f > 2(1 - P)$ in step (2) does not hold, repeat step (1).
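The trigger condition in step (2) can be illustrated with a small bookkeeping class. This is a minimal sketch, assuming that the true label of each detected script becomes known afterwards (for example through later manual confirmation, which the claim does not describe); the class name `UpdateMonitor` and its methods are hypothetical.

```python
class UpdateMonitor:
    """Tracks the cumulative error rate P_f against the target accuracy P."""

    def __init__(self, target_accuracy):
        self.p = target_accuracy
        self.total = 0
        self.errors = []  # ids of misclassified scripts (the n_f scripts)

    def record(self, script_id, predicted_malicious, actually_malicious):
        """Record one detection; a mismatch is a false positive or negative."""
        self.total += 1
        if predicted_malicious != actually_malicious:
            self.errors.append(script_id)

    @property
    def error_rate(self):
        return len(self.errors) / self.total if self.total else 0.0

    def needs_reoptimization(self):
        """True when P_f > 2 * (1 - P), i.e. sample optimization must rerun."""
        return self.error_rate > 2 * (1 - self.p)

monitor = UpdateMonitor(target_accuracy=0.95)
for k in range(10):
    # Scripts 0 and 1 are misclassified, the remaining eight are correct.
    monitor.record(f"script-{k}", predicted_malicious=True,
                   actually_malicious=(k >= 2))
```

With $P = 0.95$ the threshold is $2(1 - P) = 0.1$, so the two errors in ten detections above (an error rate of 0.2) would trigger re-optimization with the scripts listed in `monitor.errors`.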
CN201210092707.0A 2012-03-31 2012-03-31 Intelligent detection method for Java script malicious code facing to the webpage Active CN102663296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210092707.0A CN102663296B (en) 2012-03-31 2012-03-31 Intelligent detection method for Java script malicious code facing to the webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210092707.0A CN102663296B (en) 2012-03-31 2012-03-31 Intelligent detection method for Java script malicious code facing to the webpage

Publications (2)

Publication Number Publication Date
CN102663296A true CN102663296A (en) 2012-09-12
CN102663296B CN102663296B (en) 2015-01-07

Family

ID=46772783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210092707.0A Active CN102663296B (en) 2012-03-31 2012-03-31 Intelligent detection method for Java script malicious code facing to the webpage

Country Status (1)

Country Link
CN (1) CN102663296B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116494A (en) * 2013-01-25 2013-05-22 中兴通讯股份有限公司 Automatic testing platform testing output information extraction method and device
CN103221960A (en) * 2012-12-10 2013-07-24 华为技术有限公司 Detection method and apparatus of malicious code
CN103559235A (en) * 2013-10-24 2014-02-05 中国科学院信息工程研究所 Online social network malicious webpage detection and identification method
CN104134040A (en) * 2014-07-25 2014-11-05 中国人民解放军信息工程大学 Binary malicious code threatening evaluating method based on information fusion
WO2014183545A1 (en) * 2013-05-15 2014-11-20 Tencent Technology (Shenzhen) Company Limited Method,device and system for identifying script virus
WO2015067114A1 (en) * 2013-11-08 2015-05-14 腾讯科技(深圳)有限公司 Method, apparatus, terminal and media for detecting document object model-based cross-site scripting attack vulnerability
CN106022132A (en) * 2016-05-30 2016-10-12 南京邮电大学 Real-time webpage Trojan detection method based on dynamic content analysis
CN106055980A (en) * 2016-05-30 2016-10-26 南京邮电大学 Rule-based JavaScript security testing method
CN106485148A (en) * 2015-10-29 2017-03-08 远江盛邦(北京)网络安全科技股份有限公司 The implementation method of the malicious code behavior analysis sandbox being combined based on JS BOM
CN106529293A (en) * 2016-11-09 2017-03-22 东巽科技(北京)有限公司 Sample classification determination method for malware detection
CN107659570A (en) * 2017-09-29 2018-02-02 杭州安恒信息技术有限公司 Webshell detection methods and system based on machine learning and static and dynamic analysis
CN107688744A (en) * 2017-08-31 2018-02-13 杭州安恒信息技术有限公司 Malicious file sorting technique and device based on Image Feature Matching
CN108920956A (en) * 2018-07-03 2018-11-30 亚信科技(成都)有限公司 Machine learning method and system based on context aware
CN109190372A (en) * 2018-07-09 2019-01-11 四川大学 A kind of JavaScript Malicious Code Detection model based on bytecode
CN109254827A (en) * 2018-08-27 2019-01-22 电子科技大学成都学院 A kind of secure virtual machine means of defence and system based on big data and machine learning
CN110427755A (en) * 2018-10-16 2019-11-08 新华三信息安全技术有限公司 A kind of method and device identifying script file
CN111723371A (en) * 2020-06-22 2020-09-29 上海斗象信息科技有限公司 Method for constructing detection model of malicious file and method for detecting malicious file
CN112052451A (en) * 2020-08-17 2020-12-08 北京兰云科技有限公司 Webshell detection method and device
CN112685314A (en) * 2021-01-05 2021-04-20 广州知图科技有限公司 JavaScript engine security test method and test system
CN113703780A (en) * 2020-05-22 2021-11-26 广州虎牙科技有限公司 Decompilation detection method, device, equipment and medium, and webpage resource data sending method, device, equipment and medium
CN114595454A (en) * 2022-03-11 2022-06-07 西安电子科技大学 Malicious JS script detection method based on mixed analysis and feature fusion
CN113703780B (en) * 2020-05-22 2024-04-19 广州虎牙科技有限公司 Decompilation detection and webpage resource data sending method, device, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034042A (en) * 2010-12-13 2011-04-27 四川大学 Novel unwanted code detecting method based on characteristics of function call relationship graph
CN102346829A (en) * 2011-09-22 2012-02-08 重庆大学 Virus detection method based on ensemble classification

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
D.KRISHNA SANDEEP REDDY等: "《2nd International Conference on Information Systems Security》", 22 December 2006 *
JEREMY Z. KOLTER等: "《Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining》", 26 August 2004 *
TONY ABOU-ASSALEH等: "《2nd Annual Conference on Privacy,Security and Trust》", 16 October 2004 *
张勇等: "基于主动学习的计算机病毒检测方法研究", 《计算机与数字工程》 *
张小康等: "基于加权信息增益的恶意代码检测方法", 《计算机工程》 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103221960A (en) * 2012-12-10 2013-07-24 华为技术有限公司 Detection method and apparatus of malicious code
WO2014089744A1 (en) * 2012-12-10 2014-06-19 华为技术有限公司 Method and apparatus for detecting malicious code
CN103116494A (en) * 2013-01-25 2013-05-22 中兴通讯股份有限公司 Automatic testing platform testing output information extraction method and device
CN103116494B (en) * 2013-01-25 2016-05-25 中兴通讯股份有限公司 Automatic test platform test output information extracting method and device
WO2014183545A1 (en) * 2013-05-15 2014-11-20 Tencent Technology (Shenzhen) Company Limited Method,device and system for identifying script virus
CN103559235A (en) * 2013-10-24 2014-02-05 中国科学院信息工程研究所 Online social network malicious webpage detection and identification method
CN103559235B (en) * 2013-10-24 2016-08-17 中国科学院信息工程研究所 A kind of online social networks malicious web pages detection recognition methods
US9754113B2 (en) 2013-11-08 2017-09-05 Tencent Technology (Shenzhen) Company Limited Method, apparatus, terminal and media for detecting document object model-based cross-site scripting attack vulnerability
WO2015067114A1 (en) * 2013-11-08 2015-05-14 腾讯科技(深圳)有限公司 Method, apparatus, terminal and media for detecting document object model-based cross-site scripting attack vulnerability
CN104134040A (en) * 2014-07-25 2014-11-05 中国人民解放军信息工程大学 Binary malicious code threatening evaluating method based on information fusion
CN104134040B (en) * 2014-07-25 2017-03-29 中国人民解放军信息工程大学 A kind of binary malicious codes menace appraisal procedure based on information fusion
CN106485148A (en) * 2015-10-29 2017-03-08 远江盛邦(北京)网络安全科技股份有限公司 The implementation method of the malicious code behavior analysis sandbox being combined based on JS BOM
CN106055980A (en) * 2016-05-30 2016-10-26 南京邮电大学 Rule-based JavaScript security testing method
CN106022132A (en) * 2016-05-30 2016-10-12 南京邮电大学 Real-time webpage Trojan detection method based on dynamic content analysis
CN106055980B (en) * 2016-05-30 2018-12-11 南京邮电大学 A kind of rule-based JavaScript safety detecting method
CN106529293A (en) * 2016-11-09 2017-03-22 东巽科技(北京)有限公司 Sample classification determination method for malware detection
CN107688744A (en) * 2017-08-31 2018-02-13 杭州安恒信息技术有限公司 Malicious file sorting technique and device based on Image Feature Matching
CN107688744B (en) * 2017-08-31 2020-03-13 杭州安恒信息技术股份有限公司 Malicious file classification method and device based on image feature matching
CN107659570A (en) * 2017-09-29 2018-02-02 杭州安恒信息技术有限公司 Webshell detection methods and system based on machine learning and static and dynamic analysis
CN108920956B (en) * 2018-07-03 2021-05-14 亚信科技(成都)有限公司 Machine learning method and system based on context awareness
CN108920956A (en) * 2018-07-03 2018-11-30 亚信科技(成都)有限公司 Machine learning method and system based on context aware
CN109190372A (en) * 2018-07-09 2019-01-11 四川大学 A kind of JavaScript Malicious Code Detection model based on bytecode
CN109190372B (en) * 2018-07-09 2021-11-12 四川大学 JavaScript malicious code detection method based on bytecode
CN109254827A (en) * 2018-08-27 2019-01-22 电子科技大学成都学院 A kind of secure virtual machine means of defence and system based on big data and machine learning
CN109254827B (en) * 2018-08-27 2022-04-22 电子科技大学成都学院 Virtual machine safety protection method and system based on big data and machine learning
CN110427755A (en) * 2018-10-16 2019-11-08 新华三信息安全技术有限公司 A kind of method and device identifying script file
CN113703780A (en) * 2020-05-22 2021-11-26 广州虎牙科技有限公司 Decompilation detection method, device, equipment and medium, and webpage resource data sending method, device, equipment and medium
CN113703780B (en) * 2020-05-22 2024-04-19 广州虎牙科技有限公司 Decompilation detection and webpage resource data sending method, device, equipment and medium
CN111723371A (en) * 2020-06-22 2020-09-29 上海斗象信息科技有限公司 Method for constructing detection model of malicious file and method for detecting malicious file
CN111723371B (en) * 2020-06-22 2024-02-20 上海斗象信息科技有限公司 Method for constructing malicious file detection model and detecting malicious file
CN112052451A (en) * 2020-08-17 2020-12-08 北京兰云科技有限公司 Webshell detection method and device
CN112685314A (en) * 2021-01-05 2021-04-20 广州知图科技有限公司 JavaScript engine security test method and test system
CN114595454A (en) * 2022-03-11 2022-06-07 西安电子科技大学 Malicious JS script detection method based on mixed analysis and feature fusion
CN114595454B (en) * 2022-03-11 2024-04-02 西安电子科技大学 Malicious JS script detection method based on mixed analysis and feature fusion

Also Published As

Publication number Publication date
CN102663296B (en) 2015-01-07

Similar Documents

Publication Publication Date Title
CN102663296A (en) Intelligent detection method for Java script malicious code facing to the webpage
Laskov et al. Static detection of malicious JavaScript-bearing PDF documents
CN101924761A (en) Method for detecting malicious program according to white list
Wang et al. Detection of malicious web pages based on hybrid analysis
CN101924762B (en) Cloud security-based active defense method
KR101162051B1 (en) Using string comparison malicious code detection and classification system and method
Liu et al. A novel approach for detecting browser-based silent miner
Wang et al. Jsdc: A hybrid approach for javascript malware detection and classification
CN102663319B (en) Prompting method and device for download link security
CN103279710B (en) Method and system for detecting malicious codes of Internet information system
CN101751530B (en) Method for detecting loophole aggressive behavior and device
WO2013026320A1 (en) Method and system for detecting webpage trojan embedded
CN104881607A (en) XSS vulnerability detection method based on simulating browser behavior
CN102609649A (en) Method and device for collecting malicious software automatically
CN105303109A (en) Malicious code information analysis method and system
CN110765459A (en) Malicious script detection method and device and storage medium
CN102591965A (en) Method and device for detecting black chain
US11263062B2 (en) API mashup exploration and recommendation
Phung et al. Detection of malicious javascript on an imbalanced dataset
CN103475671A (en) Method for detecting rogue programs
KR20210084204A (en) Malware Crawling Method and System
JP6505533B2 (en) Malicious code detection
Liang et al. Malicious web pages detection based on abnormal visibility recognition
Gorji et al. Detecting obfuscated JavaScript malware using sequences of internal function calls
Lee et al. A study of malware detection and classification by comparing extracted strings

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151027

Address after: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee after: Dbappsecurity Co.,ltd.

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee before: Dbappsecurity Co.,ltd.

Patentee before: Chen Tieming

CP03 Change of name, title or address

Address after: Zhejiang Zhongcai Building No. 68 Hangzhou 310051 Zhejiang province Binjiang District Tong Road 15

Patentee after: Hangzhou Annan information technology Limited by Share Ltd

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee before: Dbappsecurity Co.,ltd.

CP02 Change in the address of a patent holder

Address after: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Patentee after: Hangzhou Annan information technology Limited by Share Ltd

Address before: Zhejiang Zhongcai Building No. 68 Hangzhou 310051 Zhejiang province Binjiang District Tong Road 15

Patentee before: Hangzhou Annan information technology Limited by Share Ltd