TWI358639B - Malware detection system, data mining module, malw - Google Patents
- Publication number
- TWI358639B
- Authority
- TW
- Taiwan
- Prior art keywords
- feature
- program
- malicious
- features
- programs
- Prior art date
Description
Sanda reference number: TW3715PA

IX. Description of the Invention

[Technical Field of the Invention]

The present invention relates to a malware detection system, a data mining module, and a malware detection module, and in particular to a malware detection system, data mining module, and malware detection module based on data mining techniques.

[Prior Art]

In recent years, malware has evolved very rapidly. A conventional antivirus system extracts a pattern from each known malicious program and stores it in its database. Every program, malicious or not, corresponds to a unique pattern. To check a program under test, the conventional antivirus system compares the pattern of that program with the patterns stored in the database. When the pattern of the program under test is identical to one of the patterns in the database, the system identifies the program as known malware.

However, malware tends to evolve very quickly into variants. A variant behaves very much like the malware it derives from, yet the patterns of the two still differ.
For example, when a known malicious program A evolves into a new variant A', a conventional antivirus system cannot detect A' even though A' behaves like A and the system already holds the pattern of A. A conventional antivirus system can therefore detect only known malware; it cannot detect new variants that evolve from known malware and behave similarly to it.
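The limitation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a "pattern" is a SHA-256 digest of the program bytes; the source does not say how patterns are computed, and the byte strings and the helper names `pattern_of` and `is_known_malware` are hypothetical.

```python
import hashlib


def pattern_of(program_bytes: bytes) -> str:
    """Hypothetical stand-in for a 'pattern': a SHA-256 digest of the program."""
    return hashlib.sha256(program_bytes).hexdigest()


# Made-up byte strings standing in for known malware A and its variant A'.
known_malware_a = b"header|payload-A|footer"
variant_a_prime = b"header|payload-A-modified|footer"

# The database holds only the pattern of the known program A.
pattern_database = {pattern_of(known_malware_a)}


def is_known_malware(program_bytes: bytes) -> bool:
    """Exact pattern comparison, as in the conventional approach described above."""
    return pattern_of(program_bytes) in pattern_database


detected_a = is_known_malware(known_malware_a)
detected_variant = is_known_malware(variant_a_prime)
```

Under this assumption, any change to the program bytes yields a different pattern, so the variant is missed even though only a small part of it differs from A.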
As a result, conventional antivirus systems cannot cope with the growing number of malware variants. When a new variant appears, it has already damaged client computers by the time a conventional antivirus system obtains its pattern.

[Summary of the Invention]

The present invention relates to a malware detection system. Using only the features of known malicious programs and known non-malicious programs, the system can detect variants that are of the same type as known malware but have never been seen before.

According to a first aspect of the invention, a data mining module is provided that outputs a classification model based on a plurality of known malicious programs (malware) and a plurality of known non-malicious programs.
A program under test is classified, according to the classification model, as either a malicious program or a non-malicious program. The data mining module includes a program database, a feature mining unit, a feature screening unit, and a classification model training unit. The program database stores the known malicious programs and the known non-malicious programs. The feature mining unit extracts N candidate features (features to be screened) from them. The i-th candidate feature is an interaction behavior between a file system and at least one of the known malicious and known non-malicious programs, and at least one of these programs has the i-th candidate feature, where i is a positive integer less than or equal to N. The feature screening unit selects a plurality of effective features from the N candidate features. Each effective feature is substantially a feature possessed mainly by one of the two classes, known malicious programs or known non-malicious programs. The classification model training unit trains the classification model according to the effective features.

The malware detection module includes a feature analysis unit, a feature screening unit, and a classifier. The feature analysis unit extracts preliminary features from the program under test; the preliminary features are interaction behaviors between the program under test and the file system. The feature screening unit screens the preliminary features according to the effective features, and the classifier then classifies the program under test as either a malicious program or a non-malicious program.

According to a fourth aspect of the invention, a data mining method is provided that outputs a classification model based on a plurality of known malicious programs and a plurality of known non-malicious programs. A program under test is classified, according to the classification model, as either a malicious program or a non-malicious program. The method includes the following steps. First, N candidate features are extracted from the known malicious programs and the known non-malicious programs; the i-th candidate feature is an interaction behavior between a file system and at least one of these programs, where i is a positive integer less than or equal to N. Next, a plurality of effective features are selected from the N candidate features. Then, the classification model is trained according to the effective features.

According to a further aspect of the invention, a malware detection method is provided for detecting a program under test. The method begins by extracting preliminary features from the program under test; the preliminary features are [...]
Each of the candidate features (features to be screened) F1 to FN is extracted from at least one of the programs, and each is an interaction behavior between a file system and at least one of the known malicious programs Pm and known non-malicious programs Pb.

For example, when several of the known malicious programs and several of the known non-malicious programs have a given candidate feature, those known malicious programs Pm and known non-malicious programs Pb all exhibit the same interaction behavior with the file system.

Because malicious and non-malicious programs use dynamic link libraries (DLLs) differently, in this embodiment the feature mining unit 112 extracts, for each known malicious program Pm and each known non-malicious program Pb, the paths of the DLLs it uses and the application program interfaces (APIs) it uses, as the candidate features F1 to FN.

In this embodiment, the candidate features that the feature mining unit 112 extracts from one program (a known malicious program or a known non-malicious program) fall into four types. The first type is the first-layer DLLs used directly by the program. The second type is the paths from the first-layer DLLs to the last-layer DLLs used by the program. The third type is the APIs in the first-layer DLLs that are used by the program. The fourth type is the APIs in the first-layer DLLs that are used by other DLLs.
Take, as an example, extracting the interaction behavior between a program Filemon.exe and the file system of the Windows operating system as candidate features. The first-layer DLLs used by Filemon.exe include COMCTL32.DLL, KERNEL32.DLL, and USER32.DLL. The feature mining unit 112 therefore extracts these first-layer DLLs as candidate features of Filemon.exe.

A first-layer DLL may use second-layer DLLs, which may in turn use third-layer DLLs, and so on down to the last layer. The feature mining unit 112 extracts the path from each first-layer DLL to each last-layer DLL as a candidate feature of the program.

For example, USER32.DLL in the first layer uses the second-layer DLLs GDI32.DLL, KERNEL32.DLL, and MSIMG32.DLL, and KERNEL32.DLL in the second layer uses the last-layer DLL NTDLL.DLL. The feature mining unit 112 therefore extracts the path formed by USER32.DLL (first layer), KERNEL32.DLL (second layer), and NTDLL.DLL (last layer) as a candidate feature of Filemon.exe.

The above takes the DLL paths under USER32.DLL as an example. The DLL paths of the other first-layer DLLs, such as COMCTL32.DLL, are extracted in the same way.

The feature mining unit 112 also extracts the APIs in the first-layer DLLs that are used by Filemon.exe, such as RtlFreeHeap, RtlAllocateHeap, and RtlGetLastWin32Error, as candidate features of Filemon.exe.
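The path extraction in the Filemon.exe example can be sketched as a walk over a dependency map. This is only an illustration: the map below is hand-written from the DLL names mentioned in the text (DLLs with no listed dependencies are treated as last-layer), the `->` path notation and the function name `dll_paths` are hypothetical, and the source does not describe how the dependency information is actually obtained.

```python
# Hypothetical dependency map built from the DLL names in the Filemon.exe example.
# A DLL missing from the map is treated as a last-layer DLL (no further dependencies).
DEPENDENCIES = {
    "Filemon.exe": ["COMCTL32.DLL", "KERNEL32.DLL", "USER32.DLL"],
    "USER32.DLL": ["GDI32.DLL", "KERNEL32.DLL", "MSIMG32.DLL"],
    "KERNEL32.DLL": ["NTDLL.DLL"],
}


def dll_paths(node, deps, prefix=()):
    """Return every path from `node` down to a last-layer DLL."""
    path = prefix + (node,)
    children = [c for c in deps.get(node, []) if c not in path]  # skip cycles
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(dll_paths(child, deps, path))
    return paths


# First type of feature: the first-layer DLLs used directly by the program.
first_layer = DEPENDENCIES["Filemon.exe"]

# Second type of feature: paths from each first-layer DLL to each last-layer DLL.
path_features = [
    "->".join(p) for dll in first_layer for p in dll_paths(dll, DEPENDENCIES)
]
```

With this map, `path_features` contains one entry per first-layer-to-last-layer path, including the USER32.DLL, KERNEL32.DLL, NTDLL.DLL path described in the text.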
The feature mining unit 112 further extracts the APIs in the first-layer DLLs that are used by other DLLs, such as CsrAllocateCaptureBuffer, CsrAllocateMessagePointer, and RtlSizeHeap, as candidate features of Filemon.exe.

In this way, the feature mining unit 112 extracts the candidate features F1 to FN from each known malicious program Pm and each known non-malicious program Pb.
From the candidate features F1 to FN, the first feature screening unit 113 then selects a plurality of effective features Fe. Its operation is described in detail below.

In this embodiment, the number of candidate features is large, and many of them may be possessed by both known malicious programs and known non-malicious programs. The first feature screening unit 113 therefore decides, one by one, whether each candidate feature F1 to FN is an effective feature. An effective feature Fe is substantially a feature possessed mainly by one of the two classes, known malicious programs or known non-malicious programs. That is, exactly one of the following holds for an effective feature Fe: either it is substantially a feature possessed mainly by known malicious programs, or it is substantially a feature possessed mainly by known non-malicious programs.

For example, suppose that among 1000 known malicious programs Pm and 1050 known non-malicious programs Pb, 300 known malicious programs have candidate feature F1 and 320 known non-malicious programs also have it.
In that case, the probability that F1 appears in a known malicious program is about the same as the probability that it appears in a known non-malicious program. The certainty that F1 is a feature of known malicious programs is low, and so is the certainty that it is a feature of known non-malicious programs. In other words, F1 is neither substantially a feature possessed mainly by known malicious programs nor substantially a feature possessed mainly by known non-malicious programs. The first feature screening unit 113 therefore discards F1 and does not use it as an effective feature Fe.

As another example, suppose 500 known malicious programs Pm have candidate feature F2 but only 20 known non-malicious programs Pb have it. The probability that F2 appears in a known malicious program is then far greater than the probability that it appears in a known non-malicious program. The certainty that F2 is a feature of known malicious programs is high; that is, F2 is substantially a feature possessed mainly by known malicious programs. The first feature screening unit 113 therefore determines that F2 is an effective feature Fe.

Similarly, suppose only 50 known malicious programs have candidate feature F3 while many more known non-malicious programs have it. The probability that F3 appears in a known non-malicious program is then far greater than the probability that it appears in a known malicious program. The certainty that F3 is a feature of known non-malicious programs is high; that is, F3 is substantially a feature possessed mainly by known non-malicious programs. The first feature screening unit 113 therefore also determines that F3 is an effective feature Fe.

In this way, the first feature screening unit 113 selects, from the N candidate features F1 to FN, the effective features Fe that distinguish known malicious programs from known non-malicious programs.

In this embodiment, the first feature screening unit 113 decides whether each candidate feature is an effective feature according to a screening parameter corresponding to that feature. The screening parameter of a candidate feature relates to the degree of certainty that the feature belongs to one of the two classes of programs, known malicious or known non-malicious.

For example, to decide whether a candidate feature Fi among F1 to FN is an effective feature, the first feature screening unit 113 generates a screening parameter Pi (not shown) for Fi from the number of known malicious programs that have Fi and the number of known non-malicious programs that have Fi. The screening parameter Pi relates to the degree of certainty that Fi is a feature of one of the two classes of programs. Here i is a positive integer.
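The F1 and F2 examples can be checked by turning the quoted counts into per-class occurrence rates. The sketch below assumes that both examples refer to the same population of 1000 known malicious and 1050 known non-malicious programs; the function name `occurrence_rates` is hypothetical, and F3 is left out because the text does not give its count for non-malicious programs.

```python
N_MALICIOUS = 1000  # known malicious programs Pm (from the example in the text)
N_BENIGN = 1050     # known non-malicious programs Pb (from the example in the text)


def occurrence_rates(malicious_with_feature, benign_with_feature):
    """Fraction of each class that has the feature: (malicious rate, non-malicious rate)."""
    return (
        malicious_with_feature / N_MALICIOUS,
        benign_with_feature / N_BENIGN,
    )


f1_rates = occurrence_rates(300, 320)  # F1: similar rates in both classes
f2_rates = occurrence_rates(500, 20)   # F2: far more common in malicious programs
```

The two rates for F1 are close to each other, which matches the decision to discard it, while the rates for F2 differ by more than an order of magnitude.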
If the screening parameter Pi is higher than a threshold, the certainty that Fi is a feature of one of the two classes of programs is sufficiently high: Fi is substantially a feature possessed mainly by one of the two classes, known malicious programs or known non-malicious programs, and the first feature screening unit 113 determines that Fi is an effective feature Fe.

In this embodiment, the first feature screening unit 113 computes the information gain of each candidate feature as the screening parameter of that feature:

$$\mathrm{Gain}(S, F_i) = \mathrm{Info}(S) - \mathrm{Info}_{F_i}(S) \qquad \text{(Equation 1)}$$
Equation 1 gives the information gain of candidate feature Fi in this embodiment, where S is the set of all known malicious programs Pm and all known non-malicious programs Pb. Info(S) in Equation 1 is the entropy of the set S, given by Equation 2:

$$\mathrm{Info}(S) = -\sum_{j=1}^{2} p_j \log_2 p_j \qquad \text{(Equation 2)}$$

In Equation 2, j is 1 or 2. $p_1$ is the proportion of known malicious programs among all the known malicious and known non-malicious programs, and $p_2$ is the proportion of known non-malicious programs among them.

$\mathrm{Info}_{F_i}(S)$ in Equation 1 is the entropy of candidate feature Fi, given by Equation 3:

$$\mathrm{Info}_{F_i}(S) = \sum_{k=0}^{1} \frac{|S_k|}{|S|}\,\mathrm{Info}(S_k) \qquad \text{(Equation 3)}$$

In Equation 3, k is 0 or 1. One of the two subsets $S_k$ is the set of known malicious and known non-malicious programs in S that have Fi; the other is the set of those in S that do not have Fi. Accordingly, $|S_k|/|S|$ is, for the first subset, the proportion of all programs that have Fi and, for the second, the proportion of all programs that do not have Fi.

$\mathrm{Info}(S_k)$ in Equation 3 is the entropy of the subset $S_k$, described by Equation 4.
Equation 4 gives the entropy of each subset $S_k$ (the programs in S that have Fi, or the programs in S that do not):

$$\mathrm{Info}(S_k) = -\sum_{m=0}^{1} p_{k,m} \log_2 p_{k,m} \qquad \text{(Equation 4)}$$

In Equation 4, m is 0 or 1. For the subset of programs that have Fi, one of the two proportions $p_{k,m}$ is the share of known malicious programs among all programs that have Fi, and the other is the share of known non-malicious programs among all programs that have Fi. The entropy of the subset of programs that do not have Fi is obtained in the same way.

For example, among 1000 known malicious programs Pm and 1050 known non-malicious programs Pb, 300 known malicious programs have candidate feature F1 and 320 known non-malicious programs also have it. Then 620 of the 2050 programs have F1 and 1430 do not. Of the 1000 known malicious programs, 700 do not have F1; of the 1050 known non-malicious programs, 730 do not have F1. Therefore

$$\mathrm{Info}(S) = -\left(\frac{1000}{2050}\log_2\frac{1000}{2050} + \frac{1050}{2050}\log_2\frac{1050}{2050}\right)$$

and

$$\mathrm{Info}_{F_1}(S) = \frac{620}{2050}\left[-\left(\frac{300}{620}\log_2\frac{300}{620} + \frac{320}{620}\log_2\frac{320}{620}\right)\right] + \frac{1430}{2050}\left[-\left(\frac{700}{1430}\log_2\frac{700}{1430} + \frac{730}{1430}\log_2\frac{730}{1430}\right)\right]$$
Substituting these two values into Equation 1 gives the information gain of the candidate feature, which serves as its screening parameter.

$\mathrm{Info}_{F_i}(S)$ is the entropy of candidate feature Fi. The larger this entropy, the more disordered the data with respect to Fi; that is, the closer the probability that Fi appears in a known malicious program is to the probability that it appears in a known non-malicious program, and the lower the information gain of Fi. The certainty that Fi is a feature of known malicious programs is then low, and so is the certainty that Fi is a feature of known non-malicious programs.

Therefore, in this embodiment, when the information gain of Fi is below a threshold, the first feature screening unit 113 discards Fi and does not use it as an effective feature Fe.

Conversely, the smaller the entropy of Fi, the larger the gap between the probability that Fi appears in a known malicious program and the probability that it appears in a known non-malicious program, and the higher the information gain of Fi. In that case exactly one of the following two situations holds. In the first, the certainty that Fi is a feature of known malicious programs is high; that is, Fi is substantially a feature possessed mainly by known malicious programs. In the second, the certainty that Fi is a feature of known non-malicious programs is high; that is, Fi is substantially a feature possessed mainly by known non-malicious programs.

Therefore, when the information gain of Fi is above the threshold, the first feature screening unit 113 determines that Fi is an effective feature.
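The information-gain screening described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the threshold value 0.05 is an arbitrary assumption (the text gives no value), and the counts for F1 and F2 are taken from the examples in the description, assuming both refer to the same 1000 malicious and 1050 non-malicious programs.

```python
from math import log2


def entropy(counts):
    """Entropy (base 2) of a set split into classes with the given sizes."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(n_mal, n_ben, mal_with, ben_with):
    """Gain(S, Fi) = Info(S) - Info_Fi(S) for one candidate feature.

    n_mal, n_ben: numbers of known malicious / non-malicious programs in S.
    mal_with, ben_with: how many of each class have the feature.
    """
    total = n_mal + n_ben
    subsets = [
        (mal_with, ben_with),                  # programs that have the feature
        (n_mal - mal_with, n_ben - ben_with),  # programs that do not
    ]
    info_s = entropy((n_mal, n_ben))
    info_f = sum(
        (sum(sub) / total) * entropy(sub) for sub in subsets if sum(sub) > 0
    )
    return info_s - info_f


THRESHOLD = 0.05  # hypothetical value chosen only for this sketch

gain_f1 = information_gain(1000, 1050, 300, 320)
gain_f2 = information_gain(1000, 1050, 500, 20)
effective = [
    name for name, g in (("F1", gain_f1), ("F2", gain_f2)) if g > THRESHOLD
]
```

With these counts the gain of F1 is close to zero while the gain of F2 is substantially larger, so under the assumed threshold only F2 is kept as an effective feature.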
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW96138249A TWI358639B (en) | 2007-10-12 | 2007-10-12 | Malware detection system, data mining module, malw |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW96138249A TWI358639B (en) | 2007-10-12 | 2007-10-12 | Malware detection system, data mining module, malw |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200917020A (en) | 2009-04-16 |
TWI358639B (en) | 2012-02-21 |
Family
ID=44726250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW96138249A TWI358639B (en) | 2007-10-12 | 2007-10-12 | Malware detection system, data mining module, malw |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI358639B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI404374B (en) * | 2009-12-11 | 2013-08-01 | Univ Nat Taiwan Science Tech | Method for training classifier for detecting web spam |
US9158919B2 (en) * | 2011-06-13 | 2015-10-13 | Microsoft Technology Licensing, Llc | Threat level assessment of applications |
TWI461952B (en) * | 2012-12-26 | 2014-11-21 | Univ Nat Taiwan Science Tech | Method and system for detecting malware applications |
TWI515598B (en) * | 2013-08-23 | 2016-01-01 | 國立交通大學 | Method of generating distillation malware program, method of detecting malware program and system thereof |
- 2007-10-12 TW TW96138249A patent/TWI358639B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
TW200917020A (en) | 2009-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103839003B (en) | Malicious file detection method and device | |
Ye et al. | CIMDS: adapting postprocessing techniques of associative classification for malware detection | |
US9348998B2 (en) | System and methods for detecting harmful files of different formats in virtual environments | |
US9419996B2 (en) | Detection and prevention for malicious threats | |
CN101986324B (en) | Asynchronous processing of events for malware detection | |
JP5281717B2 (en) | Using file prevalence in behavioral heuristic notification of aggression | |
US8108931B1 (en) | Method and apparatus for identifying invariants to detect software tampering | |
CN102841999B (en) | A kind of file method and a device for detecting macro virus | |
US20150172303A1 (en) | Malware Detection and Identification | |
US20120174227A1 (en) | System and Method for Detecting Unknown Malware | |
CN107679403B (en) | Lesso software variety detection method based on sequence comparison algorithm | |
CN102034043A (en) | Novel file-static-structure-attribute-based malware detection method | |
US20190188381A9 (en) | Machine learning model for malware dynamic analysis | |
KR101851233B1 (en) | Apparatus and method for detection of malicious threats included in file, recording medium thereof | |
CN109271780A (en) | Method, system and the computer-readable medium of machine learning malware detection model | |
TW201712586A (en) | Method and system for analyzing malicious code, data processing apparatus and electronic apparatus | |
JP6711000B2 (en) | Information processing apparatus, virus detection method, and program | |
KR101132197B1 (en) | Apparatus and Method for Automatically Discriminating Malicious Code | |
US9152791B1 (en) | Removal of fake anti-virus software | |
TWI358639B (en) | Malware detection system, data mining module, malw | |
WO2008098519A1 (en) | A computer protection method based on a program behavior analysis | |
Darshan et al. | Windows malware detection based on cuckoo sandbox generated report using machine learning algorithm | |
Jang et al. | Mal-netminer: malware classification based on social network analysis of call graph | |
CN104504334B (en) | System and method for assessing classifying rules selectivity | |
CN108959930A (en) | Malice PDF detection method, system, data storage device and detection program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |